Hello everyone,
I want to explore differential expression of a relatively small set of genes (~30). I used this set as the reference to map reads against.
I used Bowtie2 to build an index of these gene sequences, then mapped transcriptome reads to this reference and got the SAM files. In each SAM file, every mapped read has the name of the gene it mapped to as its RNAME (reference sequence name).
Now I want to obtain the count matrix to see whether any genes are differentially expressed. Looking at HTSeq, I noticed it requires a GFF or GTF file to build this matrix, but since I don't have the whole genome as the reference, I cannot use it.
Do you have any suggestions on how to build the matrix? Should I use samtools and build a BED file to get the coverage for each gene?
You can make up a simple annotation format (SAF) file with the gene names and their lengths and use it with featureCounts. See the featureCounts help page for details.
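As a rough sketch of how such a SAF file could be generated, the Python below reads the same FASTA file that was used to build the Bowtie2 index and writes one row per gene, spanning the full sequence. The function names (fasta_lengths, saf_lines, write_saf) and the file names are made up for illustration. It assumes the reference name in the SAM files is the FASTA header up to the first whitespace, and it puts every gene on the + strand, which only matters if you run a strand-specific count.

```python
def fasta_lengths(lines):
    """Return {sequence name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the aligner reports the header up to the first whitespace.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def saf_lines(lengths):
    """Build SAF rows: each gene is a single feature covering its whole sequence."""
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for gene, length in lengths.items():
        # The gene is both the feature ID and the "chromosome" it lies on.
        rows.append(f"{gene}\t{gene}\t1\t{length}\t+")
    return rows


def write_saf(fasta_path, saf_path):
    """Read a FASTA file and write the matching SAF annotation (hypothetical paths)."""
    with open(fasta_path) as fasta:
        lengths = fasta_lengths(fasta)
    with open(saf_path, "w") as out:
        out.write("\n".join(saf_lines(lengths)) + "\n")
```

After calling write_saf("genes.fasta", "genes.saf"), something like featureCounts -F SAF -a genes.saf -o counts.txt sample1.sam sample2.sam should produce one count column per SAM file (add -p for paired-end data).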