I am learning how to do differential expression analysis. I was given transcriptome data, and fortunately the organism already has an annotated reference genome. So far, I have aligned the reads to the reference genome with STAR and got good mapping stats (>95% mapped). Now I'm quantifying the counts with htseq-count. I expected this to be a relatively smooth step, but the results have been bugging me.
Here is the command I used:
htseq-count -f bam -r pos -t exon $RNA/AN0_1.Aligned.sortedByCoord.out.bam $GEN > AN0_1.no.exon.counts
__no_feature 14805713
__ambiguous 4766268
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 113292
That seems like a high value for __no_feature, right?
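One way to judge whether a __no_feature count is "high" is to express it as a fraction of everything htseq-count counted. The sketch below assumes the usual two-column output (feature ID and integer count per line, with the special counters prefixed by "__"); the function name summarize_htseq_counts is hypothetical. It treats assigned counts plus special counters as the total, which is only approximate if multi-mapped reads end up counted more than once.

```python
from typing import Dict, Tuple


def summarize_htseq_counts(text: str) -> Tuple[int, Dict[str, float]]:
    """Parse htseq-count output and report how the counted reads were distributed.

    Assumes one feature ID and one integer count per line, with htseq-count's
    special counters prefixed by '__'.
    Returns (sum of counts assigned to features, {special counter: fraction of total}).
    """
    assigned = 0
    special: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, count = parts[0], int(parts[1])
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    # Approximate total: assigned reads/pairs plus every special counter.
    total = assigned + sum(special.values())
    fractions = {name: (n / total if total else 0.0) for name, n in special.items()}
    return assigned, fractions


# Hypothetical toy input, not taken from the run above.
toy = "geneA\t60\ngeneB\t20\n__no_feature\t15\n__ambiguous\t5\n"
assigned, fractions = summarize_htseq_counts(toy)
print(assigned, fractions)
```

On the real output file, the same call would be summarize_htseq_counts(open("AN0_1.no.exon.counts").read()).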
I've tried each strandedness setting (-s yes, -s no, -s reverse), but all of them give similar results. I also confirmed that the GTF file has the gene_id attribute, and I tried using a GFF file instead.
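A common rule of thumb when testing -s is to compare how many reads each setting assigns to genes: for a strand-specific library, one of -s yes and -s reverse usually assigns far more reads than the other, while roughly equal numbers suggest an unstranded library. The sketch below assumes one htseq-count output per -s setting; the function names and the toy data are hypothetical.

```python
from typing import Dict


def assigned_total(text: str) -> int:
    """Sum the counts of real features, skipping htseq-count's '__' special counters."""
    total = 0
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2 or parts[0].startswith("__"):
            continue  # blank line, malformed line, or special counter
        total += int(parts[1])
    return total


def compare_strandedness(runs: Dict[str, str]) -> Dict[str, int]:
    """Map each -s setting (e.g. 'yes', 'no', 'reverse') to the reads it assigned to features."""
    return {mode: assigned_total(text) for mode, text in runs.items()}


# Hypothetical toy outputs, shaped like a forward-stranded library.
runs = {
    "yes": "g1\t90\n__no_feature\t10\n",
    "reverse": "g1\t5\n__no_feature\t95\n",
    "no": "g1\t92\n__no_feature\t8\n",
}
print(compare_strandedness(runs))
```

With real data, each value in runs would be the contents of a separate counts file produced with a different -s setting.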
Could it also be related to this warning from the err file?
Warning: Mate pairing was ambiguous for 23164 records; mate key for first such record: ('A00564:87:HJKFCDSXX:2:2205:13838:9330', 'first', 'CP059858.1', 91661, 'CP059858.1', 91831, 317).
Any help will be much appreciated!