I have RNA-seq data from mouse and am trying to (1) align it with STAR and (2) estimate gene counts with HTSeq. I am using the .fa and .gtf files from Ensembl that correspond to mm9.
My question is: what is an acceptable percentage of reads reported as "no feature" and "alignment not unique" by htseq-count?
Here are the parameters I used for htseq-count:
samtools view -h $sorted.bam | \
  python2.7 htseq-count \
    -m intersection-nonempty \
    -f sam \
    -r name \
    -s no \
    -a 0 \
    -i gene_id \
    -o samout.out \
    - \
    Mus_musculus.NCBIM37.67.gtf > result.count
For example, one of my RNA-seq samples has 36,269,180 uniquely mapped reads (as reported by STAR), but after htseq-count, 6,065,105 (16%) of them are reported as __no_feature and 19,474,101 (54%) as __alignment_not_unique. This means only about 30% of the uniquely mapped reads are assigned to genes. Isn't this too low?
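To make the percentages above easier to reproduce, here is a minimal sketch that summarizes a count file such as result.count. It assumes the usual tab-separated htseq-count output (one `name<TAB>count` line per gene, followed by special counters whose names start with `__`). The function name `summarize_counts` is made up for illustration. Note that the percentages are relative to the sum of all counts in the file, which may differ from the uniquely mapped total that STAR reports.

```shell
# Summarize an htseq-count output file: records assigned to genes versus
# each special "__" counter, as a share of everything counted in the file.
# Assumption: tab-separated "name<TAB>count" lines, special counters start with "__".
summarize_counts() {
  awk -F'\t' '
    /^__/ { special[$1] = $2; total += $2; next }   # special counters
          { assigned += $2; total += $2 }           # per-gene counts
    END {
      if (total == 0) { print "no reads counted"; exit 1 }
      printf "assigned_to_genes\t%d\t%.1f%%\n", assigned, 100 * assigned / total
      # Note: awk does not guarantee the order of this loop.
      for (name in special)
        printf "%s\t%d\t%.1f%%\n", name, special[name], 100 * special[name] / total
    }
  ' "$1"
}
```

Usage would be `summarize_counts result.count`, which prints one line per category with its count and percentage.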