I have RNA-seq data from mouse and am trying to (1) align it with STAR and (2) estimate gene counts with HTSeq. I am using the .fa and .gtf files from Ensembl that correspond to mm9 (Mus_musculus.NCBIM37.67.dna.toplevel.fa, Mus_musculus.NCBIM37.67.gtf).
My question is: what is an acceptable percentage of reads reported as "no feature" and "not unique" by htseq-count?
Here are the parameters I used for htseq-count:
samtools view -h $sorted.bam | python2.7 htseq-count -m intersection-nonempty -f sam -r name -s no -a 0 -i gene_id -o samout.out - Mus_musculus.NCBIM37.67.gtf > result.count
For example, one of my RNA-seq samples has 36,269,180 uniquely mapped reads (as assessed by STAR), but after htseq-count, 6,065,105 (16%) of the reads are reported as "__no_feature" and 19,474,101 (54%) as "__alignment_not_unique". This means only about 30% of the original reads are assigned to genes. Isn't this too low?
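For anyone who wants to check the percentages, here is a minimal sketch for recomputing them from the htseq-count output. It assumes result.count is the usual tab-separated file of gene IDs and counts, with the special counters prefixed by "__" as in the names quoted above. The function names summarize_counts and report are hypothetical helpers, not part of HTSeq. Note that the total here is the sum of every line in the file, which may not match the uniquely mapped read count reported by STAR.

```python
def summarize_counts(lines):
    """Split htseq-count output into reads assigned to genes and the special '__' counters."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.split("\t")
        if name.startswith("__"):
            # Special counters such as __no_feature or __alignment_not_unique
            special[name] = int(count)
        else:
            # Ordinary line: gene_id followed by its read count
            assigned += int(count)
    total = assigned + sum(special.values())
    return assigned, special, total


def report(path):
    """Print each category as a percentage of all reads counted in the file."""
    with open(path) as handle:
        assigned, special, total = summarize_counts(handle)
    if total == 0:
        print("no reads counted")
        return
    print("assigned to genes: %d (%.1f%%)" % (assigned, 100.0 * assigned / total))
    for name in sorted(special):
        print("%s: %d (%.1f%%)" % (name, special[name], 100.0 * special[name] / total))
```

Calling report("result.count") on the file produced by the command above would print one line per category, which makes it easier to compare against the STAR summary.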