This is probably a novice question.
I have single-end reads of length 100 bp from Illumina TruSeq sequencing. I am using the mouse genome build mm9 from the TopHat index and annotation downloads.
The library contains 16 million reads.
My TopHat command is as follows:
tophat -p 4 -N 3 --read-gap-length 3 --read-edit-dist 3 --output-dir <path> <genome_path> path/to/input.fasta
I then count reads with HTSeq:
python -m HTSeq.scripts.count -m intersection-nonempty -s no -i gene_id -t exon accepted_hits.sam /mm9/genes.gtf > counts.txt
After HTSeq count, the special counters are:
no_feature              4973501
ambiguous               125622
too_low_aQual           0
not_aligned             0
alignment_not_unique    5620063
The HTSeq output gives the statistics above. Is it normal to see an alignment_not_unique count this high for single-end sequencing? Is there anything that can be done to improve this statistic?
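To put the counters in proportion, here is a minimal sketch that expresses each one as a share of the library. It assumes the library size is exactly 16,000,000 (the post only says "16 million"), and the helper names `parse_summary` and `shares` are made up for illustration. One caveat: if alignment_not_unique counts alignment records rather than reads, the share of reads affected would be lower than this calculation suggests.

```python
# Special counters copied from the htseq-count output quoted in this post.
SUMMARY = """\
no_feature 4973501
ambiguous 125622
too_low_aQual 0
not_aligned 0
alignment_not_unique 5620063
"""

# Assumption: "16 million reads" is treated as an exact figure.
LIBRARY_SIZE = 16_000_000


def parse_summary(text):
    """Parse 'name value' lines into a dict of integer counts."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split()
        # Some HTSeq versions prefix these counters with "__"; strip it.
        counts[name.lstrip("_")] = int(value)
    return counts


def shares(counts, total):
    """Return each counter as a fraction of `total`."""
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    counts = parse_summary(SUMMARY)
    for name, frac in shares(counts, LIBRARY_SIZE).items():
        print(f"{name:22s}{counts[name]:>10d}{frac:8.1%}")
```

Under that assumption, alignment_not_unique and no_feature each come out at roughly a third of the library, which is why I am asking whether this is expected.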
As an extension to the question above: what happens if reads that do not align uniquely are counted for both genes in a differential expression analysis? Does this introduce any bias?
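Before deciding how to treat such reads, it may help to see how the alignments are distributed by multiplicity. The sketch below tallies alignment records in a SAM file by their `NH:i:` tag, the optional SAM field giving the number of reported alignments for a read, which is what htseq-count uses to recognise non-unique alignments. The function name `nh_histogram` is hypothetical, and the code assumes a plain-text, tab-separated SAM file such as the accepted_hits.sam used above.

```python
from collections import Counter


def nh_histogram(sam_lines):
    """Count mapped alignment records by NH value (None if the tag is absent)."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: read is unmapped
            continue
        nh = None
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        hist[nh] += 1
    return hist


if __name__ == "__main__":
    import sys

    # Usage: python nh_hist.py accepted_hits.sam
    with open(sys.argv[1]) as handle:
        hist = nh_histogram(handle)
    for nh, n in sorted(hist.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
        print(f"NH={nh}\t{n} records")
```

Note that a read with NH=2 normally appears as two records, so the record counts for NH > 1 are larger than the number of distinct reads involved.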
Thanks in advance!