I am working on getting the read counts for some genes. Firstly, I aligned my trimmed RNASeq reads with human and mouse genome to get alignment file (accepted_hits.bam) in bam format using tophat. I provided htseq-count tool with two different bam files (sorted and unsorted).
Case1: I used the accepted_hits.bam from tophat directly as the input to the htseq-count and counted the reads for each gene.
$ htseq-count -f bam -s yes -i gene_id accepted_hits.bam hg19_genes.gtf > HTSeq-count_$sampleid$divider$currentDate.txt
Note: By default accepted_hits.bam file is sorted based on coordinates by tophat.
Case2: I sorted the accepted_hits.bam file based on "NAMES" using samtools. Then provided as input to the htseq-count and counted the reads for each gene.
$ samtools sort -n accepted_hits.bam accepted_hits_sorted.bam
$ htseq-count -f bam -r name -s yes -i gene_name accepted_hits_sorted.bam hg19_genes.gtf > HTSeq-count_$sampleid$divider$currentDate.txt
|Globin Genes||Stranded: YES||Stranded: YES|
|% of Globin_reads||0.1626||0.1102|
Can anyone explain why the total reads value is low for sorted bam compared to unsorted bam?