I am working with BAM files generated by TopHat and want to convert them into raw count files.
First, I converted the BAM file directly into a raw count file with the following command:
- htseq-count -f bam KDR_pre_thout/accepted_hits.bam genes.gtf > KDR_pre_raw_count.txt
Then I found the following paper (http://www.nature.com/nprot/journal/v8/n9/pdf/nprot.2013.099.pdf), which suggests sorting the BAM file by read name, converting it to a SAM file, and then converting the SAM file into the raw count file.
I thought the result would be the same, but it is not.
- samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
- samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
- htseq-count -s no -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count
So I have used these two approaches to convert the BAM file into a count file, and the results are different.
Which one should I follow? The second, as the paper describes (maybe the sort is the important difference)? If so, why? Does anybody know why the sort is necessary?
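To show where the two outputs disagree, here is a minimal sketch of how the count files could be compared. It assumes both files are in the usual tab-separated htseq-count layout (feature ID, then count, with the special counters such as the no-feature line at the end). The function name diff_counts is just something I made up for illustration, not part of HTSeq.

```shell
# Hypothetical helper (not part of HTSeq): print features whose counts differ
# between two htseq-count outputs.
# Assumption: both files are tab-separated "feature_id<TAB>count".
diff_counts() {
    awk -F'\t' '
        NR == FNR { first[$1] = $2; next }      # remember counts from the first file
        ($1 in first) && first[$1] != $2 {      # same feature, different count
            print $1 "\t" first[$1] "\t" $2     # feature, count in file 1, count in file 2
        }
    ' "$1" "$2"
}

# Usage with the file names from this post:
# diff_counts KDR_pre_raw_count.txt KDR_pre_sn.count | head
```

The special counter lines are compared as well, so a large shift there would show up alongside the per-gene differences.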
Please help me with this!