This has puzzled me for days and I couldn't find any explanation on the internet.
I noticed HTSeq-count gives the reads counts of a gene as 0. But when viewed in IGV with the
accepted_hits.bam from tophat2 alignment, there are hundreds to thousands reads aligned in that gene region (from different samples).
This is how I used HTSeq-count from
samtools sort -T tmp -o accepted_hits_nsort.bam -n accepted_hits.bam htseq-count --format=bam --strand=no --order=name -a 5 --mode=union accepted_hits_nsort.bam Homo_sapiens.GRCh37.72.gtf > htseq_count_0.6.txt
However, when I break down the accepted_hits.bam and extract the chromosome where the gene is, HTSeq-count gives the right reads counts for this BAM file.
accepted_hits.bam contains 60,845,216 alignment using
samtools view -c accepted_hits.bam. I don't think for PE RNA-seq, file at size of 4.5G would be a problem for samtools or HTSeq-count to deal with?
My suspicions are:
- samtools name sort problem
- htseq-count bug
- out of memory
But from the try-error I have done so far, I cannot really figure out what has gone wrong here.
Could any one give any hint here? Any suggestion would be appreciated. Thanks.