Question

STAR --quantMode GeneCounts vs HTSeq-count

3.3 years ago

bioinfo456 ▴ 150

Hi all,

I have the gene count file that STAR generates during alignment. Its second column (read counts for unstranded data) is supposed to be exactly the same as the counts from HTSeq-count in union mode. However, I noticed that the two match only when I give HTSeq-count the unsorted BAM file as input; the counts differ when I give it the sorted BAM file. Why is that? Which tool's counts should I rely on for downstream analysis?

star HTSeq next-gen sequencing RNA-Seq • 5.2k views


score 3 · Accepted Answer · 2021-01-15

3.3 years ago

h.mon 35k

The below is true for old versions of HTSeq-count. I believe it is still true for recent versions, but you should check the documentation.

HTSeq-count, by default, expects name-sorted BAM files. If you give it position-sorted BAM files without saying so, HTSeq-count will not detect read pairs properly and will count a large proportion of pairs twice. There is an HTSeq-count flag to indicate that the BAM is position-sorted, but with old versions of HTSeq (circa 0.6, if memory serves me correctly), HTSeq very commonly crashed due to excessive memory use.
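As a rough illustration of the name-sorting route, here is a minimal Python sketch that only builds the two command lines (it does not run them). The file names are hypothetical placeholders, and the options shown (`samtools sort -n`, and `htseq-count` with `-f`, `-r`, `-s`, `-m`) should be checked against the versions you have installed. `-s no` is chosen here on the assumption that you want to compare against STAR's unstranded column.

```python
import shlex
from typing import List


def name_sort_cmd(in_bam: str, out_bam: str) -> List[str]:
    """Build a samtools command that sorts a BAM by read name (-n)."""
    return ["samtools", "sort", "-n", "-o", out_bam, in_bam]


def htseq_count_cmd(bam: str, gtf: str, order: str = "name") -> List[str]:
    """Build an htseq-count command for unstranded counting in union mode.

    order must match how the BAM is actually sorted: "name" or "pos".
    """
    if order not in ("name", "pos"):
        raise ValueError("order must be 'name' or 'pos'")
    return ["htseq-count", "-f", "bam", "-r", order,
            "-s", "no", "-m", "union", bam, gtf]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(name_sort_cmd("sample.bam", "sample.namesorted.bam")))
    print(shlex.join(htseq_count_cmd("sample.namesorted.bam", "genes.gtf")))
```

Passing `order="pos"` corresponds to the position-sorted flag mentioned above, with the memory caveat for old versions.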


Thanks. The count files from both tools matched when I sorted the BAM by name, instead of using the default samtools sort, before passing it to htseq-count.

So, do you reckon I should go ahead with the gene count file generated by STAR for the downstream analysis, or should I count again for the rest of my samples using htseq-count before proceeding (despite getting the same counts from both tools for one of my samples)?
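One way to check agreement on more samples before deciding is to compare the two outputs gene by gene. The sketch below is a minimal example, assuming the usual layouts: a tab-separated STAR gene count file whose second column holds the unstranded counts and whose summary rows are named `N_unmapped`, `N_multimapping`, `N_noFeature` and `N_ambiguous`, and a two-column htseq-count output whose summary rows start with `__`. The function names are made up for illustration.

```python
from typing import Dict, Iterable, List

# Summary rows at the top of STAR's gene count file (not genes).
STAR_SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def parse_star_unstranded(lines: Iterable[str]) -> Dict[str, int]:
    """Read gene -> count from column 2 (unstranded) of a STAR count file."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] in STAR_SUMMARY_ROWS:
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def parse_htseq(lines: Iterable[str]) -> Dict[str, int]:
    """Read gene -> count from htseq-count output, skipping '__' summary rows."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0].startswith("__"):
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def differing_genes(star: Dict[str, int], htseq: Dict[str, int]) -> List[str]:
    """Return the sorted gene IDs whose counts differ (missing genes count as 0)."""
    genes = sorted(set(star) | set(htseq))
    return [g for g in genes if star.get(g, 0) != htseq.get(g, 0)]


if __name__ == "__main__":
    # Tiny made-up inputs, for illustration only.
    star_lines = ["N_unmapped\t10\t10\t10\n", "geneA\t5\t3\t2\n", "geneB\t7\t7\t0\n"]
    htseq_lines = ["geneA\t5\n", "geneB\t9\n", "__no_feature\t4\n"]
    print(differing_genes(parse_star_unstranded(star_lines), parse_htseq(htseq_lines)))
```

With real files you would pass open file handles instead of the lists; an empty result for a sample means the two tools agree on every gene for that sample.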