STAR --quantMode GeneCounts vs HTSeq-count
6 months ago

Hi all,

I have the gene count file that STAR generates during alignment. Its second column (read counts for unstranded data) is supposed to be exactly the same as the counts from HTSeq-count in union mode. However, I noticed that the two match only when I give the unsorted BAM file as input to HTSeq-count; the counts differ when I give the sorted BAM file. Why is that? Which tool's counts should I rely on for downstream analysis?

star HTSeq next-gen sequencing RNA-Seq • 526 views
6 months ago
h.mon 33k

What follows is true for old versions of HTSeq-count. I believe it still holds for recent versions, but you should check the documentation.

HTSeq-count, by default, expects name-sorted BAM files. If you give it a position-sorted BAM file without saying so, HTSeq-count will not detect read pairs properly and will count a large proportion of pairs twice. There is an HTSeq-count option (-r pos, or --order=pos) to indicate that the BAM is position-sorted, but with old versions of HTSeq (circa 0.6, if memory serves me correctly), it was very common for HTSeq to crash due to excessive memory use when it was set.
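To make the name-sorting route concrete, here is a minimal Python sketch that builds and runs the two commands. It assumes samtools and htseq-count are on the PATH; the function names, file names, and the choice of -s no (unstranded, to mirror the second column of the STAR file) are illustrative assumptions, not something taken from the original post.

```python
import subprocess


def build_commands(bam, gtf, sorted_bam="name_sorted.bam"):
    """Return (sort_cmd, count_cmd) as argument lists.

    File names are placeholders; adjust them to your own data.
    """
    # samtools sort -n sorts by read name instead of by coordinate.
    sort_cmd = ["samtools", "sort", "-n", "-o", sorted_bam, bam]
    # -r name tells htseq-count the input is name-sorted (its default);
    # -s no counts reads regardless of strand; -m union is the default mode.
    count_cmd = [
        "htseq-count", "-f", "bam", "-r", "name",
        "-s", "no", "-m", "union", sorted_bam, gtf,
    ]
    return sort_cmd, count_cmd


def run_counting(bam, gtf, out_path="htseq_counts.txt"):
    """Name-sort the BAM, then write htseq-count output to out_path."""
    sort_cmd, count_cmd = build_commands(bam, gtf)
    subprocess.run(sort_cmd, check=True)
    # htseq-count prints its table to stdout, so capture it in a file.
    with open(out_path, "w") as out:
        subprocess.run(count_cmd, check=True, stdout=out)
```

Calling run_counting("Aligned.out.bam", "genes.gtf") would run both steps; the input names here are only examples.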


Thanks. The count files from the two tools matched when I sorted the BAM by name (instead of using the default samtools sort, which is by position) before passing it to htseq-count.

So, would you recommend that I go ahead with the gene count file generated by STAR for the downstream analysis, or should I count again with htseq-count for the rest of my samples before proceeding (even though both tools gave the same counts for one of my samples)?
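For anyone who wants to repeat this check on more samples, here is a rough Python sketch that compares the unstranded column of a STAR gene count file with htseq-count output. It assumes the usual layouts: STAR's file starts with four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by gene rows with the unstranded count in column 2, and htseq-count writes gene ID and count per line, ending with summary rows whose names start with a double underscore. The function names are made up for illustration.

```python
import csv
import io

# Summary rows at the top of STAR's gene count file.
STAR_SUMMARY = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_unstranded(handle):
    """Map gene ID -> unstranded count (column 2) from a STAR count file."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] in STAR_SUMMARY:
            continue
        counts[row[0]] = int(row[1])
    return counts


def read_htseq(handle):
    """Map gene ID -> count from htseq-count output, skipping '__' rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("__"):
            continue
        # The count is the last column, even if extra attribute columns exist.
        counts[row[0]] = int(row[-1])
    return counts


def diff_counts(star, htseq):
    """Return {gene: (star_count, htseq_count)} for genes that disagree.

    A gene missing from one file is reported with None on that side.
    """
    return {
        gene: (star.get(gene), htseq.get(gene))
        for gene in sorted(set(star) | set(htseq))
        if star.get(gene) != htseq.get(gene)
    }


if __name__ == "__main__":
    # Tiny in-memory example with invented numbers.
    star = read_star_unstranded(io.StringIO(
        "N_unmapped\t5\t5\t5\ngeneA\t10\t6\t4\ngeneB\t3\t1\t2\n"))
    htseq = read_htseq(io.StringIO(
        "geneA\t10\ngeneB\t4\n__no_feature\t7\n"))
    print(diff_counts(star, htseq))
```

An empty result from diff_counts means the two tools agree for every gene in that sample.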


Just go ahead with STAR counts.
