Question

Comparison Htseq And Feature Count

10.1 years ago

HNK ▴ 150

Hey, I have results from HTSeq and featureCounts for approximately 50 samples. I want to compare the two tools to see how close their results are, i.e., I need to compute the correlation between them. Alternatively, is there a way to get an output (a graph, etc.) that shows their relationship?
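One simple way to quantify "how close" is a per-sample correlation of gene-level counts. The sketch below assumes each tool's output has already been parsed into a hypothetical `{sample: {gene_id: count}}` dictionary (the parsing is left out). It computes Pearson's r on log2(count + 1) over genes present in both tables and skips htseq-count's `__`-prefixed summary rows (such as `__no_feature`).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (nan if undefined)."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def per_sample_correlation(htseq, fc):
    """htseq, fc: hypothetical layout {sample: {gene_id: count}} parsed from each tool.
    Returns {sample: Pearson r of log2(count + 1)} over genes present in both tables."""
    result = {}
    for sample, h_counts in htseq.items():
        f_counts = fc.get(sample, {})
        # htseq-count appends summary rows such as __no_feature; skip anything starting with "__"
        genes = sorted(g for g in set(h_counts) & set(f_counts) if not g.startswith("__"))
        x = [math.log2(h_counts[g] + 1) for g in genes]
        y = [math.log2(f_counts[g] + 1) for g in genes]
        result[sample] = pearson(x, y)
    return result

htseq = {"s1": {"g1": 10, "g2": 0, "g3": 100, "__ambiguous": 5}}
fc = {"s1": {"g1": 10, "g2": 0, "g3": 100}}
print(per_sample_correlation(htseq, fc))
```

Log-transforming first keeps a handful of very highly expressed genes from dominating r. A scatter plot of the same log values, one panel per sample, is the graphical counterpart.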

updated 4.2 years ago by Biostar 20 • written 10.1 years ago by HNK ▴ 150

I want to see how close, or how different, the results of the two tools are.

10.1 years ago by HNK ▴ 150

score 3 · Answer 1 · 2014-03-26

The featureCounts paper goes into some detail about when and why it disagrees with htseq-count (see section 5.2), so I'm not sure what more you're trying to achieve. If you really want to, you could group the counts into three categories: identical, htseq-count greater, and featureCounts greater. You'll find that the "identical" category contains most of the genes. As for the differences, I think featureCounts generally behaves correctly (or at least I've generally found Wei Shi's arguments for why featureCounts works differently to be convincing).
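The three-category grouping described above can be sketched for a single sample as follows. The input is assumed to be a hypothetical `{gene_id: count}` dictionary per tool, and htseq-count's `__`-prefixed summary rows are skipped.

```python
from collections import Counter

def categorize(htseq_counts, fc_counts):
    """Tally shared genes as 'identical', 'htseq-count greater' or 'featureCounts greater'.
    Both arguments: hypothetical {gene_id: count} dicts for one sample."""
    tally = Counter()
    for gene in set(htseq_counts) & set(fc_counts):
        if gene.startswith("__"):  # htseq-count summary rows, not genes
            continue
        h, f = htseq_counts[gene], fc_counts[gene]
        if h == f:
            tally["identical"] += 1
        elif h > f:
            tally["htseq-count greater"] += 1
        else:
            tally["featureCounts greater"] += 1
    return tally

print(categorize({"a": 5, "b": 3, "c": 0}, {"a": 5, "b": 4, "c": 0}))
```

Running this over all 50 samples gives one small table per sample, which is easy to summarize or plot as stacked bars.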

Edit: The alternative is exactly what they did in the featureCounts paper: make a Venn diagram of the read assignments.
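The region sizes for such a Venn diagram can be computed from per-read assignments. This is a minimal sketch that assumes you have already extracted a hypothetical `{read_id: feature}` mapping from each tool's per-read output (how to obtain that depends on the tool and version, so it is not shown), with unassigned reads omitted. It compares (read, feature) pairs, so a read assigned to different genes by the two tools counts in both "only" regions.

```python
def venn_regions(htseq_assign, fc_assign):
    """Both arguments: hypothetical {read_id: assigned_feature} dicts (unassigned reads omitted).
    Returns the sizes of the three Venn regions over (read, feature) pairs."""
    a = set(htseq_assign.items())
    b = set(fc_assign.items())
    return {
        "both": len(a & b),
        "htseq-count only": len(a - b),
        "featureCounts only": len(b - a),
    }

print(venn_regions({"r1": "g1", "r2": "g2", "r3": "g3"},
                   {"r1": "g1", "r2": "g5", "r4": "g4"}))
```

Comparing pairs rather than bare read IDs is a design choice: it separates "both tools assigned the read" from "both tools assigned it to the same gene".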

Ram · Answer 2 · 2014-03-27

I think it is interesting to compare the two programs. featureCounts runs much faster and supports more input formats (including BAM files sorted either by read name or by genomic coordinate). Still, it is important to compare their read assignment results as well.

For single-end reads, the default settings of featureCounts should behave exactly like HTSeq-count in union mode, except that the annotation files (GTF or GFF) are parsed differently. HTSeq-count excludes the "end" location from the feature interval, whereas featureCounts includes it. I believe featureCounts parses the annotation file correctly according to the GTF/GFF format specification (http://genome.ucsc.edu/FAQ/FAQformat.html#format3), which states that the "end" location is inclusive.
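The practical effect of this parsing difference can be illustrated with a small overlap check. This is a simplified model of the difference as described in this answer, not either tool's actual code: coordinates are 1-based, the read covers the closed interval [read_start, read_end], and `end_inclusive=False` drops the feature's last base.

```python
def overlaps(read_start, read_end, feat_start, feat_end, end_inclusive=True):
    """1-based coordinates; the read covers [read_start, read_end].
    end_inclusive=True: feature covers [feat_start, feat_end] (GTF/GFF convention).
    end_inclusive=False: feature covers [feat_start, feat_end - 1] (last base excluded)."""
    last = feat_end if end_inclusive else feat_end - 1
    return read_start <= last and read_end >= feat_start

# A read whose only overlap with the feature is the feature's final base
print(overlaps(2000, 2049, 1000, 2000))                       # inclusive end
print(overlaps(2000, 2049, 1000, 2000, end_inclusive=False))  # end excluded
```

Only reads whose overlap with a feature consists solely of that feature's last base are affected, which is consistent with most genes getting identical counts.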

In paired-end mode, featureCounts goes further than HTSeq-count by breaking ties between ambiguous assignments using votes. Each read in a fragment (read pair) casts a vote. If a single feature receives the highest number of votes (either 1 or 2), the fragment is assigned to that feature without ambiguity.
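The voting rule described above can be sketched as follows. This is a minimal illustration of the rule as stated here, not featureCounts' implementation. The inputs are hypothetical sets of features overlapped by each read of the pair, and each read votes once for every feature it overlaps.

```python
from collections import Counter

def assign_fragment(read1_features, read2_features):
    """Each argument: hypothetical set of features overlapped by one read of the pair.
    A feature hit by both reads gets 2 votes, by one read 1 vote.
    Returns the feature with the uniquely highest vote count, else None (tie or no hit)."""
    votes = Counter()
    for feats in (read1_features, read2_features):
        for feature in set(feats):
            votes[feature] += 1
    if not votes:
        return None  # fragment overlaps no feature
    ranked = votes.most_common()
    top_feature, top_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None  # tie between features: still ambiguous
    return top_feature

print(assign_fragment({"A", "B"}, {"A"}))  # read 1 overlaps A and B, read 2 only A
print(assign_fragment({"A"}, {"B"}))       # one vote each: tie
```

In the first case, a union-style rule that looks at all overlapped features together would see both A and B and call the fragment ambiguous; the vote resolves it to A.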