Difference in the counts generated using featureCounts and HTSeq

3.5 years ago

bioinformatics.queries ▴ 70

Hi Everyone

I have a question about the counts generated by featureCounts and HTSeq. The two tools give very different results for the same data. For example:

FeatureCounts

AluSx3       295781
MER5A        244353
MER41B        43925
MIR         1513933

HTSeq

AluSx3        88023
MER5A        111860
MER41B        18632
MIR           67211

What could explain such a large difference, and which tool's counts should we use for our analysis? I am currently trying to generate counts for transposons.

Thanks
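
To put the size of the discrepancy into numbers, here is a minimal awk sketch that prints the featureCounts/HTSeq ratio per feature. It assumes each tool's output has been reduced to a two-column file (feature name, count) like the listings above; `count_ratio` and the file names are hypothetical.

```shell
# count_ratio: print the featureCounts/HTSeq count ratio for each feature.
# Assumes two whitespace-separated columns per file: name, count.
count_ratio() {   # usage: count_ratio fc_counts.txt htseq_counts.txt
    awk 'NR == FNR { fc[$1] = $2; next }     # first file: store featureCounts counts
         ($1 in fc) && $2 > 0 {              # second file: shared features, skip zero counts
             printf "%s\t%.2f\n", $1, fc[$1] / $2
         }' "$1" "$2"
}
```

On the four families listed above, the ratio ranges from about 2.2 (MER5A) to about 22.5 (MIR), so the gap is far from uniform across families.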

sequencing • 1.6k views

ADD COMMENT • link updated 3.5 years ago by h.mon 35k • written 3.5 years ago by bioinformatics.queries ▴ 70

Why did you use the "Tool" tag for a question? Please include the exact commands you used for htseq-count and featureCounts.

ADD REPLY • link 3.5 years ago by Shred ★ 1.4k

Thank you so much for your response. I used the following commands, with the default parameters for both tools.

FeatureCounts

featureCounts -T 10 -a $GFF/hg19_rmsk_TE.gtf -o TE_featurecounts/${file}_featureCounts.txt $file.sorted.bam

Htseq count

htseq-count -f bam $file.sorted.bam $GFF/hg19_rmsk_TE.gtf > TE_counts/$file.count.out

Could you please suggest what should be done? Do I need to change any parameters?

ADD REPLY • link 3.5 years ago by bioinformatics.queries ▴ 70

score 0 · Answer 1 · 2020-10-15

3.5 years ago

dariober 14k

Can you post the commands you used for both featureCounts and htseq and the summary statistics they produce?

For one thing, the two programs have different defaults for filtering reads on mapping quality.

featureCounts v1.6.4:

  -Q <int>            The minimum mapping quality score a read must satisfy in
                      order to be counted. For paired-end reads, at least one
                      end should satisfy this criteria. 0 by default.

htseq-count:

-a <minaqual>, --a=<minaqual>
Skip all reads with MAPQ alignment quality lower than the given minimum value (default: 10). MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads.

I once compared the two and found negligible differences, but I can't remember exactly how I made the comparison.
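
One way to check whether the MAPQ defaults matter for a given BAM file is to tabulate the MAPQ values it contains. The following is a minimal sketch, not part of either tool: `mapq_hist` is a hypothetical helper that reads SAM records on standard input, and the usage line assumes samtools is installed.

```shell
# mapq_hist: read SAM records on stdin and print "MAPQ<TAB>count", sorted by MAPQ.
# MAPQ is the 5th tab-separated column; header lines (starting with @) are skipped.
mapq_hist() {
    awk -F'\t' '!/^@/ { n[$5]++ } END { for (q in n) print q "\t" n[q] }' | sort -n
}

# Usage (assumes samtools is installed):
#   samtools view "$file.sorted.bam" | mapq_hist
```

A large count in the rows below 10 would mean many alignments are counted by featureCounts with its default `-Q 0` but skipped by htseq-count with its default `-a 10`.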

ADD COMMENT • link 3.5 years ago by dariober 14k

As I said in my answer, the defaults for mapping quality are different: featureCounts does not filter on mapping quality by default, while htseq-count skips reads with MAPQ below 10. Since you mention transposons, it may well be that you have lots of reads with MAPQ 0, hence more counts with featureCounts.
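
To compare the two tools on equal footing with respect to MAPQ, both can be given the same minimum explicitly. The sketch below wraps the commands from this thread in two hypothetical functions (`run_featurecounts`, `run_htseq`) and a hypothetical `MINQ` variable; only `-Q` and `-a` are added to the original command lines. MAPQ may not be the only default that differs: for instance, htseq-count assumes stranded data by default (`-s yes`) while featureCounts defaults to unstranded (`-s 0`), so that is worth checking as well.

```shell
# Sketch: run both counters with the same minimum MAPQ so the outputs are comparable.
# MINQ and the function names are illustrative, not from the thread.
MINQ=${MINQ:-10}   # 10 is htseq-count's default for -a

run_featurecounts() {   # usage: run_featurecounts <bam> <gtf> <out.txt>
    featureCounts -T 10 -Q "$MINQ" -a "$2" -o "$3" "$1"
}

run_htseq() {           # usage: run_htseq <bam> <gtf> <out.txt>
    htseq-count -f bam -a "$MINQ" "$1" "$2" > "$3"
}

# Example calls, using the paths from the original commands:
#   run_featurecounts "$file.sorted.bam" "$GFF/hg19_rmsk_TE.gtf" "TE_featurecounts/${file}_featureCounts.txt"
#   run_htseq         "$file.sorted.bam" "$GFF/hg19_rmsk_TE.gtf" "TE_counts/$file.count.out"
```

Setting `MINQ=0` instead would make htseq-count behave like the featureCounts default for this filter.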

ADD REPLY • link 3.5 years ago by dariober 14k

So what filter do you suggest applying for transposons? What value should we set for -Q in featureCounts?
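
The thread does not settle on a value. One way to inform the choice is to see how many alignments would survive a few candidate thresholds before picking one. This is a minimal sketch under that assumption: `mapq_survivors` is a hypothetical helper, the thresholds in the usage line are arbitrary examples, and it counts alignment records rather than unique reads.

```shell
# mapq_survivors: read SAM records on stdin; for each threshold given as an
# argument, print "threshold<TAB>number of alignments with MAPQ >= threshold".
mapq_survivors() {
    awk -F'\t' -v thresholds="$*" '
        BEGIN { n = split(thresholds, t, " ") }
        !/^@/ { for (i = 1; i <= n; i++) if ($5 + 0 >= t[i] + 0) kept[i]++ }
        END   { for (i = 1; i <= n; i++) print t[i] "\t" (kept[i] + 0) }
    '
}

# Usage (assumes samtools is installed):
#   samtools view "$file.sorted.bam" | mapq_survivors 0 1 10 30
```

A single threshold can also be checked directly with `samtools view -c -q 10 "$file.sorted.bam"`, which counts alignments with MAPQ of at least 10.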

ADD REPLY • link 3.5 years ago by bioinformatics.queries ▴ 70