Question: Difference between HTSeq-count & Salmon quant results
15 months ago by
vincentpailler100 wrote:

Hi guys,

I've already posted a question about Salmon earlier, but this one is totally different.

I used two different tools, Hisat2 and Salmon, to map my reads. I got good overall read mapping rates with both of them.

I used HTSeq-count to quantify my mapped reads from Hisat2, but the quant.sf file from Salmon gave me different results. On the one hand, some read counts are quite similar between the HTSeq-count results and the quant.sf file; on the other hand, there can be a 40-fold difference between the reads counted by Salmon and by HTSeq-count for the same exon. Moreover, HTSeq-count sometimes counts 0 reads for an exon while Salmon counts 15 reads for the same one. I really don't understand these results. I found some papers discussing this, but none gave me a good answer.

In addition, here are my results (just the first 10 lines):

HTSeq-count

AT1G01010:exon:1 1
AT1G01010:exon:2 2
AT1G01010:exon:3 0
AT1G01010:exon:4 2
AT1G01010:exon:5 2
AT1G01010:exon:6 3
AT1G01020:exon:1 0
AT1G01020:exon:10 0
AT1G01020:exon:11 0
AT1G01020:exon:12 1

Salmon quant.sf (I've deleted the 3 middle columns of the file for readability)

AT1G01010.1 34
AT1G01020.2 43.6719
AT1G01020.6 13.3601
AT1G01020.1 54.2279
AT1G01020.4 0
AT1G01020.5 12.2951
AT1G01020.3 21.4449
AT1G01030.2 8.79053e-05
AT1G01030.1 19.9999

Is it due to Salmon's expectation-maximization algorithm that some transcripts from HTSeq-count (for example AT1G01010:exon:4) are not found in Salmon's file?

Best, Vincent

rna-seq salmon htseq-count
modified 15 months ago by genomax74k • written 15 months ago by vincentpailler100

Salmon counts multi-mapping reads, whereas HTSeq discards them; most likely this is the cause of the discrepancy.

If you are mapping to the transcriptome, HTSeq counts are not appropriate, as it will discard too many reads which map to different isoforms of the same gene. HTSeq should be used to count reads mapped to the genome - it will still discard multi-mapped reads, but you won't have multi-mappers due to isoforms.

written 15 months ago by h.mon28k

I used a genome to map my reads with Hisat2.

Does featureCounts discard multi-mapping reads?

written 15 months ago by vincentpailler100

By default, featureCounts discards multi-mapping reads; it has three parameters to alter this behaviour: -M, -O and --fraction.
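
To make the effect of these policies concrete, here is a minimal toy sketch in Python. It is not how featureCounts, HTSeq or Salmon are implemented; the function `count_reads`, the policy names and the toy reads are all made up for illustration. "discard" mimics dropping multi-mappers, "all" gives every alignment a full count (roughly what -M does), and "fraction" splits each read equally among its alignments (roughly -M with --fraction). Salmon's expectation-maximization goes further and weights the split by estimated abundance, which this sketch does not attempt.

```python
from collections import defaultdict


def count_reads(alignments, policy="discard"):
    """Count reads per feature under a given multi-mapping policy.

    alignments: dict mapping a read ID to the list of features it aligns to.
    policy: "discard" (drop multi-mappers), "all" (one count per alignment),
            or "fraction" (1/n count per alignment for a read with n alignments).
    These policy names are hypothetical labels, not real tool options.
    """
    if policy not in ("discard", "all", "fraction"):
        raise ValueError(f"unknown policy: {policy}")

    counts = defaultdict(float)
    for features in alignments.values():
        if len(features) == 1:
            # Uniquely mapped reads are counted the same way under every policy.
            counts[features[0]] += 1.0
        elif policy == "all":
            for feature in features:
                counts[feature] += 1.0
        elif policy == "fraction":
            for feature in features:
                counts[feature] += 1.0 / len(features)
        # policy == "discard": multi-mapping reads contribute nothing.
    return dict(counts)


# Toy data: two unique reads and two reads that map to both genes.
toy_alignments = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB"],
    "read3": ["geneB", "geneA"],
    "read4": ["geneB"],
}

for policy in ("discard", "all", "fraction"):
    print(policy, count_reads(toy_alignments, policy))
```

With only four reads, the per-gene count already ranges from 1 to 3 depending on the policy, so large fold differences between tools are plausible for features that attract many multi-mapping reads.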

written 15 months ago by h.mon28k
15 months ago by
h.mon28k (Brazil) wrote:

You are performing two different types of counts here: with HTSeq, you are counting reads mapping over exons (it is more common to count reads mapped over genes), and with Salmon, you are counting reads over transcripts, that is, isoforms. So these lines:

AT1G01020.2 43.6719 
AT1G01020.6 13.3601

means isoform 2 (not exon 2) of AT1G01020 has 43.7 read counts and isoform 6 has 13.4 read counts.

The counts between the two programs are not directly comparable, then.
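
One way to get numbers on a common footing is to collapse both outputs to the gene level. As a rough sketch, assuming the gene ID is simply the transcript ID with the trailing ".N" isoform suffix removed (which matches the IDs shown in the question), the Salmon values posted above can be summed per gene like this. The helper names `gene_id` and `sum_by_gene` are made up for illustration.

```python
from collections import defaultdict

# NumReads values copied from the quant.sf excerpt in the question.
salmon_num_reads = {
    "AT1G01010.1": 34.0,
    "AT1G01020.2": 43.6719,
    "AT1G01020.6": 13.3601,
    "AT1G01020.1": 54.2279,
    "AT1G01020.4": 0.0,
    "AT1G01020.5": 12.2951,
    "AT1G01020.3": 21.4449,
    "AT1G01030.2": 8.79053e-05,
    "AT1G01030.1": 19.9999,
}


def gene_id(transcript_id):
    """Strip the isoform suffix, e.g. 'AT1G01020.2' -> 'AT1G01020' (assumed convention)."""
    return transcript_id.rsplit(".", 1)[0]


def sum_by_gene(transcript_counts):
    """Sum transcript-level counts into one total per gene."""
    totals = defaultdict(float)
    for transcript, num_reads in transcript_counts.items():
        totals[gene_id(transcript)] += num_reads
    return dict(totals)


gene_counts = sum_by_gene(salmon_num_reads)
for gene, total in sorted(gene_counts.items()):
    print(f"{gene}\t{total:.2f}")
```

These gene-level totals are what should be compared against gene-level HTSeq counts, not the per-exon rows. Note that the HTSeq excerpt in the question is only the first 10 lines, so the exon rows for AT1G01020 are incomplete there.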

modified 15 months ago • written 15 months ago by h.mon28k

I used a reference genome to map my reads with HISAT2. So does HTSeq count them over genes? (I used a GFF file as the annotation.)

modified 15 months ago • written 15 months ago by vincentpailler100