Question: Difference between HTSeq-count & Salmon quant results
2.5 years ago, pablo wrote:

Hi guys,

I've already posted a question about Salmon earlier, but this one is totally different.

I used two different tools, Hisat2 and Salmon, to map my reads, and I got good overall mapping rates with both of them.

I used HTSeq-count to quantify the reads mapped with Hisat2, but Salmon's quant.sf file gave me different results. For some features the counts from HTSeq-count and quant.sf are quite similar, but for others there can be a 40-fold difference between the Salmon and HTSeq-count read counts for the same exon. Moreover, HTSeq-count may report 0 reads for an exon where Salmon reports 15. I really don't understand these results. I found some papers discussing this, but nothing that gave me a good answer.

Here are my results (just the first 10 lines).

HTSeq-count:

AT1G01010:exon:1 1
AT1G01010:exon:2 2
AT1G01010:exon:3 0
AT1G01010:exon:4 2
AT1G01010:exon:5 2
AT1G01010:exon:6 3
AT1G01020:exon:1 0
AT1G01020:exon:10 0
AT1G01020:exon:11 0
AT1G01020:exon:12 1

Salmon quant.sf (I've deleted the three middle columns of the file to make it easier to read, keeping only the name and the read count):

AT1G01010.1 34
AT1G01020.2 43.6719
AT1G01020.6 13.3601
AT1G01020.1 54.2279
AT1G01020.4 0
AT1G01020.5 12.2951
AT1G01020.3 21.4449
AT1G01030.2 8.79053e-05
AT1G01030.1 19.9999

Is it because of Salmon's expectation-maximization algorithm that some entries from HTSeq-count (for example AT1G01010:exon:4) are not found in Salmon's file?

Best, Vincent

rna-seq salmon htseq-count • 2.5k views
modified 2.5 years ago by GenoMax • written 2.5 years ago by pablo

Salmon counts multi-mapping reads, whereas HTSeq discards them - most likely this is the cause of the discrepancy.

If you are mapping to the transcriptome, HTSeq counts are not appropriate, as it will discard too many reads which map to different isoforms of the same gene. HTSeq should be used to count reads mapped to the genome - it will still discard multi-mapped reads, but you won't have multi-mappers due to isoforms.

written 2.5 years ago by h.mon

I used a genome to map my reads with Hisat2.

Does featureCounts discard multi-mapping reads?

written 2.5 years ago by pablo

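By default, featureCounts discards multi-mapping reads. Three parameters alter this behaviour: -M (also count multi-mapping reads), -O (assign a read to every feature it overlaps) and --fraction (give such reads fractional counts instead of a full count each; it must be combined with -M and/or -O).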
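
To make the difference between these strategies concrete, here is a minimal toy sketch in Python. It is not how featureCounts is implemented; the read names, gene names and the `count_reads` helper are all hypothetical, and each read is simply listed with the features its alignments hit. The "discard" mode mimics the default behaviour, "all" roughly corresponds to -M alone, and "fraction" to -M combined with --fraction.

```python
from collections import Counter

# Hypothetical toy data: each read with the features its alignments overlap.
reads = {
    "read1": ["geneA"],           # uniquely mapped
    "read2": ["geneA", "geneB"],  # multi-mapping, two alignments
    "read3": ["geneB"],           # uniquely mapped
    "read4": ["geneB", "geneC"],  # multi-mapping, two alignments
}

def count_reads(reads, mode="discard"):
    """Count reads per feature under one of three multi-mapping policies.

    mode="discard":  multi-mapping reads are ignored.
    mode="all":      every alignment of a multi-mapping read counts as 1.
    mode="fraction": every alignment of a read with x alignments counts as 1/x.
    """
    if mode not in ("discard", "all", "fraction"):
        raise ValueError(f"unknown mode: {mode}")
    counts = Counter()
    for hits in reads.values():
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif mode == "all":
            for feature in hits:
                counts[feature] += 1
        elif mode == "fraction":
            for feature in hits:
                counts[feature] += 1 / len(hits)
        # mode == "discard": multi-mapping reads contribute nothing
    return dict(counts)

for mode in ("discard", "all", "fraction"):
    print(mode, count_reads(reads, mode))
```

Note that an equal split like this is still different from what Salmon does: Salmon estimates how to distribute ambiguous reads among transcripts rather than dividing them evenly, so its numbers need not match any of these three modes.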

written 2.5 years ago by h.mon
2.5 years ago, h.mon wrote:

You are performing two different types of counts here. With HTSeq, you are counting reads mapping over exons (it is more common to count reads mapped over genes), and with Salmon, you are counting reads over transcripts, that is, isoforms. So:

AT1G01020.2 43.6719 
AT1G01020.6 13.3601

means isoform 2 (not exon 2) of AT1G01020 has 43.7 read counts and isoform 6 has 13.4 read counts.

The counts from the two programs are therefore not directly comparable.
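
If you want numbers on a common level, one option is to sum Salmon's transcript-level read counts per gene and compare those with gene-level counts from HTSeq, rather than with exon-level counts. The sketch below does this in Python for the values copied from the quant.sf excerpt in the question. The helper names are hypothetical, and it assumes the gene ID is everything before the last "." in the transcript ID, which fits the IDs shown here but is not guaranteed for other annotations. In practice a dedicated tool such as tximport is usually used for this step.

```python
from collections import defaultdict

# Transcript-level read counts copied from the quant.sf excerpt in the question.
salmon_num_reads = {
    "AT1G01010.1": 34.0,
    "AT1G01020.2": 43.6719,
    "AT1G01020.6": 13.3601,
    "AT1G01020.1": 54.2279,
    "AT1G01020.4": 0.0,
    "AT1G01020.5": 12.2951,
    "AT1G01020.3": 21.4449,
    "AT1G01030.2": 8.79053e-05,
    "AT1G01030.1": 19.9999,
}

def gene_id(transcript_id):
    # Assumption: "AT1G01020.2" -> "AT1G01020" (strip the isoform suffix).
    return transcript_id.rsplit(".", 1)[0]

def sum_by_gene(transcript_counts):
    """Sum transcript-level counts into one total per gene."""
    totals = defaultdict(float)
    for transcript, num_reads in transcript_counts.items():
        totals[gene_id(transcript)] += num_reads
    return dict(totals)

gene_totals = sum_by_gene(salmon_num_reads)
for gene, total in sorted(gene_totals.items()):
    print(f"{gene}\t{total:.4f}")
```

Even then, some differences from HTSeq are expected, because the two tools treat ambiguous and multi-mapping reads differently.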

modified 2.5 years ago • written 2.5 years ago by h.mon

I used a genome reference to map my reads with HISAT2. So does HTSeq count them over genes? (I used a GFF file as annotation.)

modified 2.5 years ago • written 2.5 years ago by pablo