Difference between HTSeq-count & Salmon quant results
4.6 years ago
pablo ▴ 230

Hi guys,

I've already posted a question about Salmon earlier, but this one is totally different.

I used two different tools, HISAT2 and Salmon, to map my reads. I got good overall mapping rates with both of them.

I used HTSeq-count to quantify my mapped reads from HISAT2, but the quant.sf file from Salmon gave me different results. On the one hand, some read counts are quite similar between the HTSeq-count results and the quant.sf file; on the other hand, there can be a 40-fold difference between the reads counted by Salmon and by HTSeq-count for the same exon. Moreover, HTSeq-count will count 0 reads for an exon while Salmon will count 15 reads for the same one. I really don't understand these results. I found some papers discussing this, but nothing that gave me a good answer.

Here are my results (just the first 10 lines):

HTSeq-count

AT1G01010:exon:1 1
AT1G01010:exon:2 2
AT1G01010:exon:3 0
AT1G01010:exon:4 2
AT1G01010:exon:5 2
AT1G01010:exon:6 3
AT1G01020:exon:1 0
AT1G01020:exon:10 0
AT1G01020:exon:11 0
AT1G01020:exon:12 1

Salmon quant.sf (I've deleted the 3 middle lines of the file to get a better visualization)

AT1G01010.1 34
AT1G01020.2 43.6719
AT1G01020.6 13.3601
AT1G01020.1 54.2279
AT1G01020.4 0
AT1G01020.5 12.2951
AT1G01020.3 21.4449
AT1G01030.2 8.79053e-05
AT1G01030.1 19.9999

Is it due to Salmon's expectation-maximization algorithm that some entries from HTSeq-count (for example AT1G01010:exon:4) are not found in Salmon's file?

Best, Vincent

RNA-Seq salmon htseq-count • 4.9k views
1

Salmon counts multi-mapping reads, whereas HTSeq discards them; this is most likely the cause of the discrepancy.

If you are mapping to the transcriptome, HTSeq counts are not appropriate, as it will discard too many reads which map to different isoforms of the same gene. HTSeq should be used to count reads mapped to the genome - it will still discard multi-mapped reads, but you won't have multi-mappers due to isoforms.
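To make the multi-mapping point concrete, here is a toy Python sketch. It is not Salmon's or HTSeq's actual code: the isoform names T1 and T2, the read list, and both function names are made up for illustration, and the EM loop ignores transcript length and the bias models a real quantifier would use. It only shows why dropping ambiguous reads and splitting them proportionally give different totals.

```python
from collections import Counter

def unique_only_counts(reads):
    """Count only reads compatible with exactly one target; drop the rest."""
    counts = Counter()
    for targets in reads:
        if len(targets) == 1:
            counts[targets[0]] += 1
    return dict(counts)

def em_counts(reads, n_iter=50):
    """Split each read across its compatible targets in proportion to the
    current abundance estimates, then re-estimate abundances and repeat."""
    names = sorted({t for targets in reads for t in targets})
    abundance = {t: 1.0 / len(names) for t in names}
    counts = {}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in names}
        for targets in reads:
            total = sum(abundance[t] for t in targets)
            for t in targets:
                counts[t] += abundance[t] / total
        n = sum(counts.values())
        abundance = {t: c / n for t, c in counts.items()}
    return counts

# Hypothetical reads: each entry lists the isoforms a read is compatible with.
toy_reads = [["T1"], ["T1"], ["T1", "T2"], ["T1", "T2"], ["T2"]]

print(unique_only_counts(toy_reads))  # ambiguous reads are simply not counted
print(em_counts(toy_reads))           # every read contributes, fractionally
```

In the first result two of the five reads vanish; in the second the counts are fractional and sum to the total number of reads, which is why quant.sf contains values such as 43.6719.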

0

I mapped my reads to a genome with HISAT2.

0

By default, featureCounts discards multi-mapping reads; it has three parameters to alter this behaviour: -M, -O and --fraction.

4.6 years ago
h.mon 34k

You are performing two different types of counts here: with HTSeq, you are counting reads mapped to exons (counting reads mapped to genes is more common), and with Salmon, you are counting reads over transcripts, that is, isoforms. So this:

AT1G01020.2 43.6719
AT1G01020.6 13.3601


means isoform 2 (not exon 2) of AT1G01020 has 43.7 read counts and isoform 6 has 13.4 read counts.

The counts from the two programs are therefore not directly comparable.
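If you want a rough like-for-like comparison, one option is to collapse both outputs to the gene level first. The Python sketch below does that with the numbers posted in the question. It rests on some assumptions: the Salmon value shown is taken to be the NumReads column, the IDs follow the AT1G01010:exon:N and AT1G01010.N patterns seen above, and the helper names gene_of and sum_by_gene are made up here. Summed exon counts are only a proxy for a gene-level count, since reads spanning several exons are handled differently depending on how HTSeq was run, and AT1G01020 is incomplete because only the first lines were posted.

```python
import re
from collections import defaultdict

# Values copied from the question (HTSeq-count per exon, Salmon per isoform).
htseq_exon_counts = {
    "AT1G01010:exon:1": 1, "AT1G01010:exon:2": 2, "AT1G01010:exon:3": 0,
    "AT1G01010:exon:4": 2, "AT1G01010:exon:5": 2, "AT1G01010:exon:6": 3,
    "AT1G01020:exon:1": 0, "AT1G01020:exon:10": 0,
    "AT1G01020:exon:11": 0, "AT1G01020:exon:12": 1,
}
salmon_transcript_counts = {
    "AT1G01010.1": 34, "AT1G01020.2": 43.6719, "AT1G01020.6": 13.3601,
    "AT1G01020.1": 54.2279, "AT1G01020.4": 0, "AT1G01020.5": 12.2951,
    "AT1G01020.3": 21.4449, "AT1G01030.2": 8.79053e-05, "AT1G01030.1": 19.9999,
}

def gene_of(feature_id):
    """Strip the ':exon:N' or '.N' suffix (assumes IDs shaped like those above)."""
    return re.split(r"[:.]", feature_id, maxsplit=1)[0]

def sum_by_gene(counts):
    """Add up feature-level counts for each gene."""
    totals = defaultdict(float)
    for feature_id, value in counts.items():
        totals[gene_of(feature_id)] += value
    return dict(totals)

htseq_genes = sum_by_gene(htseq_exon_counts)
salmon_genes = sum_by_gene(salmon_transcript_counts)
for gene in sorted(set(htseq_genes) | set(salmon_genes)):
    print(gene, htseq_genes.get(gene), salmon_genes.get(gene))
```

Even at the gene level the two numbers will not match exactly, because the tools treat ambiguous and multi-mapping reads differently, but at least the units being compared are the same.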

0

I used a genome reference to map my reads with HISAT2. So, does HTSeq count them over genes? (I used a GFF file as annotation.)