Tool:Salmon: A fast and versatile new tool for RNA-seq quantification
9.3 years ago
Rob 6.5k

(cross-posted on seqanswers a few days ago -- http://seqanswers.com/forums/showthread.php?t=49062, please let me know if posting here now is a violation of etiquette!)

Hi all,

I'd like to let you know about Salmon, a new tool we've been developing for isoform-level quantification from RNA-seq data. It is, conceptually, the successor to our Sailfish software, but it includes a significant number of improvements and relies on a very different methodology. It maintains or improves upon the main strengths of Sailfish (i.e. it is as fast or faster in most situations, and requires substantially less memory, especially for large transcriptomes) while providing a number of additional benefits. For example, it eliminates the need to build a parameter-dependent (k-mer size) index, and makes much better use of paired-end data and longer reads. In the testing we and others have done so far, it appears to be very accurate even in complex situations. It also provides both alignment-based and alignment-free quantification modes, to suit users with either need.

Salmon is fully open source, and is currently being developed on the develop branch of the Sailfish GitHub repository. The documentation is available via ReadTheDocs, the latest binaries are available here, and we welcome questions and discussion on the Sailfish Google Group. The manuscript is in preparation, but we already have a number of people testing and using the software, and we'd like to get input and feedback from the community as we finish the manuscript. So please, give Salmon a try. It's tasty ;).

--Rob

RNA-Seq • 8.4k views
8.6 years ago
nico ▴ 50

Hi Rob,

Salmon is a really good tool. Fast, stable and accurate.

What kind of normalization is advised (if any) to be able to compare the expression of one gene across samples?

Thanks for your advice

Nico

Good question. My favorite differential expression tools expect read counts as input. We know that reads can't really be counted precisely at the transcript level, so how does Salmon handle it?

Hi Karl,

Salmon (and Sailfish) estimate read counts at the transcript level by "soft-assigning" multi-mapping reads to different possible transcripts of origin based on a probabilistic model. You can find a detailed description of how "estimated" read counts are derived in the Salmon pre-print.

The key here is that multi-mapping reads are never "double-counted": the total sum of estimated reads will equal the total number of aligned reads, and the estimated read count is an estimate of the actual number of reads originating from each transcript (accounting for multi-mapping). There are also other, newer tools that are designed to deal with such estimated read counts directly (and which can be used with Kallisto, Salmon & Sailfish).
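
To make the "soft-assignment" idea concrete, here is a toy sketch in Python. It is not Salmon's actual model (which, per the pre-print, accounts for much more than this, and which this sketch makes no attempt to reproduce); it is a bare-bones EM loop under the assumption that each read is simply listed with the transcripts it is compatible with. The function name `soft_assign` and the example data are hypothetical.

```python
def soft_assign(compat, n_iter=200):
    """Toy EM: split each read across its compatible transcripts.

    compat: list of reads, each a list of compatible transcript names.
    Returns a dict of estimated (fractional) read counts per transcript.
    """
    transcripts = sorted({t for read in compat for t in read})
    # Start from uniform relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = dict.fromkeys(transcripts, 0.0)
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: each read contributes exactly one unit in total,
        # divided in proportion to the current abundance estimates.
        for read in compat:
            total = sum(theta[t] for t in read)
            for t in read:
                counts[t] += theta[t] / total
        # M-step: abundances become the normalized estimated counts.
        theta = {t: counts[t] / len(compat) for t in transcripts}
    return counts


# Hypothetical data: 6 reads unique to A, 2 unique to B, 4 ambiguous.
reads = [["A"]] * 6 + [["B"]] * 2 + [["A", "B"]] * 4
est = soft_assign(reads)
print(est, sum(est.values()))
```

Because every read contributes exactly one unit, split across its candidate transcripts, the estimated counts always add up to the number of reads, which is the "never double-counted" property described above.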


Thanks, Nico! As Colin Dewey mentions in this thread on the RSEM user group, TPM is a relative measure of abundance, and can be used to assess relative abundances across samples. However, as he suggests, this is rarely what one has in mind when one wishes to do across-sample comparison. For that, you'll need an extra level of normalization, and people have considered many approaches. This recent bioRxiv paper and the Dillies et al. paper referenced therein find that the best-performing normalization methods seem to be TMM and the DESeq normalization.
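
For readers who want to see what such an across-sample scaling looks like, here is a minimal sketch of the median-of-ratios idea that underlies the DESeq normalization. It is an illustration only, not the DESeq implementation: the function name `size_factors` and the toy counts are hypothetical, and the input is assumed to be a per-gene count table with the same gene order in every sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of per-gene counts
            (same gene order in every sample).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Use only genes with a positive count in every sample, so that
    # the logarithms below are defined.
    keep = [g for g in range(n_genes)
            if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples.
    log_geo = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in keep}
    # Each sample's factor is the median ratio to that reference.
    return {s: math.exp(median(math.log(counts[s][g]) - log_geo[g]
                               for g in keep))
            for s in samples}


# Hypothetical toy data: sample B is sequenced about twice as deeply as A.
toy = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
sf = size_factors(toy)
normalized = {s: [c / sf[s] for c in toy[s]] for s in toy}
print(sf)
```

Dividing each sample's values by its size factor puts the samples on a common scale; the median makes the factor robust to a handful of genes that change strongly between samples.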


So to summarize, you'd say that applying a scale normalization such as TMM, directly on the TPM values produced by Salmon, is the way to go prior to differential gene expression analysis?
