Question

How reproducible is transcript quantification through salmon?

10 months ago

Prangan ▴ 20

Hello!

I am conducting differential expression analysis on a subset of plant transcripts, using Salmon + tximport + DESeq2. I am using the pseudoalignment mode of salmon on fastp-trimmed FASTQ files. The salmon index was built with '-k=31' and quant was run with the '--validateMappings' flag. I ran quant twice on the same samples and observed significant variation in TPM and NumReads. Between the two runs, about 82% of the differentially expressed transcripts (DETs) called by DESeq2 were shared.

I want to know whether this level of reproducibility is typical, and whether there is a way to make salmon quant more reproducible.

Any and all suggestions are welcome!

Thank you

DESeq2 quantification salmon differential-expression • 969 views

updated 9 months ago by ATpoint 82k • written 10 months ago by Prangan ▴ 20

Can you say with confidence that all parameters (that is, code, software versions, and reference genome/transcriptome annotations) were precisely the same? What is "significant variation"? Can you show some correlation plots to support this?
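One way to put a number on the run-to-run agreement before plotting is sketched below in Python. It assumes the standard tab-separated `quant.sf` layout (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`); the helper names `read_quant`, `pearson`, and `compare_runs` are hypothetical, and the toy tables are made up purely so the example runs on its own.

```python
import csv
import io
import math


def read_quant(handle, column="TPM"):
    """Return {transcript name: value} from a salmon quant.sf-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row[column]) for row in reader}


def pearson(xs, ys):
    """Plain Pearson correlation; returns nan if either vector is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def compare_runs(handle_a, handle_b, column="TPM"):
    """Correlate log1p-transformed values for transcripts present in both runs."""
    run_a = read_quant(handle_a, column)
    run_b = read_quant(handle_b, column)
    shared = sorted(set(run_a) & set(run_b))
    xs = [math.log1p(run_a[name]) for name in shared]
    ys = [math.log1p(run_b[name]) for name in shared]
    return len(shared), pearson(xs, ys)


# Toy, made-up tables standing in for two quant.sf files from repeated runs.
HEADER = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
RUN_1 = HEADER + (
    "tx1\t1000\t850.0\t10.0\t100.0\n"
    "tx2\t2000\t1850.0\t0.0\t0.0\n"
    "tx3\t1500\t1350.0\t55.5\t812.0\n"
)
RUN_2 = RUN_1  # identical on purpose: a perfectly reproducible rerun

n_shared, r = compare_runs(io.StringIO(RUN_1), io.StringIO(RUN_2))
print(n_shared, round(r, 6))
```

With real output, the two `io.StringIO` objects would be replaced by open file handles to each run's `quant.sf`, and passing `column="NumReads"` compares estimated counts instead of TPM.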

ADD REPLY • link 10 months ago by ATpoint 82k

ATpoint · Accepted Answer · 2023-07-04

3

9 months ago

i.sudbery 19k

Quantifying transcripts (as opposed to genes) with a high degree of certainty is known to be a difficult, perhaps impossible, problem, especially where transcripts share a large fraction of their sequence. Thus all transcript quantification approaches carry a certain amount of randomness and uncertainty.

This uncertainty can be quantified using inferential replicates (bootstraps), which can be activated in salmon with the --numBootstraps switch. Unfortunately, while tximport can ingest this data, at the moment DESeq2 can't account for the inferential replicates when calculating a p-value for differential expression. The swish tool (part of fishpond) can, but last time I looked it was somewhat difficult to use swish with non-model organisms.
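As a rough sketch of what requesting bootstraps looks like, the Python snippet below builds (but does not run) a `salmon quant` command line with `--numBootstraps` added to the `--validateMappings` flag from the question. Paired-end reads, automatic library type detection (`-l A`), 30 bootstraps, and all file and directory names are assumptions for illustration only; `salmon_quant_command` is a hypothetical helper, not part of salmon.

```python
import shlex


def salmon_quant_command(index_dir, reads_1, reads_2, out_dir,
                         n_bootstraps=30, threads=4):
    """Build (not run) a salmon quant command that requests bootstrap replicates."""
    return [
        "salmon", "quant",
        "-i", index_dir,                       # index built beforehand with `salmon index`
        "-l", "A",                             # let salmon infer the library type
        "-1", reads_1,                         # assumed paired-end input
        "-2", reads_2,
        "--validateMappings",                  # flag used in the question
        "--numBootstraps", str(n_bootstraps),  # inferential replicates
        "-p", str(threads),
        "-o", out_dir,
    ]


# Hypothetical paths, shown only to illustrate the resulting command line.
cmd = salmon_quant_command(
    "salmon_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "quant/sample"
)
print(shlex.join(cmd))
```

To actually launch the job, the list could be handed to `subprocess.run(cmd, check=True)`. The number of bootstraps is a trade-off between run time and how well the per-transcript variance is estimated, so the value here should be treated as a placeholder.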

2

Happy to take any swish questions on Bioconductor by the way.

Another relevant Bioconductor tool, recently preprinted, is described here:

Dividing out quantification uncertainty allows efficient assessment of differential transcript expression https://www.biorxiv.org/content/10.1101/2023.04.02.535231v1

Swish doesn't have much sensitivity when the per-group replicate count is small (n=3), but the method in this preprint leverages edgeR, so it should gain sensitivity through the standard empirical Bayes procedure.

ADD REPLY • link updated 9 months ago by ATpoint 82k • written 9 months ago by Michael Love ★ 2.6k