How reproducible is transcript quantification with salmon?
16 months ago
Prangan

Hello!

I am conducting differential expression analysis on a subset of plant transcripts using Salmon + tximport + DESeq2. I run salmon in its mapping-based mode on fastp-trimmed FASTQ files; the index was built with '-k 31', and quant was run with the '--validateMappings' flag. I have run quant twice on the same samples and observed significant variation in TPM and NumReads between the two runs. Only about 82% of the transcripts called as differentially expressed (DETs) by DESeq2 were common to both runs.
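A figure like "82% common DETs" depends on the denominator (run 1's calls, run 2's calls, or their union), so it helps to report the overlap explicitly. Below is a minimal Python sketch; the transcript IDs are hypothetical placeholders, and in practice each set would come from filtering that run's DESeq2 results table (for example on padj).

```python
# Minimal sketch: summarize how many DETs two salmon runs of the same samples share.
# The transcript IDs in the demo are hypothetical placeholders, not real results.

def det_overlap(run_a: set[str], run_b: set[str]) -> dict[str, float]:
    """Return shared count, shared fraction relative to each run, and Jaccard index."""
    shared = run_a & run_b
    union = run_a | run_b
    return {
        "shared": len(shared),
        "frac_of_run_a": len(shared) / len(run_a) if run_a else 0.0,
        "frac_of_run_b": len(shared) / len(run_b) if run_b else 0.0,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    run1 = {"tx1", "tx2", "tx3", "tx4", "tx5"}
    run2 = {"tx1", "tx2", "tx3", "tx4", "tx6"}
    print(det_overlap(run1, run2))
```

Note that the same pair of runs can look like "80% shared" relative to either run but only about 67% by Jaccard, so it is worth stating which one is meant.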

Is this level of reproducibility typical, or is there a way to make salmon quant more reproducible?

Any and all suggestions are welcome!

Thank you

DESeq2 quantification salmon differential-expression

Can you say with confidence that all parameters, meaning the code, software versions, and reference genome/transcriptome annotations, were exactly the same between runs? What do you mean by "significant variation"? Can you show some correlation plots to support this?
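One quick way to put a number on "significant variation" is to correlate the two runs' TPM values directly. The sketch below assumes salmon's standard quant.sf layout (tab-separated, with a header that includes Name and TPM columns) and uses log2(TPM + 1), a common but arbitrary transform; the file contents in the demo are made up, and the paths in the comment are hypothetical.

```python
import csv
import io
import math


def read_tpm(handle) -> dict[str, float]:
    """Read Name -> TPM from a salmon quant.sf table (tab-separated, with header)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation (undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_tpm_correlation(run_a: dict[str, float], run_b: dict[str, float]) -> float:
    """Pearson correlation of log2(TPM + 1) over transcripts present in both runs."""
    common = sorted(run_a.keys() & run_b.keys())
    xs = [math.log2(run_a[t] + 1) for t in common]
    ys = [math.log2(run_b[t] + 1) for t in common]
    return pearson(xs, ys)


if __name__ == "__main__":
    # Made-up contents; for real runs use e.g. open("run1/quant.sf") (hypothetical path).
    header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    run1 = read_tpm(io.StringIO(header + "tx1\t1000\t800\t10.0\t5\ntx2\t1500\t1300\t100.0\t80\n"
                                "tx3\t900\t700\t1.0\t1\n"))
    run2 = read_tpm(io.StringIO(header + "tx1\t1000\t800\t12.0\t6\ntx2\t1500\t1300\t90.0\t72\n"
                                "tx3\t900\t700\t2.0\t2\n"))
    print(log_tpm_correlation(run1, run2))
```

A single correlation coefficient hides where the disagreement sits, so a scatter plot of the same log2 values is still worth making; one would expect most of the spread among transcripts with shared sequence.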

16 months ago

Quantifying transcripts (as opposed to genes) with a high degree of certainty is known to be a difficult, perhaps impossible, problem, especially when transcripts share a large fraction of their sequence. A read that is compatible with several isoforms has to be apportioned among them statistically, so all transcript quantification approaches carry a certain amount of randomness and uncertainty.
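To illustrate why shared sequence makes this hard, here is a toy expectation-maximization (EM) sketch in Python that splits reads among transcripts. It is a simplification, not salmon's actual inference: it ignores effective length and bias correction, and the equivalence classes (the set of transcripts each group of reads is compatible with) are hypothetical.

```python
def em_abundance(eq_classes, init, n_iter=200):
    """Toy EM returning estimated read counts per transcript.

    eq_classes maps a tuple of compatible transcripts to its read count;
    init gives strictly positive starting abundances for every transcript.
    No effective-length or bias correction is applied (unlike salmon).
    """
    theta = dict(init)
    counts = {t: 0.0 for t in theta}
    for _ in range(n_iter):
        # E-step: split each class's reads in proportion to current abundances.
        counts = {t: 0.0 for t in theta}
        for txs, n_reads in eq_classes.items():
            total = sum(theta[t] for t in txs)
            for t in txs:
                counts[t] += n_reads * theta[t] / total
        # M-step: new abundances are the expected read fractions.
        n_total = sum(counts.values())
        theta = {t: c / n_total for t, c in counts.items()}
    return counts


if __name__ == "__main__":
    # Two isoforms, A and B, where every read is compatible with both.
    shared_only = {("A", "B"): 100}
    print(em_abundance(shared_only, {"A": 0.5, "B": 0.5}))
    print(em_abundance(shared_only, {"A": 0.9, "B": 0.1}))
    # Same isoforms, but now some reads are unique to each.
    mixed = {("A",): 30, ("B",): 10, ("A", "B"): 60}
    print(em_abundance(mixed, {"A": 0.5, "B": 0.5}))
    print(em_abundance(mixed, {"A": 0.1, "B": 0.9}))
```

With only shared reads, the split simply stays wherever it started (about 50/50 versus 90/10) because the data cannot tell the isoforms apart. Once unique reads exist, both starting points converge to about 75 reads for A and 25 for B.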

This uncertainty can be quantified with inferential replicates (bootstraps), which salmon generates when run with the --numBootstraps option. Unfortunately, although tximport can import these replicates, DESeq2 currently cannot account for them when calculating p-values for differential expression. The swish tool (part of the fishpond package) can, but last time I looked it was somewhat difficult to use swish with non-model organisms.
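Salmon's documentation describes its bootstraps as resampling, with replacement, the counts assigned to equivalence classes and then re-running the optimization for each sample. Here is a rough Python sketch of that idea, using a toy EM with no length correction and hypothetical equivalence classes; it is an illustration, not salmon's code. The per-transcript variance across replicates is the kind of inferential uncertainty that swish uses and DESeq2 ignores.

```python
import random
import statistics
from collections import Counter


def em_counts(eq_classes, transcripts, n_iter=200):
    """Toy EM (no length correction): split each class's reads by current abundance."""
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        for txs, n_reads in eq_classes.items():
            total = sum(theta[t] for t in txs)
            for t in txs:
                counts[t] += n_reads * theta[t] / total
        n_total = sum(counts.values())
        theta = {t: c / n_total for t, c in counts.items()}
    return counts


def bootstrap_counts(eq_classes, n_boot, seed=0):
    """Resample reads across equivalence classes with replacement; re-run EM each time."""
    rng = random.Random(seed)
    classes = list(eq_classes)
    weights = [eq_classes[c] for c in classes]
    n_reads = sum(weights)
    transcripts = sorted({t for c in classes for t in c})
    replicates = []
    for _ in range(n_boot):
        draws = Counter(rng.choices(classes, weights=weights, k=n_reads))
        replicates.append(em_counts(dict(draws), transcripts))
    return replicates


def summarize(replicates):
    """Per-transcript (mean, sample variance) across the inferential replicates."""
    return {
        t: (statistics.mean([r[t] for r in replicates]),
            statistics.variance([r[t] for r in replicates]))
        for t in replicates[0]
    }


if __name__ == "__main__":
    eq = {("A",): 30, ("B",): 10, ("A", "B"): 60}  # hypothetical read groups
    print(summarize(bootstrap_counts(eq, n_boot=50, seed=1)))
```

The fixed seed only makes this toy reproducible; it says nothing about how salmon seeds or orders its own computations.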


Happy to take any swish questions on Bioconductor by the way.

Another relevant Bioconductor tool was recently described in a preprint:

Dividing out quantification uncertainty allows efficient assessment of differential transcript expression https://www.biorxiv.org/content/10.1101/2023.04.02.535231v1

Swish has limited sensitivity when the number of replicates per group is small (e.g. n=3), whereas this method builds on edgeR and so should gain sensitivity through edgeR's standard empirical Bayes procedure.
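As I understand the preprint, the core idea is to estimate, for each transcript, how much quantification uncertainty inflates the count variance beyond Poisson (an overdispersion factor derived from the bootstraps) and to divide the counts by that factor before a standard edgeR analysis. The Python sketch below is a simplified illustration only, not the published method: it uses a pooled bootstrap variance-to-mean ratio floored at 1, omits any moderation the real method may apply, and its input layout (one list of bootstrap counts per sample) is hypothetical.

```python
import statistics


def overdispersion(boot_counts):
    """Simplified per-transcript overdispersion: mean bootstrap variance/mean ratio.

    boot_counts holds one list of bootstrap counts per sample (hypothetical layout).
    """
    ratios = []
    for reps in boot_counts:
        mean = statistics.mean(reps)
        if mean > 0:
            ratios.append(statistics.variance(reps) / mean)
    # Floor at 1 so counts are never scaled up.
    return max(1.0, statistics.mean(ratios)) if ratios else 1.0


def divide_counts(counts, boot_counts):
    """Divide one transcript's per-sample counts by its overdispersion factor."""
    phi = overdispersion(boot_counts)
    return [c / phi for c in counts]


if __name__ == "__main__":
    # One transcript, two samples, three bootstrap replicates each (made-up numbers).
    boots = [[10, 30, 20], [20, 20, 20]]
    print(overdispersion(boots), divide_counts([20, 40], boots))
```

Transcripts whose reads are easy to assign get a factor near 1 and are left almost unchanged, while highly ambiguous ones are shrunk toward smaller effective counts, which down-weights them in the downstream test.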
