Question

Which bias flags to run with Salmon before DESeq2 analysis?


Asked 4.8 years ago by cameron.holman

Hi All,

Thanks for the help. I am curious which bias corrections you normally enable when running Salmon before DESeq2 analysis. Michael Love states in the DESeq2 vignettes that he recommends running Salmon with the --gcBias flag:

"We recommend using the --gcBias flag which estimates a correction factor for systematic biases commonly present in RNA-seq data (Love, Hogenesch, and Irizarry 2016; Patro et al. 2017), unless you are certain that your data do not contain such bias."

Salmon can also be run with:

--seqBias

--posBias

Do you run with all bias flags? Do you find significant differences in output? Do you trust the data more with these biases accounted for? I've posted the descriptions from the Salmon documentation below in case that is helpful.

Thanks very much for your time - Cameron

Salmon documentation:

--seqBias

Passing the --seqBias flag to Salmon will enable it to learn and correct for sequence-specific biases in the input data. Specifically, this model will attempt to correct for random hexamer priming bias, which results in the preferential sequencing of fragments starting with certain nucleotide motifs. By default, Salmon learns the sequence-specific bias parameters using 1,000,000 reads from the beginning of the input. If you wish to change the number of samples from which the model is learned, you can use the --numBiasSamples parameter. Salmon uses a variable-length Markov Model (VLMM) to model the sequence-specific biases at both the 5’ and 3’ end of sequenced fragments. This methodology generally follows that of Roberts et al. [2], though some details of the VLMM differ.

Note: This sequence-specific bias model is substantially different from the bias-correction methodology that was used in Salmon versions prior to 0.6.0. This model specifically accounts for sequence-specific bias, and should not be prone to the over-fitting problem that was sometimes observed using the previous bias-correction methodology.
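To learn the sequence-bias model from more than the default 1,000,000 reads, --numBiasSamples can be passed alongside --seqBias. The sketch below only assembles and prints such a command; the index name, FASTQ names, output directory and the value 2000000 are hypothetical placeholders, and salmon is only invoked when RUN=1 is set and salmon is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: salmon quant with sequence-bias correction, learning the bias model
# from 2,000,000 reads instead of the default 1,000,000.
# Index, FASTQ and output names are hypothetical placeholders.
set -euo pipefail

IDX="salmon_index"       # hypothetical index directory
R1="sample_1.fq.gz"      # hypothetical paired-end reads
R2="sample_2.fq.gz"

args=(quant -l A -i "$IDX"
      --seqBias
      --numBiasSamples 2000000
      -1 "$R1" -2 "$R2"
      -o sample_quant)

# Print the command so it can be checked before running it.
echo "salmon ${args[*]}"

# Only execute when explicitly requested (RUN=1) and salmon is available.
if [[ "${RUN:-0}" == 1 ]] && command -v salmon >/dev/null 2>&1; then
  salmon "${args[@]}"
fi
```

Keeping the arguments in a bash array means file names with spaces survive quoting, and the same array can be printed for a dry run and then executed unchanged.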

--gcBias

Passing the --gcBias flag to Salmon will enable it to learn and correct for fragment-level GC biases in the input data. Specifically, this model will attempt to correct for biases in how likely a sequence is to be observed based on its internal GC content.

You can use the FastQC software followed by MultiQC with transcriptome GC distributions to check if your samples exhibit strong GC bias, i.e. under-representation of some sub-sequences of the transcriptome. If they do, we obviously recommend using the --gcBias flag. Or you can simply run Salmon with --gcBias in any case, as it does not impair quantification for samples without GC bias; it just takes a few more minutes per sample. For samples with moderate to high GC bias, correction for this bias at the fragment level has been shown to reduce isoform quantification errors [4] [3].
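FastQC (and MultiQC on top of it) is the intended tool for this check. Purely to illustrate what a per-read GC distribution is, the hypothetical awk sketch below bins the reads of a gzipped FASTQ by GC fraction in 10% steps. It is a rough stand-in, not a replacement: it does not overlay a reference transcriptome GC distribution as the MultiQC route described above can, and the file name in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Rough sketch: histogram of per-read GC content (10% bins) from a gzipped FASTQ,
# similar in spirit to FastQC's "Per sequence GC content" plot.
set -euo pipefail

gc_histogram() {
  # $1: gzipped FASTQ file. Prints "bin_start<TAB>read_count" for each non-empty bin.
  gzip -cd "$1" | awk 'NR % 4 == 2 {          # every 4th line starting at 2 is a sequence
      len = length($0); if (len == 0) next
      gc = gsub(/[GCgc]/, "")                  # gsub returns the number of G/C characters removed
      bin = int(100 * gc / len / 10) * 10
      if (bin == 100) bin = 90                 # fold 100% GC into the top bin
      count[bin]++
    }
    END { for (b = 0; b <= 90; b += 10) if (b in count) print b "\t" count[b] }'
}
```

Usage: gc_histogram sample_1.fq.gz prints one line per 10% GC bin with the number of reads falling into it; a strongly skewed or multi-peaked distribution is the kind of signal that would motivate --gcBias.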

This bias is distinct from the primer biases learned with the --seqBias option. Though these biases are distinct, they are not completely independent. When both --seqBias and --gcBias are enabled, Salmon will learn a conditional fragment-GC bias model. By default, Salmon will learn 3 different fragment-GC bias models based on the GC content of the fragment start and end contexts, though this number of conditional models can be changed with the (hidden) option --conditionalGCBins. Likewise, the number of distinct fragment GC bins used to model the GC bias can be changed with the (hidden) option --numGCBins.

Note: In order to speed up the evaluation of the GC content of arbitrary fragments, Salmon pre-computes and stores the cumulative GC count for each transcript. This requires an extra 4 bytes per nucleotide. While this extra memory usage should normally be minor, it can nonetheless be controlled with the --reduceGCMemory option. This option replaces the per-nucleotide GC count with a rank-select capable bit vector, reducing the memory overhead from 4 bytes per nucleotide to ~1.25 bits, while being only marginally slower.
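To put the note above in perspective, a quick bash calculation for a hypothetical transcriptome of 300 million nucleotides (an assumed size, not a measured one) compares the default 4 bytes per nucleotide with the ~1.25 bits per nucleotide of --reduceGCMemory. Integer arithmetic is used and 1 MB is taken as 10^6 bytes, so results are rounded down.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope sketch of the GC-table memory overhead:
# 4 bytes/nt by default vs ~1.25 bits/nt with --reduceGCMemory.
set -euo pipefail

gc_memory_mb() {
  # $1: total transcriptome length in nucleotides.
  # Prints "<default MB> <reduced MB>" (1 MB = 10^6 bytes, rounded down).
  local nt=$1
  local default_bytes=$(( nt * 4 ))               # 4 bytes per nucleotide
  local reduced_bytes=$(( nt * 125 / 100 / 8 ))   # 1.25 bits per nucleotide, in bytes
  echo "$(( default_bytes / 1000000 )) $(( reduced_bytes / 1000000 ))"
}

# Hypothetical transcriptome size of 300 million nucleotides.
read -r default_mb reduced_mb <<< "$(gc_memory_mb 300000000)"
echo "default: ${default_mb} MB, --reduceGCMemory: ~${reduced_mb} MB"
```

For this assumed size it prints default: 1200 MB, --reduceGCMemory: ~46 MB, i.e. roughly a 25-fold reduction (32 bits vs 1.25 bits per nucleotide), independent of the transcriptome size chosen.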

--posBias

Passing the --posBias flag to Salmon will enable modeling of a position-specific fragment start distribution. This is meant to model non-uniform coverage biases that are sometimes present in RNA-seq data (e.g. 5’ or 3’ positional bias). Currently, a small, fixed number of models is learned for different length classes of transcripts, as is done in Roberts et al. [2]. Note: The positional bias model is relatively new and is still undergoing testing. It replaces the previous --useFSPD option, which is now deprecated. This feature should be considered experimental in the current release.

Tags: RNA-Seq, Salmon, DESeq2, Bias Flags



I would go with --seqBias and --gcBias, but to double-check I'll tag Rob. Also, make sure to update to Salmon 0.14.1 and use the new features.
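A small sketch for checking whether the installed Salmon is at least the version recommended here (0.14.1). It assumes that the first line of salmon --version has the form "salmon X.Y.Z" and relies on sort -V (GNU coreutils) for version ordering; both are assumptions worth checking on your system.

```shell
#!/usr/bin/env bash
# Sketch: warn if the installed Salmon is older than a required version.
# Assumes the first line of `salmon --version` ends with the version number
# and that `sort -V` (version sort) is available.
set -euo pipefail

version_at_least() {
  # Succeeds if version $1 >= version $2, using version-aware sorting.
  [[ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" == "$2" ]]
}

required="0.14.1"
if command -v salmon >/dev/null 2>&1; then
  installed=$(salmon --version 2>&1 | head -n 1 | awk '{print $NF}')
  if version_at_least "$installed" "$required"; then
    echo "salmon $installed is at least $required"
  else
    echo "salmon $installed is older than $required; consider updating" >&2
  fi
fi
```

Version-aware sorting matters here: a plain string comparison would wrongly rank 0.9.1 above 0.14.1.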

Reply by Kristoffer Vitting-Seerup, 4.8 years ago


I typically use (Salmon 0.13.1):

salmon quant \
      -l A -i $IDX -p 8 \
      --no-version-check \
      --validateMappings \
      --maxMMPExtension 7 \
      --seqBias \
      --gcBias \
      -o out -1 in_1.fq.gz -2 in_2.fq.gz

I would not use experimental flags such as --posBias, simply because it is not yet clear whether they are generally beneficial or whether they will become best-practice options.

Reply by ATpoint, 4.8 years ago


It's helpful to look at how the authors of Salmon run Salmon. Sometimes they use --seqBias --gcBias --posBias --validateMappings (see here) and sometimes they use only --gcBias (see here). It really depends.

Reply by Lior Pachter, 4.8 years ago (updated by Ram)


Thanks very much for the links. Any tips on ways to run Kallisto or work with the output to help account for these biases? I've actually been running both Kallisto and Salmon side by side for more confidence in the results.

Also, thanks very much for making Kallisto, it is what I used to start teaching myself bioinformatics and definitely changed the game. Still using it all the time and appreciate the immense effort that was put into it.

Reply by cameron.holman, 4.8 years ago


Thanks for the tips, everyone. I'll look into whether --posBias might be useful for the data I have. I did one run with all flags last night and will now run without --posBias to see what differences come up.

Here's how I've run it (currently on 0.13.1, but I will update):

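# CTX (array of sample names) and fd (directory holding the FASTQ files)
# are assumed to be defined earlier in the script.
for i in "${CTX[@]}"; do
    echo "${i}"
    # Paired-end quantification with sequence, GC and positional bias correction
    salmon quant -i mm10_index_salmon -l IU \
        -1 "${fd}/${i}_1_sequence.fq.gz" \
        -2 "${fd}/${i}_2_sequence.fq.gz" \
        -p 16 \
        --validateMappings \
        --seqBias \
        --gcBias \
        --posBias \
        -o "/mnt/z/out/${i}"
done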
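To see what differences come up between the run with all flags and the run without --posBias, one option is to compare the NumReads column of the two quant.sf files for the same sample. This is a hypothetical sketch: it assumes the standard quant.sf columns (Name, Length, EffectiveLength, TPM, NumReads), and the 2-fold and 10-read cut-offs as well as the example directories are arbitrary choices, not recommended thresholds.

```shell
#!/usr/bin/env bash
# Sketch: compare NumReads between two Salmon runs of the same sample
# (e.g. with and without --posBias) and count transcripts that change > 2-fold.
# Assumes quant.sf columns: Name, Length, EffectiveLength, TPM, NumReads.
set -euo pipefail

compare_quant() {
  # $1, $2: quant.sf files. Prints "<shared transcripts> <changed > 2-fold>".
  awk -F'\t' '
    FNR == 1 { next }                    # skip the header line of each file
    NR == FNR { a[$1] = $5; next }       # first file: store NumReads per transcript
    ($1 in a) {
      n++
      x = a[$1] + 1; y = $5 + 1          # pseudocount of 1 avoids division by zero
      # only count transcripts with >= 10 reads in at least one run
      if ((a[$1] >= 10 || $5 >= 10) && (x / y > 2 || y / x > 2)) changed++
    }
    END { print n + 0, changed + 0 }
  ' "$1" "$2"
}
```

Usage: compare_quant out_all_flags/SAMPLE/quant.sf out_no_posBias/SAMPLE/quant.sf (hypothetical directories). A small count of changed transcripts relative to the shared total suggests --posBias has little effect on that sample; the gene-level comparison that matters for DESeq2 would still need to be checked after aggregation.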

Reply by cameron.holman, 4.8 years ago