Question: Questions regarding VST/rlog correction to plot sample distances between RNA-seq samples
salamandra wrote (3 months ago):

I would like to compute sample distances between the samples of an RNA-seq experiment. I read that the VST and rlog functions of the DESeq2 R package are good for transforming the data so that the standard deviation of a gene's expression across all samples does not change with its mean expression across all samples. My questions are:

1 - Should these corrections be applied after normalising raw counts for sequencing depth (with the DESeq() function) or directly applied on the raw data?

2 - To do a heatmap with a dendrogram representing the distances between samples, is it better to plot in a heatmap the values corrected with VST/rlog or FPKM values?

3 - The 'VST' method seems to be better for big datasets (n>30). I have 3 samples, so does that mean I need to choose 'rlog' instead?

4 - In both methods we can set the parameter 'blind'. In which situations should I set it to 'TRUE' or 'FALSE'?

Regards.

Tags: rna-seq, deseq, R
Kevin Blighe wrote (3 months ago):

1 - Should these corrections be applied after normalising raw counts for sequencing depth (with the DESeq() function) or directly applied on the raw data?

These should only be applied to the normalised counts, as per the DESeq2 vignette.

2 - To do a heatmap with a dendrogram representing the distances between samples, is it better to plot in a heatmap the values corrected with VST/rlog or FPKM values?

Don't use FPKM values: that method of normalisation should no longer be used for multi-sample studies. Instead, use either the VST- or rlog-transformed counts. Please see the answer that I gave earlier today: A: How to graphically tell if data has been normalized?
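
As a minimal sketch of a sample-distance heatmap built from VST-transformed counts: the data here are simulated with makeExampleDESeqDataSet() (an assumption, standing in for a real experiment), and base R's heatmap() is used only to avoid extra dependencies. Object names such as sampleDists are illustrative.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for real data (assumption): 5000 genes, 6 samples
dds <- makeExampleDESeqDataSet(n = 5000, m = 6)

# Variance-stabilising transformation; size factors are estimated internally
vsd <- vst(dds, blind = TRUE)

# Euclidean distances between samples (dist() works on rows, hence t())
sampleDists <- dist(t(assay(vsd)))
distMat <- as.matrix(sampleDists)

# Symmetric heatmap; the same hierarchical clustering orders rows and columns
hc <- hclust(sampleDists)
heatmap(distMat, Rowv = as.dendrogram(hc), symm = TRUE, scale = "none")
```

Swapping vst() for rlog() in this sketch should give the rlog-based version of the same plot.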

3 - The 'VST' method seems to be better for big datasets (n>30). I have 3 samples, so does that mean I need to choose 'rlog' instead?

You can justify the use of either. rlog is not recommended for large datasets because it can take a very long time. I tend to check both, where possible, and find that the results don't change much between the two methods (provided that there are no outliers in your dataset).
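
One rough way to check both methods on the same data is to compare the sample distances they produce. The sketch below again assumes simulated counts from makeExampleDESeqDataSet() (with betaSD = 1 so that the two simulated conditions differ); the names dVst and dRlog are illustrative, and the agreement seen on simulated data says nothing about any particular real dataset.

```r
library(DESeq2)

set.seed(1)
# Simulated data (assumption): 5000 genes, 6 samples, two conditions
dds <- makeExampleDESeqDataSet(n = 5000, m = 6, betaSD = 1)

vsd <- vst(dds, blind = TRUE)
rld <- rlog(dds, blind = TRUE)

# Sample-to-sample distances under each transformation
dVst  <- dist(t(assay(vsd)))
dRlog <- dist(t(assay(rld)))

# Correlation between the two sets of pairwise distances
distCor <- cor(as.vector(dVst), as.vector(dRlog))
print(distCor)

# Dendrograms side by side for a visual comparison
par(mfrow = c(1, 2))
plot(hclust(dVst),  main = "VST")
plot(hclust(dRlog), main = "rlog")
```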

4 - In both methods we can set the parameter 'blind'. In which situations should I set it to 'TRUE' or 'FALSE'?

Please take a look at my answer and comments here: C: Order of operations in RNAseq analysis

Kevin


Thank you. I have now noticed that applying VST to the raw values:

library(DESeq2)
# Build the dataset from the HTSeq count files; DESeq() is not run here
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, design = ~ condition)
vsd <- vst(ddsHTSeq, blind = FALSE)
head(assay(vsd), 3)

gives the same results as applying it to the normalised values:

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, design = ~ condition)
# Run the full DESeq() pipeline first (size factors and dispersions), then transform
ddsHTSeq <- DESeq(ddsHTSeq)
vsd <- vst(ddsHTSeq, blind = FALSE)
head(assay(vsd), 3)

And in section 4.2 of this tutorial the transformation seems to be applied before DESeq() is run, so maybe it does not matter whether the data are normalised first?

Reply written 3 months ago by salamandra

I see - good observation! Though, if they don't exist, vst() still calculates dispersion estimates and size factors on the fly:

if (is.null(sizeFactors(object)) & is.null(normalizationFactors(object))) {
    object <- estimateSizeFactors(object)
}

if (blind) {
    design(object) <- ~ 1
}

if (blind | is.null(attr(dispersionFunction(object),"fitType"))) {
    object <- estimateDispersionsGeneEst(object, quiet=TRUE)
    object <- estimateDispersionsFit(object, quiet=TRUE, fitType)
}

[from: https://github.com/mikelove/DESeq2/blob/master/R/vst.R]
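
The on-the-fly size factor estimation can be checked with a small sketch. It assumes simulated data from makeExampleDESeqDataSet() and uses blind = TRUE, so dispersions are re-estimated inside vst() in both cases; with blind = FALSE on an object that has already been through DESeq(), the stored dispersion fit is reused instead, so small differences are conceivable there. The names vsdRaw and vsdNorm are illustrative.

```r
library(DESeq2)

set.seed(1)
# Simulated data (assumption): 5000 genes, 6 samples
dds <- makeExampleDESeqDataSet(n = 5000, m = 6)

# A freshly built dataset carries no size factors yet
noSizeFactors <- is.null(sizeFactors(dds))
print(noSizeFactors)

# vst() on the untouched object: size factors are estimated internally
vsdRaw <- vst(dds, blind = TRUE)

# vst() after estimating size factors explicitly beforehand
ddsNorm <- estimateSizeFactors(dds)
vsdNorm <- vst(ddsNorm, blind = TRUE)

# Compare the two transformed matrices
sameResult <- all.equal(assay(vsdRaw), assay(vsdNorm))
print(sameResult)
```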

One can actually apply these transformations to any type of numerical data. I've used rlog many times for non-DESeq2-related matrices.
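
A hedged sketch of the matrix use case: rlog() and vst() accept a plain matrix as well as a DESeqDataSet, although, as far as I know, the matrix still has to contain non-negative integer counts. The matrix below is simulated and entirely hypothetical (feature and sample names included).

```r
library(DESeq2)

set.seed(1)
# Hypothetical count matrix: 2000 features x 4 samples, negative binomial noise
nFeat <- 2000
mu    <- 2^rnorm(nFeat, mean = 7, sd = 2)
disp  <- 4 / mu + 0.1
counts <- matrix(rnbinom(nFeat * 4, mu = rep(mu, 4), size = rep(1 / disp, 4)),
                 ncol = 4,
                 dimnames = list(paste0("feature", 1:nFeat), paste0("s", 1:4)))
storage.mode(counts) <- "integer"

# With matrix input, both functions return a matrix of transformed values
rl <- rlog(counts)
vs <- vst(counts)

head(rl, 3)
head(vs, 3)
```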

Reply written 3 months ago by Kevin Blighe (modified 3 months ago)