DESeq2: vst() and varianceStabilizingTransformation()

Hi everyone, I'm exploring the DESeq2 package, in particular the varianceStabilizingTransformation function. I can't quite understand the difference between it and the vst function: when should I use each one, and why should I prefer one over the other? Thank you

Tags: RNA-Seq • sva • R • normalization

The difference is subtle, but it means that vst() can perform the transformation more quickly.

vst() is, in fact, a wrapper around varianceStabilizingTransformation(): it first identifies 1000 genes that are 'representative' of the dataset's dispersion trend, and uses the information from these to perform the transformation.

The key parameter in question is:

vst(..., nsub = 1000)
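To see the difference in practice, here is a minimal sketch on a simulated DESeqDataSet (makeExampleDESeqDataSet just creates toy data for illustration; it is not part of your own workflow):

library(DESeq2)

# Toy dataset purely for illustration
dds <- makeExampleDESeqDataSet(n = 20000, m = 12)

# Fast: the dispersion trend is fitted on a deterministic subset of nsub genes
vsd_fast <- vst(dds, blind = TRUE, nsub = 1000)

# Slower: the dispersion trend is fitted on all genes
vsd_full <- varianceStabilizingTransformation(dds, blind = TRUE)

# Both return a DESeqTransform object; the transformed values are in assay()
head(assay(vsd_fast))

The two will usually give very similar values, which is why vst() is handy when you just need a quick transformation for QC plots on a larger dataset.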

------------------

There is also a difference relating to the usage of the blind argument:

vst

This is a wrapper for the varianceStabilizingTransformation (VST) that provides much faster estimation of the dispersion trend used to determine the formula for the VST. The speed-up is accomplished by subsetting to a smaller number of genes in order to estimate this dispersion trend. The subset of genes is chosen deterministically, to span the range of genes' mean normalized count. This wrapper for the VST is not blind to the experimental design: the sample covariate information is used to estimate the global trend of genes' dispersion values over the genes' mean normalized count. It can be made strictly blind to experimental design by first assigning a design of ~1 before running this function, or by avoiding subsetting and using varianceStabilizingTransformation.

However, if you set blind = TRUE for vst(), it seems to set the design to ~ 1 for you:

function (object, blind = TRUE, nsub = 1000, fitType = "parametric") 
{
    ...
        if (blind) {
            design(object) <- ~1
        }
        matrixIn <- FALSE
    ...
    vsd <- varianceStabilizingTransformation(object, blind = FALSE)
    ...
}
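So, as a sketch (reusing the toy dds from above), these two calls both give a transformation that is blind to the experimental design:

# blind = TRUE resets the design to ~ 1 internally before transforming
vsd_a <- vst(dds, blind = TRUE)

# ...which is what the documentation suggests doing by hand:
dds2 <- dds
design(dds2) <- ~ 1
vsd_b <- varianceStabilizingTransformation(dds2, blind = FALSE)

Note that vsd_a still uses the nsub subsetting to fit the dispersion trend, so its values will generally not be numerically identical to vsd_b, just very close.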

varianceStabilizingTransformation

This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors), yielding a matrix of values which are now approximately homoskedastic (having constant variance along the range of mean values). The transformation also normalizes with respect to library size. The rlog is less sensitive to size factors, which can be an issue when size factors vary widely. These transformations are useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.
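For example (again just a sketch on the toy object; plotPCA() and assay() are standard DESeq2 / SummarizedExperiment functions):

vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# PCA of the samples on the (approximately) homoskedastic values
plotPCA(vsd, intgroup = "condition")

# or use the transformed matrix directly, e.g. for hierarchical clustering
sampleDists <- dist(t(assay(vsd)))
plot(hclust(sampleDists))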

Kevin


Thank you very much, it is clearer now. I'm still not very comfortable with bioinformatics, so some concepts are a bit difficult for me to understand, even when I study the vignettes and documentation.

