The difference is subtle but means that `vst()` can perform the transformation quicker. `vst()` is, in fact, a wrapper function for `varianceStabilizingTransformation()`: it first selects a subset of genes (1000 by default) that are 'representative' of the dataset's dispersion trend, estimates the trend from that subset only, and then uses it to transform the full dataset.

The key parameter in question is:

```
vst(..., nsub = 1000)
```
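As a rough illustration of the two calls side by side, here is a minimal sketch on simulated counts. The object names (`dds`, `vsd_fast`, `vsd_full`) and the simulated dataset are just assumptions for the example; substitute your own `DESeqDataSet`.

```r
library(DESeq2)

# Simulated dataset purely for illustration (2000 genes, 20 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 20)

# Fast route: dispersion trend estimated from a subset of nsub genes
vsd_fast <- vst(dds, nsub = 1000)

# Full route: dispersion trend estimated from all genes
vsd_full <- varianceStabilizingTransformation(dds)

# Both return a transformed object covering every gene and sample
dim(assay(vsd_fast))
dim(assay(vsd_full))
```

Only the trend estimation uses the subset; the transformed matrix still has one row per gene in both cases.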

## ------------------

There is also a difference relating to the usage of `blind`:

## vst

This is a wrapper for the varianceStabilizingTransformation (VST) that
provides much faster estimation of the dispersion trend used to
determine the formula for the VST. The speed-up is accomplished by
subsetting to a smaller number of genes in order to estimate this
dispersion trend. The subset of genes is chosen deterministically, to
span the range of genes' mean normalized count. **This wrapper for the**
**VST is not blind to the experimental design**: the sample covariate
information is used to estimate the global trend of genes' dispersion
values over the genes' mean normalized count. **It can be made strictly**
**blind to experimental design by first assigning a design of ~1 before**
**running this function, or by avoiding subsetting and using**
**varianceStabilizingTransformation.**

However, if you set `blind = TRUE` for `vst()` (which is the default, going by the function signature), it seems to set the design to `~ 1` for you:

```
function (object, blind = TRUE, nsub = 1000, fitType = "parametric")
{
    ...
    if (blind) {
        design(object) <- ~1
    }
    matrixIn <- FALSE
    ...
    vsd <- varianceStabilizingTransformation(object, blind = FALSE)
    ...
}
```
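If that reading of the source is right, then calling `vst()` with `blind = TRUE` should behave the same as setting the design to `~ 1` yourself and calling it with `blind = FALSE`. A small sketch to check this on simulated data follows; the object names are hypothetical and I have not verified this across DESeq2 versions.

```r
library(DESeq2)

# Simulated dataset purely for illustration
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 20)

# Route 1: let vst() blind itself to the design
vsd_blind <- vst(dds, blind = TRUE)

# Route 2: set an intercept-only design by hand, then run without blinding
dds_intercept <- dds
design(dds_intercept) <- ~ 1
vsd_manual <- vst(dds_intercept, blind = FALSE)

# Compare the two transformed matrices
all.equal(assay(vsd_blind), assay(vsd_manual))

# For contrast: a design-aware transformation using the original design
vsd_design <- vst(dds, blind = FALSE)
```

Use `blind = FALSE` with your real design when you do not want large, design-driven differences between groups to inflate the dispersion estimates.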

## varianceStabilizingTransformation

This function calculates a variance stabilizing transformation (VST)
from the fitted dispersion-mean relation(s) and then transforms the
count data (normalized by division by the size factors or
normalization factors), yielding a matrix of values which are now
approximately homoskedastic (having constant variance along the range
of mean values). The transformation also normalizes with respect to
library size. The rlog is less sensitive to size factors, which can be
an issue when size factors vary widely. These transformations are
useful when checking for outliers or as input for machine learning
techniques such as clustering or linear discriminant analysis.
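
To illustrate the clustering use case mentioned in that description, here is a minimal sketch that clusters samples on the transformed values. The simulated data and the choice of Euclidean distance with base R's `hclust()` are my own assumptions for the example, not something the DESeq2 documentation prescribes.

```r
library(DESeq2)

# Simulated dataset purely for illustration
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 20)
vsd <- vst(dds, blind = TRUE)

# Samples are columns, so transpose before computing sample-to-sample distances
sample_dist <- dist(t(assay(vsd)))

# Hierarchical clustering of samples on the variance-stabilised values
hc <- hclust(sample_dist)

# Sample names in dendrogram order; use plot(hc) to draw the tree
hc$labels[hc$order]
```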

Kevin