Question: What is the importance of the rlog function (DESeq2) for downstream analysis?
Sam30 wrote (3 months ago):

The DESeq2 vignette states:

The point of these two transformations, the VST and the rlog, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low.

and the documentation of rlog explains:

The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis

I understand that "checking for outliers" means checking for outliers via a PCA plot (or something similar).

Why is minimizing differences (between samples) for rows with low counts important for the PCA plot?

Why does the variance have to be independent of the mean (homoscedasticity) for that?
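
To make the mean-variance relationship in the vignette quote concrete, here is a minimal sketch that computes, exactly rather than by simulation, the variance of a count on the linear scale and on the log2(x + 1) scale. It assumes a plain Poisson model (no overdispersion, no size factors), which is simpler than the negative binomial model DESeq2 fits; the function name `poisson_log_moments` and the pseudocount of 1 are illustrative choices, not part of DESeq2.

```python
import math

def poisson_log_moments(mu, pseudocount=1.0):
    """Return (variance of X, variance of log2(X + pseudocount)) for X ~ Poisson(mu), mu > 0.

    Assumes a simple Poisson model; moments are summed exactly over the pmf,
    truncated far out in the upper tail where the remaining mass is negligible.
    """
    upper = int(mu + 20 * math.sqrt(mu) + 50)
    m1 = m2 = l1 = l2 = 0.0
    for k in range(upper + 1):
        # pmf evaluated in log space so large means do not overflow
        p = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
        y = math.log2(k + pseudocount)
        m1 += p * k
        m2 += p * k * k
        l1 += p * y
        l2 += p * y * y
    return m2 - m1 ** 2, l2 - l1 ** 2

for mu in (5, 100, 1000):
    lin_var, log_var = poisson_log_moments(mu)
    print(f"mean={mu:>5}  var(linear)={lin_var:9.2f}  var(log2(x+1))={log_var:.4f}")
```

Under these assumptions the linear-scale variance equals the mean and grows with it, while the log-scale variance shrinks as the mean grows, so neither scale is homoscedastic.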

i.sudbery (Sheffield, UK) wrote (3 months ago):

In variance-based analyses, like PCA, clustering and LDA, the results are driven by the features with the highest variance. In count-based data, like RNA-seq, the variance depends on the mean: on the linear scale, a higher mean means a higher variance. If you were to run something like PCA on the linear scale, you would simply find that the result was dominated by the random noise in the high-mean features.
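
A small simulation illustrates which features drive a variance-based analysis. It is a sketch under stated assumptions: four hypothetical genes (`gene_low`, `gene_mid`, `gene_high`, `gene_top`) with Poisson counts and no real differences between the 50 samples, so all of the variance is noise. It reports which gene has the highest variance on each scale; the Poisson sampler (Knuth's method) is written out only to keep the example free of dependencies.

```python
import math
import random
import statistics

def poisson(rng, mu, chunk=20.0):
    """Draw one Poisson(mu) count with Knuth's multiplication method.

    Large means are split into chunks of at most `chunk`, because exp(-mu)
    underflows for big mu and a sum of independent Poisson draws is Poisson.
    """
    total = 0
    while mu > 0:
        lam = min(mu, chunk)
        mu -= lam
        limit = math.exp(-lam)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total

rng = random.Random(1)
n_samples = 50
# Hypothetical genes: same expected count in every sample, so all variation is noise.
means = {"gene_low": 2, "gene_mid": 20, "gene_high": 200, "gene_top": 2000}
counts = {g: [poisson(rng, mu) for _ in range(n_samples)] for g, mu in means.items()}

linear_var = {g: statistics.variance(x) for g, x in counts.items()}
log_var = {g: statistics.variance([math.log2(c + 1) for c in x]) for g, x in counts.items()}

print("highest variance on linear scale:", max(linear_var, key=linear_var.get))
print("highest variance on log2(x+1) scale:", max(log_var, key=log_var.get))
```

With pure noise, the linear scale puts the high-mean gene on top and the log2(x + 1) scale puts the low-mean gene on top, so a PCA on either scale would be driven by noise, just from different genes.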

However, because of the discrete nature of the data, on the log scale the variance is higher in the low-count features: 2 reads is twice as many as 1, and 1 read is infinitely more reads than 0, but 101 reads is only 1% more than 100. Thus, without some kind of regularization, your PCA will be dominated by very small changes in low-count features.
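
The arithmetic behind the 1 vs 0, 2 vs 1 and 101 vs 100 comparisons can be checked directly. The sketch below computes log2 differences with and without a pseudocount of 1; the pseudocount and the helper name `log2_diff` are assumptions for illustration, since the answer does not mention either.

```python
import math

def log2_diff(a, b, pseudocount=0.0):
    """log2(a + pc) - log2(b + pc): the log2 fold change between two counts."""
    if b + pseudocount == 0:
        return math.inf  # log2(0) is undefined: any reads versus 0 is an infinite fold change
    return math.log2(a + pseudocount) - math.log2(b + pseudocount)

for a, b in [(1, 0), (2, 1), (101, 100)]:
    print(f"{a:>3} vs {b:>3}: raw log2 diff = {log2_diff(a, b):.4f}, "
          f"log2(x+1) diff = {log2_diff(a, b, pseudocount=1.0):.4f}")
```

Adding a pseudocount removes the infinity, but one-read changes at low counts still move log2(x + 1) roughly 40 to 70 times more than the 100 to 101 change, which is why a simple shifted log is not enough and some regularization, such as the shrinkage in rlog, is needed.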
