Question: R DESeq: What exactly is Variance Stabilizing Transformation?
5.0 years ago by
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.4k wrote:

I have been using the DESeq VST method on gene counts produced by htseq-count as follows:

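library(DESeq)
cds <- newCountDataSet(countData = dat, conditions = factor(conditions))
cds <- estimateSizeFactors(cds)   # per-sample size factors (library-size normalization)
cds <- estimateDispersions(cds, sharingMode = "gene-est-only", method = "pooled", fitType = "local")   # dispersion-mean fit used by the VST
vst <- getVarianceStabilizedData(cds)   # genes x samples matrix of transformed values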

But honestly, I do not understand what exactly the getVarianceStabilizedData() function does. Can someone explain in simple terms:

  1. Why is it necessary to normalize raw count data? Why can't we use the raw count data?
  2. How exactly are we normalizing the raw count data using the getVarianceStabilizedData() function?
  3. Should the conditions parameter in the newCountDataSet() function match the conditions between which you want to find differentially expressed genes? For example, I have both cases & controls as well as males & females. So should I include both pieces of information in the conditions parameter or just cases & controls?

I know these questions can be searched for easily, and I did. But I want a simple explanation from someone who uses these methods regularly to clear my concepts.

vst deseq R • 14k views
modified 21 months ago • written 5.0 years ago by komal.rathi3.4k

For question 1, do you mean in the sense of variance stabilization or in the sense of library size? Also, have you read the DESeq paper (and the DESeq2 preprint, since you should switch to DESeq2 if possible)?

written 5.0 years ago by Devon Ryan91k

I meant in terms of both the stabilization & library size. I did not read the published paper but did read the Reference Manual, and there is a paragraph explaining VST, but there are statistical terms which I do not quite understand (like a gene's dispersion, Poisson noise etc). But I will look at the DESeq paper now that you have mentioned it. Thanks!

written 5.0 years ago by komal.rathi3.4k

If you're familiar with terms like "variance" or "standard deviation" as well as what a Poisson distribution is, then at least those terms can be translated to something you're more familiar with. If not, then you'd be well served to just take a decent statistics class, since a lot of things will be pretty tough going otherwise.

written 5.0 years ago by Devon Ryan91k
5.0 years ago by
Steve Lianoglou5.0k wrote:

The vignettes in DESeq2 (which you should prefer using these days) describe these things and why you'd want to use them, under these sections:

The main image you want to have in your mind is the one in Figure 3 of the first link. The point of these transforms is to reduce (ideally eliminate) dependence of the variance on the mean. The second link above has this paragraph which sums things up quite nicely:

"""Many common statistical methods for exploratory analysis of multidimensional data, especially methods for clustering and ordination (e.g., principal-component analysis and the like), work best for (at least approximately) homoskedastic data; this means that the variance of an observable quantity (i.e., here, the expression strength of a gene) does not depend on the mean. In RNA-Seq data, however, variance grows with the mean."""
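To make the mean-variance dependence concrete, here is a minimal base-R sketch on simulated counts. It is not DESeq's implementation: the negative-binomial parameters are made up for illustration, and log2(count + 1) is used only as a simple stand-in for a variance-stabilizing transform (DESeq derives its transform from the fitted dispersion-mean relation instead).

```r
set.seed(1)

n_genes    <- 2000
n_samples  <- 12
gene_means <- 2^runif(n_genes, min = 7, max = 14)  # assumed range of expression levels
dispersion <- 0.1                                   # assumed common dispersion

# Negative-binomial counts: variance = mean + dispersion * mean^2
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_means, times = n_samples),
          size = 1 / dispersion),
  nrow = n_genes, ncol = n_samples
)

# Simple stand-in transform (NOT the DESeq VST)
transformed <- log2(counts + 1)

# Rank correlation between each gene's mean and its variance across samples
cor_raw <- cor(rowMeans(counts), apply(counts, 1, var), method = "spearman")
cor_transformed <- cor(rowMeans(transformed), apply(transformed, 1, var),
                       method = "spearman")

print(c(raw = cor_raw, transformed = cor_transformed))
```

On the raw scale the per-gene variance should track the mean closely, while after the transform the correlation should sit much closer to zero, which is the "homoskedastic" behaviour the quoted paragraph is describing.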

You should prefer to use these when you are doing downstream analysis on your count data that doesn't involve testing for differential expression using the statistical methods developed for count data. These scenarios include doing things like clustering, or PCA over your expression data or using the data as input to another machine learning algorithm.
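As a rough sketch of that kind of downstream use, the base-R example below runs PCA and hierarchical clustering on a transformed matrix. Everything in it is an assumption for illustration: the counts are simulated, the group labels are hypothetical, and log2(count + 1) stands in for the matrix you would actually get from getVarianceStabilizedData(cds).

```r
set.seed(42)

n_genes <- 500
group   <- factor(rep(c("control", "case"), each = 3))  # hypothetical sample labels

# Hypothetical baseline means; the first 100 genes are 4-fold higher in cases
base_means <- 2^runif(n_genes, min = 7, max = 12)
fold <- matrix(1, nrow = n_genes, ncol = length(group))
fold[1:100, group == "case"] <- 4

counts <- matrix(
  rnbinom(n_genes * length(group), mu = base_means * fold, size = 20),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", seq_along(group)))
)

# Stand-in for the genes x samples matrix returned by getVarianceStabilizedData(cds)
mat <- log2(counts + 1)

# prcomp() and dist() expect samples in rows, so transpose the matrix
pca <- prcomp(t(mat))
pc1 <- pca$x[, "PC1"]

hc       <- hclust(dist(t(mat)))
clusters <- cutree(hc, k = 2)

print(data.frame(group, PC1 = round(pc1, 2), cluster = clusters))
```

With real data you would replace mat with your transformed matrix and keep the rest as is; the transpose is the step that is easiest to forget.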

Take the time to read the two vignettes above, as well as the DESeq2 preprint to get a better understanding of this (and many other things related to differential expression analysis with this software) ... the authors have gone to great lengths to document their software and methodology quite thoroughly.


modified 5.0 years ago • written 5.0 years ago by Steve Lianoglou5.0k

Thanks. I will certainly do that.

written 5.0 years ago by komal.rathi3.4k

How do I get the rlog-transformed data? Is there anything like vst <- getVarianceStabilizedData(cds) for rlog?

written 21 months ago by krushnach80570

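I'm sure you've found the answer by now, but to take an rlog you can run: rlogTransformation(cds) or simply rlog(cds). Note that both are DESeq2 functions, so cds needs to be a DESeqDataSet rather than a CountDataSet from the original DESeq.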
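For anyone landing here later, a minimal sketch of what that looks like in DESeq2. It assumes the Bioconductor package DESeq2 is installed, and it uses DESeq2's simulated example dataset in place of real counts; the object names (dds, rld, vsd) are just illustrative.

```r
# Assumes the Bioconductor package DESeq2 is installed
library(DESeq2)

# Simulated example dataset from DESeq2, standing in for real counts
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# rlog() returns a DESeqTransform object; assay() extracts the
# genes x samples matrix of transformed values
rld      <- rlog(dds, blind = TRUE)
rlog_mat <- assay(rld)

# The DESeq2 counterpart of getVarianceStabilizedData()
vsd     <- varianceStabilizingTransformation(dds, blind = TRUE)
vst_mat <- assay(vsd)

print(dim(rlog_mat))
print(dim(vst_mat))
```

Either matrix can then go into clustering or PCA in the same way as the output of getVarianceStabilizedData().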

written 19 months ago by shawn.w.foley830

Okay, thanks for the reply. I did use vst, but I will try yours.

written 19 months ago by krushnach80570