Question: Variance versus coefficient of variation for selecting most variably expressed genes after variance stabilizing transformation?
Posted 3.8 years ago by bgrande:

I have gene expression counts for dozens of samples and I would like to perform some unsupervised data exploration. As a starting point, I decided to go with a heatmap, but if you have any other recommendations, let me know. I've run DESeq2's variance stabilizing transformation (vst function), which includes an adjustment for library size and is a recommended step in DESeq2's vignette before plotting.
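For context on the library-size adjustment: DESeq2's default size factors use the "median of ratios" idea. Each gene's counts are divided by that gene's geometric mean across samples, and each sample's size factor is the median of those ratios. Below is a minimal NumPy sketch of that idea. It assumes a genes x samples count matrix and skips genes with a zero in any sample. The function name median_of_ratios_size_factors is hypothetical, and this is not DESeq2's actual code.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Estimate per-sample size factors from a genes x samples count matrix.

    Sketch of the median-of-ratios idea (hypothetical helper, not DESeq2 code).
    Genes with a zero count in any sample are skipped, because their
    geometric mean would be zero (log of zero is -inf).
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only genes that are non-zero in every sample.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across samples (a pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of log ratios to the reference, exponentiated back.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))
```

Dividing each column of the count matrix by its size factor puts samples sequenced to different depths on a comparable scale before any transformation is applied.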

Typically, only the most variably expressed genes are plotted in a heatmap, as they are the most informative. The question then becomes: how do you quantify variability? Variance is an obvious choice, but I was afraid that it would be biased towards highly expressed genes, which tend to have larger absolute variances. I thought that normalizing the standard deviation by the mean (as in the coefficient of variation) would minimize this bias.
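To make the two ranking options concrete, here is a minimal NumPy sketch that returns the row indices of the top-n genes by either sample variance or coefficient of variation (standard deviation divided by mean). The function name top_variable_genes and its method argument are hypothetical. One caveat: on a log-like scale such as vst output, the zero point depends on the transform's offset, and means can be near zero. A CV computed there is therefore arguably hard to interpret, which is why this sketch guards against zero means.

```python
import numpy as np


def top_variable_genes(expr, n, method="variance"):
    """Return row indices of the n most variable genes in a genes x samples matrix.

    Hypothetical helper: method="variance" ranks by sample variance,
    method="cv" ranks by coefficient of variation (std / mean).
    """
    expr = np.asarray(expr, dtype=float)
    if method == "variance":
        score = expr.var(axis=1, ddof=1)
    elif method == "cv":
        mean = expr.mean(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            score = expr.std(axis=1, ddof=1) / mean
        # Genes with zero mean give inf/nan; push them to the bottom of the ranking.
        score = np.where(np.isfinite(score), score, -np.inf)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Sort by descending score; the stable sort keeps tied genes in input order.
    return np.argsort(-score, kind="stable")[:n]
```

For example, on untransformed values a gene measured as 1000, 1100, 900, 1000 ranks above a gene measured as 10, 30, 5, 15 by variance, but below it by CV. This is exactly the mean-driven disagreement the question is about.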

But then, I realized that DESeq2's vst function might obviate the need to normalize the variance by the mean. In other words, because vst tries to eliminate the relationship between the mean and variance, I should no longer be worried that variance alone will be biased towards more highly expressed genes. Am I correct in thinking this?
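The reasoning in this paragraph can be checked on simulated data. DESeq2's vst derives its transformation from the fitted negative-binomial dispersion trend, which this sketch does not reproduce. As a simpler stand-in, it uses the Anscombe transform 2*sqrt(x + 3/8), a classic variance-stabilizing transform for Poisson counts. The gene means, random seed and number of samples are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four simulated "genes" with very different mean expression (arbitrary values).
means = np.array([10.0, 100.0, 1000.0, 10000.0])
# genes x samples matrix of Poisson counts; each row uses its own mean.
counts = rng.poisson(lam=means[:, None], size=(means.size, 20000))

# On the raw scale, Poisson variance equals the mean, so it grows with expression.
raw_var = counts.var(axis=1, ddof=1)

# Anscombe transform as a stand-in for a variance-stabilizing transform.
stabilized = 2.0 * np.sqrt(counts + 3.0 / 8.0)
vst_var = stabilized.var(axis=1, ddof=1)

for m, rv, vv in zip(means, raw_var, vst_var):
    print(f"mean={m:>7.0f}  raw var={rv:>10.1f}  transformed var={vv:.3f}")
```

Raw variance should grow roughly in proportion to the mean, while the transformed variance should stay close to 1 at every mean level. That flat mean-variance relationship is the property that makes ranking by plain variance reasonable after a variance-stabilizing transform. This assumes the transform matches the data's actual mean-variance relationship; at very low counts, stabilization is typically less complete.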

Tags: rna-seq, deseq2

Hello, have you found a way to quantify the variation?

Reply written 3.5 years ago by User000390

I've been using variance to select the most variably expressed genes, although I would still like confirmation from others that this is the right approach.

Reply written 3.5 years ago by bgrande