Variance versus coefficient of variation for selecting most variably expressed genes after variance stabilizing transformation?
4.4 years ago
bgrande ▴ 10

I have gene expression counts for dozens of samples and I would like to perform some unsupervised data exploration. As a starting point, I decided to go with a heatmap, but if you have any other recommendations, let me know. I've run DESeq2's variance stabilizing transformation (vst function), which includes an adjustment for library size and is a recommended step in DESeq2's vignette before plotting.

Typically, only the most variably expressed genes are plotted in a heatmap, as they are the most informative. The question then becomes: how do you quantify variability? Variance is an obvious choice, but I was afraid that it would be biased towards highly expressed genes, which have higher absolute variance. I thought that normalizing by the mean (as in the coefficient of variation, the standard deviation divided by the mean) would minimize this bias.

But then, I realized that DESeq2's vst function might obviate the need to normalize the variance by the mean. In other words, because vst tries to eliminate the relationship between the mean and variance, I should no longer be worried that variance alone will be biased towards more highly expressed genes. Am I correct in thinking this?
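
A toy illustration of this reasoning, offered as a sketch rather than a definitive answer. It assumes log2 as a crude stand-in for vst (the two behave similarly only at high counts, and this is not DESeq2's actual transformation) and uses made-up counts for two hypothetical genes, `gene_low` and `gene_high`, that show the same fold changes at very different expression levels:

```python
import math
from statistics import mean, stdev, variance

# Made-up counts: both genes double from sample to sample,
# but gene_high is expressed 100x more strongly than gene_low.
gene_low = [10, 20, 40, 80]
gene_high = [1000, 2000, 4000, 8000]

def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)

def log2_all(values):
    """Assumption: log2 as a crude stand-in for a variance-stabilizing transform."""
    return [math.log2(v) for v in values]

log_low, log_high = log2_all(gene_low), log2_all(gene_high)

# Raw scale: variance is dominated by the highly expressed gene.
raw_var_low, raw_var_high = variance(gene_low), variance(gene_high)

# Log scale: identical fold changes give essentially identical variances.
log_var_low, log_var_high = variance(log_low), variance(log_high)

# Log scale: dividing by the mean now penalizes the highly expressed gene.
log_cv_low, log_cv_high = cv(log_low), cv(log_high)

print(raw_var_low, raw_var_high)
print(log_var_low, log_var_high)
print(log_cv_low, log_cv_high)
```

On the raw scale the highly expressed gene has by far the larger variance. After the transform the two variances match, whereas the coefficient of variation on the transformed scale ranks the lowly expressed gene as more variable, simply because its mean is smaller.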

RNA-Seq deseq2 • 1.7k views

Hello, have you found out how to quantify the variation?

I've been using variance to select the most variably expressed genes, although I would still like confirmation from others that this is the right approach.
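
For reference, a minimal sketch of that selection step, assuming the transformed values are available as a mapping from gene name to per-sample values. The gene names, the numbers, and the helper `top_variable_genes` are all hypothetical and only illustrate ranking by variance:

```python
from statistics import variance

# Hypothetical transformed expression values (one list of samples per gene).
vst_values = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [8.0, 10.0, 6.0, 9.0],
    "geneC": [12.0, 12.5, 11.5, 12.0],
    "geneD": [3.0, 6.0, 4.0, 7.0],
}

def top_variable_genes(expr, n):
    """Return the n gene names with the largest per-gene sample variance."""
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n]

top2 = top_variable_genes(vst_values, 2)
print(top2)
```

The genes returned here would then be the rows kept for the heatmap.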

