Question

Variance versus coefficient of variation for selecting most variably expressed genes after variance stabilizing transformation?

7.4 years ago

bgrande ▴ 10

I have gene expression counts for dozens of samples and I would like to perform some unsupervised data exploration. As a starting point, I decided to go with a heatmap, but if you have any other recommendations, let me know. I've run DESeq2's variance stabilizing transformation (vst function), which includes an adjustment for library size and is a recommended step in DESeq2's vignette before plotting.

Typically, only the most variably expressed genes are plotted in a heatmap, as they are the most informative. The question then becomes: how do you quantify variability? Variance is an obvious choice, but I was afraid that it would be biased towards highly expressed genes, which will have higher absolute values for variance. I thought that normalizing the variance by the mean (as with the coefficient of variation) would minimize this bias.

But then, I realized that DESeq2's vst function might obviate the need to normalize the variance by the mean. In other words, because vst tries to eliminate the relationship between the mean and variance, I should no longer be worried that variance alone will be biased towards more highly expressed genes. Am I correct in thinking this?
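
To make the concern concrete, here is a toy sketch in Python (standard library only). The gene names and values are made up for illustration and are not output from vst. It assumes the transformation has worked as intended, so both genes have the same spread around different mean levels. In that situation variance ranks the two genes equally, whereas the coefficient of variation would favour the lowly expressed gene.

```python
from statistics import mean, stdev, variance

# Hypothetical transformed expression values (made up for illustration,
# not produced by DESeq2). Both genes have the same spread around
# different mean levels, as one would hope for after variance stabilization.
transformed = {
    "low_gene": [4.0, 5.0, 6.0, 5.0],
    "high_gene": [11.0, 12.0, 13.0, 12.0],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Sample variance (n - 1 denominator) per gene.
variances = {gene: variance(vals) for gene, vals in transformed.items()}
# Coefficient of variation per gene.
cvs = {gene: coefficient_of_variation(vals) for gene, vals in transformed.items()}

for gene in transformed:
    print(f"{gene}: variance={variances[gene]:.3f}, CV={cvs[gene]:.3f}")
```

If the mean-variance relationship really has been removed, dividing by the mean again would seem to introduce a bias in the opposite direction, towards lowly expressed genes.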

RNA-Seq deseq2 • 2.6k views

ADD COMMENT • link 7.4 years ago by bgrande ▴ 10

0

Hello, have you found out how to quantify the variation?

ADD REPLY • link 7.1 years ago by User000 ▴ 690

0

I've been using variance to select the most variably expressed genes, although I would still like confirmation from others that this is the right approach.
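
As a rough sketch of that selection step, in plain Python rather than R: the table below is made up, and `top_variable_genes` is a hypothetical helper, not a DESeq2 function. It ranks genes by the sample variance of their transformed values and keeps the top n.

```python
from statistics import variance

# Hypothetical genes-by-samples table of transformed values (made up
# for illustration, not real vst output).
transformed = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [8.0, 10.0, 6.0, 9.0],
    "geneC": [12.0, 12.5, 11.5, 12.0],
    "geneD": [3.0, 6.0, 2.0, 7.0],
}


def top_variable_genes(table, n):
    """Return the names of the n genes with the largest sample variance."""
    ranked = sorted(table, key=lambda gene: variance(table[gene]), reverse=True)
    return ranked[:n]


# Keep the two most variable genes for plotting.
selected = top_variable_genes(transformed, n=2)
print(selected)
```

The selected rows would then be passed to whatever heatmap function is being used.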

ADD REPLY • link 7.1 years ago by bgrande ▴ 10