I have gene expression counts for dozens of samples, and I would like to do some unsupervised data exploration. As a starting point I decided to go with a heatmap, but if you have any other recommendations, let me know. I've run DESeq2's variance-stabilizing transformation (the vst function), which includes an adjustment for library size and is a recommended step in the DESeq2 vignette before plotting.
Typically, only the most variably expressed genes are plotted in a heatmap, because they are the most informative. The question then becomes: how do you quantify variability? Variance is the obvious choice, but I was afraid it would be biased towards highly expressed genes, which have larger absolute variances. I thought that normalizing the variance by the mean (as the coefficient of variation does) would minimize this bias.
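To make the two scoring options concrete, here is a minimal Python sketch (the actual workflow is in R/DESeq2; Python is used only for illustration, and the function names `gene_variance`, `gene_cv` and `top_n` are hypothetical). It scores each gene (row) of a genes x samples matrix by variance and by coefficient of variation, then ranks them. On this toy raw-scale matrix, variance picks the high-mean gene while CV picks the low-mean gene with a large relative spread.

```python
import numpy as np

def gene_variance(mat):
    # mat is genes x samples; sample variance of each row
    return mat.var(axis=1, ddof=1)

def gene_cv(mat):
    # coefficient of variation per row: standard deviation / mean
    # (undefined for rows whose mean is 0)
    return mat.std(axis=1, ddof=1) / mat.mean(axis=1)

def top_n(scores, n):
    # indices of the n highest-scoring genes, highest first
    return np.argsort(scores)[::-1][:n]

# toy raw-scale matrix: 3 genes x 4 samples
mat = np.array([
    [1000.0, 1100.0, 900.0, 1000.0],  # high mean, small relative spread
    [10.0, 30.0, 5.0, 15.0],          # low mean, large relative spread
    [100.0, 100.0, 100.0, 100.0],     # constant
])
print(top_n(gene_variance(mat), 1))  # [0]
print(top_n(gene_cv(mat), 1))        # [1]
```

Note that CV is not bias-free either: for purely Poisson noise the standard deviation is sqrt(mean), so the CV is 1/sqrt(mean) and raw-scale CV tends to favor low-count genes.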
But then I realized that DESeq2's vst function might remove the need to normalize the variance by the mean. In other words, because vst aims to remove the dependence of the variance on the mean, I no longer need to worry that variance alone will be biased towards highly expressed genes. Am I correct in thinking this?
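The reasoning can be checked with a small simulation. This is a sketch under simplifying assumptions: genes are simulated as Poisson counts with no biological variation, and the Anscombe transform, 2*sqrt(x + 3/8), stands in for a variance-stabilizing transform. DESeq2's vst is a different transform (it is built on a fitted negative binomial dispersion trend), so this only illustrates the principle. On the raw scale the variance grows with the mean, so ranking by raw variance would favor highly expressed genes even when they carry no extra signal; after the transform the variance is roughly 1 at every expression level.

```python
import numpy as np

def anscombe(x):
    # Anscombe transform: approximately variance-stabilizing for Poisson counts
    # (a stand-in for illustration, NOT the transform DESeq2's vst uses)
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

rng = np.random.default_rng(0)
means = [10, 100, 1000]   # three simulated expression levels
n_samples = 20000         # many samples so the variance estimates are stable

raw_var = []
vst_var = []
for mu in means:
    counts = rng.poisson(mu, size=n_samples)  # pure sampling noise, no biology
    raw_var.append(counts.var(ddof=1))
    vst_var.append(anscombe(counts).var(ddof=1))

for mu, rv, tv in zip(means, raw_var, vst_var):
    print(f"mean={mu:5d}  raw variance={rv:8.1f}  transformed variance={tv:.3f}")
```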
Hello, have you found out how to quantify the variation?
I've been using variance to select the most variably expressed genes. However, I would still like confirmation from others that this is the right approach.
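For reference, here is a minimal Python sketch of that approach, assuming the vst-transformed matrix has been exported from R as a genes x samples array with a matching list of gene IDs (the function name `top_variable_genes` and the default `n=500` are hypothetical choices). It ranks genes by plain variance on the transformed scale, keeps the top `n`, and subtracts each gene's mean so the heatmap shows deviations from that gene's average rather than its absolute level. The Bioconductor rnaseqGene workflow's gene-clustering example uses a similar recipe in R (top genes by row variance, then subtracting row means).

```python
import numpy as np

def top_variable_genes(vst_mat, gene_ids, n=500):
    """Return the IDs and row-centered values of the n most variable genes.

    vst_mat: genes x samples array, assumed to be on a variance-stabilized
    scale already, so plain variance is used for ranking.
    """
    variances = vst_mat.var(axis=1, ddof=1)
    order = np.argsort(variances)[::-1][:n]        # highest variance first
    sub = vst_mat[order]
    centered = sub - sub.mean(axis=1, keepdims=True)  # center each gene at 0
    return [gene_ids[i] for i in order], centered

# toy transformed matrix: 3 genes x 4 samples
vst_mat = np.array([
    [5.0, 5.1, 4.9, 5.0],
    [8.0, 12.0, 6.0, 10.0],
    [7.0, 7.0, 9.0, 9.0],
])
ids, mat = top_variable_genes(vst_mat, ["geneA", "geneB", "geneC"], n=2)
print(ids)  # ['geneB', 'geneC']
```

The centered matrix can then be passed to any clustering heatmap function.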