Hi all,
I have metagenome data. I aggregate raw counts per KO × sample. I want to do differential abundance between two time groups using DESeq2. After that, I want to show abundance heatmaps and volcanot plot.
My first question is about DESeq analysis. Is it valid to use DESeq2 on KO-level metagenome counts? I don’t have a “true control,” so I plan to set one group as the reference and interpret log2FC relative to that.
Second, is it acceptable to plot a heatmap using VST-transformed values? Alternatively, I could take the top 50 significant KOs from DESeq2, extract their CPM values, and plot a CPM heatmap, but I expect the visual patterns to differ a bit because VST and CPM are different scales.
Thank you very much.
(I apologise for my previous reply; I wanted to be as detailed as possible) DESeq2 handles KO counts perfectly well. For a time-series design without a separate control group, simply set the reference level in the design formula to the first time point. For the heatmap, definitely use the VST-transformed values. The patterns will differ from CPM and the VST is what you want. This stabilises the variance, ensuring that the clustering isn't skewed by the most abundant KOs.
Both vst or logcpm will favor expression level (magnitude of counts) rather than differences in a hclust/heatmap. For differences you need to scale counts first, be it vst or logcpm. See Scaling RNA-Seq data before clustering?
Thank you very much. I mistakenly drew a superficial analogy. I equated KO counters with gene counters.
I focused on the fact that DESeq2 models overdispersion, but overlooked its implicit assumption: normalisation works when the main source of variation between samples is sequencing depth, rather than systematic variation in feature length.
This statement makes zero sense in response to what I wrote.
The DESeq2 developer has advised many times against DESeq2 for metagenomics. Please search for related posts over at support.bioconductor.org where he advised for alternatives.