I have metagenome data. I aggregate raw counts per KO × sample. I want to do differential abundance between two time groups using DESeq2. After that, I want to show abundance heatmaps and volcanot plot.
My first question is about DESeq analysis. Is it valid to use DESeq2 on KO-level metagenome counts? I don’t have a “true control,” so I plan to set one group as the reference and interpret log2FC relative to that.
Second, is it acceptable to plot a heatmap using VST-transformed values?
Alternatively, I could take the top 50 significant KOs from DESeq2, extract their CPM values, and plot a CPM heatmap, but I expect the visual patterns to differ a bit because VST and CPM are different scales.
The DESeq2 developer has advised many times against DESeq2 for metagenomics. Please search for related posts over at support.bioconductor.org where he advised for alternatives.
I wouldn’t use DESeq2 on aggregated raw counts per KO, because doing so means accepting two pretty big assumptions:
All genes with the same KO have the same length, which usually isn’t true.
KOs make up a small part of the total coding sequence. When DESeq2 normalises for sequencing depth, it’s assuming that the proportion of CDS with a KOs stays roughly the same across samples. If that’s not the case, the normalisation might not work well.
(I apologise for my previous reply; I wanted to be as detailed as possible)
DESeq2 handles KO counts perfectly well. For a time-series design without a separate control group, simply set the reference level in the design formula to the first time point.
For the heatmap, definitely use the VST-transformed values. The patterns will differ from CPM and the VST is what you want. This stabilises the variance, ensuring that the clustering isn't skewed by the most abundant KOs.
Both vst or logcpm will favor expression level (magnitude of counts) rather than differences in a hclust/heatmap. For differences you need to scale counts first, be it vst or logcpm. See Scaling RNA-Seq data before clustering?
The DESeq2 developer has advised many times against DESeq2 for metagenomics. Please search for related posts over at support.bioconductor.org where he advised for alternatives.