I have metagenome data. I aggregate raw counts per KO × sample. I want to do a differential abundance analysis between two time groups using DESeq2. After that, I want to show abundance heatmaps and a volcano plot.
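For reference, here is a minimal sketch of the aggregation step. It assumes a gene-level count table and a one-to-one gene-to-KO mapping, both held in plain Python dicts. The data layout and the `aggregate_to_ko` name are hypothetical, and genes assigned to several KOs would need an extra rule.

```python
from collections import defaultdict

def aggregate_to_ko(gene_counts, gene_to_ko):
    """Sum gene-level raw counts into a KO x sample table.

    gene_counts: {gene_id: {sample_id: count}}  (hypothetical layout)
    gene_to_ko:  {gene_id: ko_id}; genes without a KO are skipped.
    Returns {ko_id: {sample_id: summed_count}}.
    """
    ko_counts = defaultdict(lambda: defaultdict(int))
    for gene, per_sample in gene_counts.items():
        ko = gene_to_ko.get(gene)
        if ko is None:
            continue  # unannotated gene: not counted toward any KO
        for sample, count in per_sample.items():
            ko_counts[ko][sample] += count
    # Convert the nested defaultdicts to plain dicts
    return {ko: dict(samples) for ko, samples in ko_counts.items()}

# Hypothetical toy input: three genes, two samples
genes = {
    "g1": {"S1": 10, "S2": 4},
    "g2": {"S1": 5, "S2": 6},
    "g3": {"S1": 7, "S2": 0},
}
mapping = {"g1": "K00001", "g2": "K00001", "g3": "K00002"}
table = aggregate_to_ko(genes, mapping)
print(table)  # {'K00001': {'S1': 15, 'S2': 10}, 'K00002': {'S1': 7, 'S2': 0}}
```

The summed values remain raw integer counts, which is the input DESeq2 expects. CPM or other normalized values should not be passed to DESeq2.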
My first question is about the DESeq2 analysis. Is it valid to use DESeq2 on KO-level metagenome counts? I don’t have a “true control,” so I plan to set one group as the reference and interpret log2FC relative to that.
Second, is it acceptable to plot a heatmap using VST-transformed values?
Alternatively, I could take the top 50 significant KOs from DESeq2, extract their CPM values, and plot a CPM heatmap, but I expect the visual patterns to differ a bit because VST and CPM are on different scales.
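Below is a sketch of how such a CPM heatmap matrix could be prepared. It assumes counts are held as `{ko: [count per sample]}` lists in a fixed sample order and that a pseudocount of 1 is acceptable. `log_cpm` and `row_zscore` are hypothetical helpers and are not part of DESeq2. Because library sizes are taken from the whole KO table, CPM should be computed before subsetting to the top 50 KOs.

```python
import math
import statistics

def log_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount) for a {ko: [count per sample]} table.

    Library size is the column total over every KO in `counts`, so pass the
    full table here and subset to the top KOs afterwards.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        ko: [math.log2(c / lib_sizes[j] * 1e6 + pseudocount) for j, c in enumerate(row)]
        for ko, row in counts.items()
    }

def row_zscore(values):
    """Centre and scale one KO across samples so the heatmap shows relative change."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant row: no pattern to show
    return [(v - mu) / sd for v in values]

# Hypothetical full KO table (4 samples); then keep only the "significant" KOs.
counts = {
    "K00001": [100, 120, 400, 380],
    "K00002": [5, 4, 1, 2],
    "K00003": [1000, 900, 1100, 1000],
}
top_kos = ["K00001", "K00002"]  # stand-in for the top 50 from DESeq2
lc = log_cpm(counts)
heatmap_rows = {ko: row_zscore(lc[ko]) for ko in top_kos}
```

After row z-scoring, each KO's colours show only its relative change across samples. Much of the scale difference between CPM and VST therefore no longer affects the heatmap.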
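Your proposed workflow is sound. Applying DESeq2 to KO-level counts is valid. Summing raw gene counts per KO still gives integer counts, and DESeq2's negative binomial model is designed for the overdispersion typical of such data. One practical caveat is that KO tables are often sparse. The default median-of-ratios size factors cannot be computed if every KO has a zero in at least one sample, and in that case `estimateSizeFactors(dds, type = "poscounts")` is a commonly used alternative. Not having a 'true control' is common in time-series designs and is not a problem. Set one time point as the reference level of the time factor in your design formula, and each log2FC then describes the change in the other group relative to that baseline (positive means higher than the reference).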
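The sketch below makes the reference-level point concrete in plain Python. It computes median-of-ratios size factors (the idea behind DESeq2's default normalization) and a naive log2 fold change with the reference group in the denominator. It is an illustration only. DESeq2 estimates log2FC from a fitted negative binomial GLM, optionally with shrinkage, so its values will differ. The function names, group labels, and toy counts are hypothetical.

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors for a {ko: [count per sample]} table.

    KOs with a zero in any sample are skipped because their geometric mean
    is zero (the same restriction as DESeq2's default 'ratio' method).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # statistics.median raises StatisticsError if no KO was usable
    return [math.exp(statistics.median(r)) for r in log_ratios]

def naive_log2fc(row, sf, groups, reference):
    """log2(mean normalized count in the non-reference group / mean in the reference)."""
    norm = [c / s for c, s in zip(row, sf)]
    ref = [v for v, g in zip(norm, groups) if g == reference]
    other = [v for v, g in zip(norm, groups) if g != reference]
    return math.log2(statistics.mean(other) / statistics.mean(ref))

# Hypothetical toy data: 4 samples, two time groups, T1 is the reference.
counts = {
    "K00001": [10, 20, 40, 80],
    "K00002": [100, 200, 100, 200],
    "K00003": [50, 100, 50, 100],
}
groups = ["T1", "T1", "T2", "T2"]
sf = size_factors(counts)
lfc = {ko: naive_log2fc(row, sf, groups, reference="T1") for ko, row in counts.items()}
print(lfc)
```

Here K00001 comes out at about 2 (four-fold higher in T2 after normalization), and the other two KOs come out at about 0. Swapping the reference to `"T2"` flips the sign to about -2. If every KO contained a zero, `statistics.median` would fail on an empty list. This mirrors the situation where DESeq2's default size factors cannot be computed.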
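For visualisation, plotting the VST-transformed values is acceptable and generally preferable to raw CPM. CPM only corrects for sequencing depth, and on the CPM scale the variance grows with the mean. In a heatmap that is not row-scaled, a few high-abundance KOs therefore tend to dominate the colour range. The VST puts the data on an approximately log2 scale with roughly constant variance across the range of mean values, which makes low- and high-abundance KOs more comparable. If you plot log2(CPM + 1) instead, compute CPM on the full KO table before selecting the top 50 and z-score each row. The patterns should then largely agree with the VST heatmap, with most of the differences likely among the low-count KOs.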
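The following small illustration of the mean-variance point uses plain log2 as a crude stand-in for the VST. DESeq2's `vst()` itself is not reproduced here. The two hypothetical KOs vary by the same relative amount across three replicates. Their raw CPM standard deviations differ by a factor of 1000, but on the log2 scale they are equal.

```python
import math
import statistics

# Hypothetical CPM values for two KOs across three replicates.
# Both vary by the same relative amount (+/-20% around their mean).
high_ko = [10000.0, 12000.0, 8000.0]
low_ko = [10.0, 12.0, 8.0]

# On the raw CPM scale the spread scales with the mean...
raw_sd_high = statistics.stdev(high_ko)
raw_sd_low = statistics.stdev(low_ko)

# ...while on a log2 scale (a crude stand-in for the VST) it does not.
log_sd_high = statistics.stdev([math.log2(v) for v in high_ko])
log_sd_low = statistics.stdev([math.log2(v) for v in low_ko])

print(f"raw SD:  high={raw_sd_high:.1f}, low={raw_sd_low:.1f}")
print(f"log2 SD: high={log_sd_high:.3f}, low={log_sd_low:.3f}")
```

Unlike a plain log, the VST also dampens the extra noise at very low counts. This example only shows why an untransformed CPM heatmap tends to be dominated by abundant KOs.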
Thank you but sorry, your answer sounds very AI-based...