Question

DESeq2 on metagenome KO counts

1 day ago

young_bioinformatician ▴ 240

Hi all,

I have metagenome data. I aggregate raw counts per KO × sample. I want to test for differential abundance between two time groups using DESeq2. After that, I want to show abundance heatmaps and a volcano plot.

My first question is about the DESeq2 analysis. Is it valid to use DESeq2 on KO-level metagenome counts? I don’t have a “true control,” so I plan to set one group as the reference and interpret log2FC relative to that.

Second, is it acceptable to plot a heatmap using VST-transformed values? Alternatively, I could take the top 50 significant KOs from DESeq2, extract their CPM values, and plot a CPM heatmap, but I expect the visual patterns to differ a bit because VST and CPM values are on different scales.

Thank you very much.

abundance KEGG KO deseq metagenome gene • 287 views

updated 6 hours ago by ATpoint 89k • written 1 day ago by young_bioinformatician ▴ 240

The DESeq2 developer has advised many times against using DESeq2 for metagenomics. Please search for related posts over at support.bioconductor.org, where he recommended alternatives.

6 hours ago by ATpoint 89k

score 2 · Answer 1 · 2025-10-11

I wouldn’t use DESeq2 on aggregated raw counts per KO, because doing so means accepting two pretty big assumptions:

All genes with the same KO have the same length, which usually isn’t true.
KO-annotated genes make up a small part of the total coding sequence. When DESeq2 normalises for sequencing depth, it assumes that the proportion of CDS with a KO assignment stays roughly the same across samples. If that’s not the case, the normalisation might not work well.
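
To make the first assumption concrete, here is a minimal Python sketch with made-up numbers. It is not DESeq2; the gene names, lengths, copy numbers, read length, and coverage are all hypothetical. One KO is carried by two genes of different length, and only the mix of carriers changes between the two time groups while the total number of KO copies stays at 100.

```python
import math

READ_LENGTH = 150         # bp, assumed
COVERAGE_PER_COPY = 10.0  # assumed sequencing depth per gene copy

# Hypothetical lengths (bp) of two genes annotated with the same KO.
gene_length = {"taxonA_gene": 900, "taxonB_gene": 1800}

# Hypothetical gene copies per time group; the KO total is 100 in both groups.
copies = {
    "early": {"taxonA_gene": 80, "taxonB_gene": 20},
    "late": {"taxonA_gene": 20, "taxonB_gene": 80},
}


def expected_reads(n_copies, length_bp):
    """Expected reads from a gene: proportional to copies x gene length."""
    return n_copies * COVERAGE_PER_COPY * length_bp / READ_LENGTH


def ko_raw_count(group):
    """Aggregate raw reads over all genes of the KO (what gets summed per KO)."""
    return sum(expected_reads(n, gene_length[g]) for g, n in copies[group].items())


def ko_reads_per_kb(group):
    """Aggregate after dividing each gene's reads by its length in kb."""
    return sum(
        expected_reads(n, gene_length[g]) / (gene_length[g] / 1000)
        for g, n in copies[group].items()
    )


raw_early, raw_late = ko_raw_count("early"), ko_raw_count("late")
per_kb_early, per_kb_late = ko_reads_per_kb("early"), ko_reads_per_kb("late")

# Apparent log2 fold change of the raw KO count, late vs early.
raw_lfc = math.log2(raw_late / raw_early)

print("raw KO counts:", raw_early, raw_late, "log2FC:", round(raw_lfc, 3))
print("length-normalised:", round(per_kb_early, 2), round(per_kb_late, 2))
```

In this toy setup the raw KO count rises only because the longer gene becomes the dominant carrier, while the length-normalised totals are equal. Real data are noisier, but the direction of the bias is the same.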

score 0 · Answer 2 · 2025-10-10

10 hours ago

Aleksandra ▴ 180

(I apologise for my previous reply; I wanted to be as detailed as possible.) DESeq2 handles KO counts perfectly well. For a time-series design without a separate control group, simply set the reference level of the time factor to the first time point (the reference is determined by the factor levels, not by the design formula itself). For the heatmap, definitely use the VST-transformed values. The patterns will differ from CPM, and the VST is what you want. This stabilises the variance, ensuring that the clustering isn't skewed by the most abundant KOs.
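
As a small illustration of what choosing a reference group means for interpretation, the Python sketch below computes a plain log2 ratio of group means in both directions. The abundances and group names are hypothetical, and this simple ratio is only a stand-in: DESeq2 estimates log2 fold changes from a fitted model, not from raw mean ratios.

```python
import math

# Hypothetical normalised mean abundances of one KO in the two time groups.
mean_abundance = {"early": 50.0, "late": 200.0}


def log2_fold_change(means, reference, comparison):
    """log2(comparison / reference): positive means higher than the reference."""
    return math.log2(means[comparison] / means[reference])


# Reference = early: the KO appears as increased in the late group.
lfc_vs_early = log2_fold_change(mean_abundance, "early", "late")

# Reference = late: same data, same magnitude, opposite sign.
lfc_vs_late = log2_fold_change(mean_abundance, "late", "early")

print(lfc_vs_early, lfc_vs_late)
```

Swapping the reference only flips the sign, so no "true control" is needed, but the direction should be stated explicitly wherever the log2FC values are reported.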

Both VST and logCPM values will favor expression level (magnitude of counts) rather than differences between samples in an hclust/heatmap. To show differences you need to scale the values per feature first, whether they are VST or logCPM. See the thread "Scaling RNA-Seq data before clustering?"
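
The point about scaling can be sketched in a few lines of Python. The values below are hypothetical log-scale abundances (standing in for VST or logCPM output), with samples 1-2 as the early group and samples 3-4 as the late group; the KO names and the nearest-neighbour helper are illustrative only, not part of any package.

```python
import math
from statistics import mean, stdev

# Hypothetical log-scale values for 4 KOs x 4 samples (2 early, 2 late).
log_values = {
    "KO_high_up": [12.0, 12.2, 13.0, 13.2],
    "KO_high_down": [13.1, 12.9, 12.1, 11.9],
    "KO_low_up": [4.0, 4.2, 5.0, 5.2],
    "KO_low_down": [5.1, 4.9, 4.1, 3.9],
}


def zscore(row):
    """Row-wise scaling: centre on the row mean, divide by the row SD."""
    m, sd = mean(row), stdev(row)
    return [(x - m) / sd for x in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(name, table):
    """Closest other row by Euclidean distance (a proxy for what clusters first)."""
    others = (k for k in table if k != name)
    return min(others, key=lambda k: euclidean(table[name], table[k]))


scaled = {ko: zscore(row) for ko, row in log_values.items()}

# Unscaled: neighbours are decided by overall magnitude.
print("unscaled:", nearest("KO_high_up", log_values))
# Row-scaled: neighbours are decided by the early-vs-late pattern.
print("scaled:  ", nearest("KO_high_up", scaled))
```

Without scaling, the two abundant KOs pair up even though they change in opposite directions; after row-wise Z-scoring, KOs with the same direction of change pair up regardless of abundance.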

6 hours ago by ATpoint 89k