Question

DESeq2 on metagenome KO counts

7 weeks ago

young_bioinformatician ▴ 250

Hi all,

I have metagenome data, and I aggregate raw counts into a KO × sample table. I want to test for differential abundance between two time groups using DESeq2, and then show the results as abundance heatmaps and a volcano plot.
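A minimal sketch of the aggregation step described above, assuming gene-level raw counts and a gene-to-KO mapping are already available as plain dictionaries. The names `gene_counts`, `gene_to_ko` and `aggregate_by_ko` and the toy values are hypothetical. Note that genes without a KO assignment simply drop out of the resulting table.

```python
def aggregate_by_ko(gene_counts, gene_to_ko):
    """Sum raw gene-level counts into a KO x sample table.

    gene_counts: {gene_id: [count_sample1, count_sample2, ...]}
    gene_to_ko:  {gene_id: ko_id}; genes missing from this mapping are skipped.
    """
    ko_counts = {}
    for gene, counts in gene_counts.items():
        ko = gene_to_ko.get(gene)
        if ko is None:
            continue  # unannotated genes never reach the KO table
        if ko not in ko_counts:
            ko_counts[ko] = [0] * len(counts)
        # element-wise sum across samples
        ko_counts[ko] = [a + b for a, b in zip(ko_counts[ko], counts)]
    return ko_counts


# Hypothetical toy data: two samples
gene_counts = {
    "gene_1": [10, 20],
    "gene_2": [5, 5],
    "gene_3": [7, 3],  # no KO assignment, so it is dropped
}
gene_to_ko = {"gene_1": "K00001", "gene_2": "K00001"}
print(aggregate_by_ko(gene_counts, gene_to_ko))  # {'K00001': [15, 25]}
```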

My first question is about the DESeq2 analysis itself. Is it valid to use DESeq2 on KO-level metagenome counts? I don’t have a “true control,” so I plan to set one group as the reference and interpret log2FC relative to that.
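To illustrate the sign convention when one time group is used as the reference: in DESeq2 the reference is the first level of the condition factor (in R commonly set with `relevel`, or stated explicitly through the `contrast` argument of `results()`), and a positive log2FC means higher abundance in the non-reference group. The sketch below is NOT DESeq2's model-based estimate; it is just a naive log2 ratio of group means on already-normalised counts, with hypothetical group labels `T1`/`T2` and a pseudocount chosen for illustration.

```python
import math


def naive_log2fc(norm_counts, groups, reference, pseudocount=0.5):
    """Log2 ratio of group means for one feature (two-group design assumed).

    Positive values mean higher abundance in the non-reference group.
    This only illustrates the sign convention; DESeq2 estimates log2FC
    from a negative binomial GLM instead.
    """
    ref = [c for c, g in zip(norm_counts, groups) if g == reference]
    other = [c for c, g in zip(norm_counts, groups) if g != reference]
    mean_ref = sum(ref) / len(ref)
    mean_other = sum(other) / len(other)
    return math.log2((mean_other + pseudocount) / (mean_ref + pseudocount))


counts = [10.0, 12.0, 40.0, 44.0]  # one KO, four samples (hypothetical)
groups = ["T1", "T1", "T2", "T2"]
print(naive_log2fc(counts, groups, reference="T1"))  # positive: higher in T2
print(naive_log2fc(counts, groups, reference="T2"))  # same magnitude, opposite sign
```

Swapping the reference only flips the sign, so the choice of reference matters for interpretation, not for which KOs come out as changed.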

Second, is it acceptable to plot a heatmap using VST-transformed values? Alternatively, I could take the top 50 significant KOs from DESeq2, extract their CPM values, and plot a CPM heatmap, but I expect the visual patterns to differ somewhat because VST and CPM are on different scales.
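One way to make the two heatmap options more comparable is to plot row z-scores, so each KO's row shows its pattern across samples rather than its absolute level. The sketch below uses log2(CPM + 1) as a simple stand-in, since DESeq2's VST itself is not reproduced here. Two assumptions are worth stating: CPM is computed on the full table before subsetting to the top hits (otherwise the "library size" is just the sum of the selected KOs), and the two transforms will still not look identical because VST handles low counts differently. The function names and toy matrix are hypothetical.

```python
import math
import statistics


def log_cpm(matrix, prior=1.0):
    """log2(CPM + prior) for a features x samples matrix given as a list of rows."""
    n_samples = len(matrix[0])
    # library size = column sums over ALL features in the table
    lib_sizes = [sum(row[j] for row in matrix) for j in range(n_samples)]
    return [
        [math.log2(row[j] / lib_sizes[j] * 1e6 + prior) for j in range(n_samples)]
        for row in matrix
    ]


def row_zscore(matrix):
    """Centre and scale each row so the heatmap shows within-feature patterns."""
    scaled = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical full KO table (3 KOs x 4 samples)
full = [
    [100, 120, 400, 380],
    [50, 55, 20, 25],
    [1000, 900, 1100, 1050],
]
# Subset to the "top hits" only AFTER computing CPM on the full table
z = row_zscore(log_cpm(full)[:2])
for row in z:
    print([round(x, 2) for x in row])
```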

Thank you very much.

abundance KEGG KO deseq metagenome gene • 1.8k views

Updated 4 weeks ago by Aleksandra ▴ 190 • written 7 weeks ago by young_bioinformatician ▴ 250

The DESeq2 developer has advised many times against using DESeq2 for metagenomics. Please search for related posts at support.bioconductor.org, where he has suggested alternatives.

7 weeks ago by ATpoint 90k

Okay, I will take a look. But I have also come across many recent publications that use DESeq2 for metagenomics, so I'm a little confused...

5 weeks ago by young_bioinformatician ▴ 250

So I'm a little confused...

That's because he never tested DESeq2 on metagenomic data and is not an expert in this field: https://support.bioconductor.org/p/128871/

There is no consensus in this field. Here are a couple of papers that may help you determine which methods work best for your data: 1) https://pmc.ncbi.nlm.nih.gov/articles/PMC10461514/; 2) https://pubmed.ncbi.nlm.nih.gov/32746888/

5 weeks ago by andres.firrincieli 3.9k

score 3 · Answer 1 · 2025-10-11

7 weeks ago

andres.firrincieli 3.9k

I wouldn’t use DESeq2 on aggregated raw counts per KO, because doing so means accepting two pretty big assumptions:

All genes assigned to the same KO have the same length, which usually isn’t true.
Genes with a KO assignment make up only a small part of the total coding sequence (CDS). When DESeq2 normalises for sequencing depth, it assumes that the proportion of CDS with a KO assignment stays roughly the same across samples. If that’s not the case, the normalisation might not work well.
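To make the second assumption concrete: DESeq2's default size factors come from the median-of-ratios method, which, when run on a KO table, only ever sees KO-annotated reads. Below is a minimal reimplementation of that estimator (a sketch, not DESeq2's code) applied to a hypothetical two-sample case where total CDS reads are equal but sample B has a larger share of reads on genes without a KO. All numbers and names are invented for illustration.

```python
import math
import statistics


def median_of_ratios(matrix):
    """DESeq2-style size factors for a features x samples matrix (list of rows).

    Features with a zero in any sample are skipped, since their geometric mean is 0.
    """
    n_samples = len(matrix[0])
    usable_rows, log_geo_means = [], []
    for row in matrix:
        if min(row) <= 0:
            continue
        usable_rows.append(row)
        log_geo_means.append(sum(math.log(x) for x in row) / n_samples)
    size_factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable_rows, log_geo_means)]
        size_factors.append(math.exp(statistics.median(log_ratios)))
    return size_factors


# Hypothetical KO table: sample A has 1000 KO reads + 1000 unannotated reads,
# sample B has 500 KO reads + 1500 unannotated reads (same total CDS depth).
ko_counts = [
    [100, 50],
    [200, 100],
    [300, 150],
    [400, 200],
]
total_cds_reads = [2000, 2000]

sf = median_of_ratios(ko_counts)
depth_ratio = total_cds_reads[1] / total_cds_reads[0]
print("size-factor ratio (KO table only):", round(sf[1] / sf[0], 3))  # 0.5
print("sequencing-depth ratio:", depth_ratio)  # 1.0
```

Here the KO-only size factors absorb the shift in the KO-annotated fraction, so a drop that affects all KOs in sample B is normalised away. Whether that is desirable depends on the biological question; the point is only that the two scalings disagree once the KO-annotated fraction changes between samples.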

7 weeks ago by andres.firrincieli 3.9k

Hmm, that is indeed a good point! So I’ll run DESeq2 at the gene level instead, and then map the significant genes back to their corresponding KOs for interpretation.
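A minimal sketch of the "map significant genes back to KOs" step, assuming gene-level results have been exported as (gene, log2FC, padj) tuples. DESeq2 reports padj as NA for features removed by independent filtering; here that is represented as `None` (an assumption). The 0.05 cutoff, the function name and the toy data are hypothetical. Splitting hits by direction also shows when genes within one KO move in opposite directions, which an aggregated KO count would hide.

```python
from collections import defaultdict


def summarise_hits_by_ko(results, gene_to_ko, alpha=0.05):
    """Group significant gene-level hits by KO, split into 'up' and 'down'.

    results: iterable of (gene_id, log2fc, padj); padj may be None (NA).
    Genes without a KO assignment are collected under 'unannotated'.
    """
    summary = defaultdict(lambda: {"up": [], "down": []})
    for gene, log2fc, padj in results:
        if padj is None or padj >= alpha:
            continue  # not significant, or padj not computed
        ko = gene_to_ko.get(gene, "unannotated")
        direction = "up" if log2fc > 0 else "down"
        summary[ko][direction].append(gene)
    return dict(summary)


# Hypothetical gene-level results
results = [
    ("gene_1", 2.1, 0.001),
    ("gene_2", -1.4, 0.01),
    ("gene_3", 0.3, 0.60),
    ("gene_4", 1.8, 0.02),
    ("gene_5", 3.0, None),
]
gene_to_ko = {"gene_1": "K00001", "gene_2": "K00001", "gene_3": "K00002"}
for ko, hits in sorted(summarise_hits_by_ko(results, gene_to_ko).items()):
    print(ko, hits)
# K00001 {'up': ['gene_1'], 'down': ['gene_2']}
# unannotated {'up': ['gene_4'], 'down': []}
```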

5 weeks ago by young_bioinformatician ▴ 250