DESeq2 on metagenome KO counts
1 day ago

Hi all,

I have metagenome data, and I aggregate raw counts per KO × sample. I want to test for differential abundance between two time groups using DESeq2. After that, I want to show abundance heatmaps and a volcano plot.
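A minimal sketch of the aggregation step, assuming the gene-level counts and the gene-to-KO mapping are held in plain Python dicts (`gene_counts`, `gene_to_ko` and `aggregate_by_ko` are hypothetical names, not from any tool). It assumes one KO per gene, and it makes visible that genes without a KO simply drop out of the table.

```python
from collections import defaultdict

def aggregate_by_ko(gene_counts, gene_to_ko):
    """Sum raw read counts of all genes sharing a KO, per sample.

    gene_counts: {gene_id: {sample: raw_count}}
    gene_to_ko:  {gene_id: ko_id}; genes missing from it are skipped.
    Returns {ko_id: {sample: summed_count}}.
    """
    ko_counts = defaultdict(lambda: defaultdict(int))
    for gene, per_sample in gene_counts.items():
        ko = gene_to_ko.get(gene)
        if ko is None:
            continue  # unannotated genes never reach the KO table
        for sample, count in per_sample.items():
            ko_counts[ko][sample] += count
    return {ko: dict(samples) for ko, samples in ko_counts.items()}

# Hypothetical toy data
gene_counts = {
    "g1": {"T1": 10, "T2": 4},
    "g2": {"T1": 5, "T2": 6},
    "g3": {"T1": 7, "T2": 9},
}
gene_to_ko = {"g1": "K00001", "g2": "K00001"}  # g3 has no KO
ko_table = aggregate_by_ko(gene_counts, gene_to_ko)
print(ko_table)  # {'K00001': {'T1': 15, 'T2': 10}}
```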

My first question is about the DESeq2 analysis. Is it valid to use DESeq2 on KO-level metagenome counts? I don’t have a “true control,” so I plan to set one group as the reference and interpret log2FC relative to that.
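To make the sign convention concrete: with one group as the reference, a positive log2FC means higher abundance in the other group. The sketch below is a hypothetical helper (`naive_log2fc`) on already-normalised counts for a two-group design. It is not how DESeq2 estimates log2FC (DESeq2 fits a negative binomial GLM and can shrink the estimates); it only shows what swapping the reference does to the sign.

```python
import math

def naive_log2fc(norm_counts, groups, reference):
    """log2(mean of non-reference group / mean of reference group).

    Assumes exactly two groups and non-zero group means. This is NOT
    DESeq2's estimator; it only illustrates the sign convention.
    """
    ref = [c for c, g in zip(norm_counts, groups) if g == reference]
    other = [c for c, g in zip(norm_counts, groups) if g != reference]
    return math.log2((sum(other) / len(other)) / (sum(ref) / len(ref)))

groups = ["T1", "T1", "T2", "T2"]
ko = [10.0, 14.0, 44.0, 52.0]  # hypothetical normalised counts for one KO
print(naive_log2fc(ko, groups, reference="T1"))  # 2.0
print(naive_log2fc(ko, groups, reference="T2"))  # -2.0
```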

Second, is it acceptable to plot a heatmap using VST-transformed values? Alternatively, I could take the top 50 significant KOs from DESeq2, extract their CPM values, and plot a CPM heatmap, but I expect the visual patterns to differ a bit because VST and CPM are on different scales.

Thank you very much.


The DESeq2 developer has advised many times against using DESeq2 for metagenomics. Please search for related posts on support.bioconductor.org, where he has suggested alternatives.

6 hours ago

I wouldn’t use DESeq2 on aggregated raw counts per KO, because doing so means accepting two pretty big assumptions:

  1. All genes assigned to the same KO have the same length, which usually isn’t true. Longer genes attract more reads, so a raw count sum mixes abundance with gene length.
  2. Only a small part of the total coding sequence gets a KO. When DESeq2 normalises for sequencing depth on the KO table, it assumes that the proportion of CDS with a KO assignment stays roughly the same across samples. If that’s not the case, the normalisation might not work well.
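To make point 2 concrete, here is a minimal sketch of the median-of-ratios idea behind DESeq2's default size factors (simplified; `size_factors` is a hypothetical name, and DESeq2's own `estimateSizeFactors` has extra options). The factors are computed only from the rows you supply, i.e. from KO-annotated counts, so a shift in the KO-annotated fraction of reads between samples is absorbed into them as if it were sequencing depth. Rows containing a zero are skipped, which can leave nothing to work with on sparse metagenome tables; DESeq2 offers `type = "poscounts"` for that case.

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of rows (features, e.g. KOs), each a list of raw counts
    per sample. Rows with a zero in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # log of the row's geometric mean across samples (pseudo-reference)
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every row contains a zero; no ratios to take")
    # The median ratio to the pseudo-reference is the sample's size factor
    return [math.exp(statistics.median(r)) for r in log_ratios]

# Toy table: sample 2 is sequenced twice as deep, and one KO "blooms"
counts = [
    [10, 20],
    [30, 60],
    [5, 10],
    [100, 2000],  # strongly increased KO
]
sf = size_factors(counts)
print(round(sf[1] / sf[0], 3))  # 2.0 (the median is not moved by the bloom)
lib = [sum(col) for col in zip(*counts)]
print(round(lib[1] / lib[0], 3))  # 14.414 (the total-count ratio is)
```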
10 hours ago
Aleksandra

(I apologise for my previous reply; I wanted to be as detailed as possible.) DESeq2 handles KO counts perfectly well. For a time-series design without a separate control group, simply set the reference level of the grouping factor in the design formula to the first time point. For the heatmap, definitely use the VST-transformed values. The patterns will differ from CPM, and VST is what you want: it stabilises the variance, so the clustering isn't skewed by the most abundant KOs.


Both VST and logCPM values will favor expression level (the magnitude of the counts) rather than differences between samples in a hclust/heatmap. To see the differences, you need to scale the values first (typically per row), whether you start from VST or logCPM. See the post "Scaling RNA-Seq data before clustering?".
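A minimal sketch of row scaling (z-scores per KO), assuming values are held as `{feature: [values per sample]}` (a hypothetical layout; `row_zscores` is a hypothetical name). Two KOs with very different abundance but the same pattern across samples end up identical, which is what lets a heatmap or hclust pick up differences rather than magnitude. In R the same idea is `t(scale(t(x)))` or `pheatmap(..., scale = "row")`.

```python
import statistics

def row_zscores(matrix):
    """Centre and scale each row to mean 0 and SD 1 across samples.

    matrix: {feature: [values per sample]}, e.g. VST or log-CPM values.
    Uses the sample SD (n - 1), as R's scale() does. Constant rows become 0.
    """
    scaled = {}
    for feature, values in matrix.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        scaled[feature] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return scaled

vst_like = {
    "K_high": [12.0, 12.5, 13.0],  # abundant KO
    "K_low": [2.0, 2.5, 3.0],      # rare KO, same pattern
}
print(row_zscores(vst_like))
# {'K_high': [-1.0, 0.0, 1.0], 'K_low': [-1.0, 0.0, 1.0]}
```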
