Can I use pseudocounts for differential abundance of OTUs?
8 months ago
dpc ▴ 170

Hi! I am working with WGS metagenome data profiled with MetaPhlAn, which reports the relative abundance of taxa (no per-taxon read counts are provided). I want to find out which taxa are significantly differentially abundant between the test and control samples using the DESeq package. Should I multiply the relative abundances by a constant (e.g. 1 million) to convert them to pseudo-counts for the DESeq analysis? Will that give a correct result?

8 months ago
antonioggsousa ★ 2.0k

I don't think you can do that, because DESeq2 expects raw counts or estimated counts as input: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#input-data
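To illustrate why scaled proportions are not a substitute for counts, here is a minimal sketch in plain Python. The taxon names, read counts, and the helper functions `relative_abundance` and `pseudo_counts` are all made up for illustration, and relative abundance is assumed to be a fraction between 0 and 1. It shows that the same community sequenced at two very different depths collapses to identical pseudo-counts, so the information about how precisely each sample was measured is lost, and the apparent depth is set only by the arbitrary constant.

```python
# Hypothetical example: one community sequenced at two different depths.
# Taxon names and read counts are invented for illustration only.
SCALE = 1_000_000  # the arbitrary constant suggested in the question


def relative_abundance(counts):
    """Convert per-taxon read counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def pseudo_counts(rel_abund, scale=SCALE):
    """Multiply relative abundances by a constant and round to integers."""
    return {taxon: round(p * scale) for taxon, p in rel_abund.items()}


shallow = {"Bacteroides": 60, "Prevotella": 30, "Akkermansia": 10}          # 100 reads
deep = {"Bacteroides": 60_000, "Prevotella": 30_000, "Akkermansia": 10_000}  # 100,000 reads

pseudo_shallow = pseudo_counts(relative_abundance(shallow))
pseudo_deep = pseudo_counts(relative_abundance(deep))

# Both samples yield the same pseudo-counts despite a 1000-fold depth difference.
print(pseudo_shallow)
print(pseudo_shallow == pseudo_deep)
```

A count-based model would treat both pseudo-count vectors as if each sample had one million reads, which overstates the evidence carried by the shallow sample.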

Do you have any idea how I can find (statistically) the differentially abundant taxa from MetaPhlAn output?

I have never used it myself, but LEfSe seems to do what you're interested in. It uses relative abundances to find differentially abundant features (OTUs/taxa/genes), ranked by effect size: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-6-r60

Actually, this method was developed by the same lab that developed MetaPhlAn, so I think they are compatible, but as I said, I have never used them, so I can't be sure.

Thanks, Antonio. Yes, I have already used it. I asked only so that I could cross-check my results with another method, if one is available.
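One possible cross-check that works directly on relative abundances is a per-taxon permutation test followed by a multiple-testing correction. The sketch below is not LEfSe and not DESeq2; it is a generic non-parametric test swapped in for illustration, written in plain Python. The table values, taxon names, and the helpers `exact_permutation_pvalue` and `benjamini_hochberg` are hypothetical, and the example assumes one list of relative abundances per group for each taxon.

```python
from itertools import combinations


def exact_permutation_pvalue(test, control):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = test + control
    n_test, n_control = len(test), len(control)
    total = sum(pooled)

    def abs_diff(indices):
        s = sum(pooled[i] for i in indices)
        return abs(s / n_test - (total - s) / n_control)

    observed = abs_diff(range(n_test))
    splits = list(combinations(range(len(pooled)), n_test))
    # Small tolerance so floating-point noise does not hide ties with the observed value.
    extreme = sum(abs_diff(idx) >= observed - 1e-12 for idx in splits)
    return extreme / len(splits)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up relative abundances (percent), four samples per group.
table = {
    "taxon_A": {"test": [12.0, 15.0, 14.0, 13.5], "control": [2.0, 3.5, 1.0, 2.5]},
    "taxon_B": {"test": [5.0, 6.0, 4.5, 5.5], "control": [5.2, 4.8, 6.1, 5.0]},
}

taxa = list(table)
pvals = [exact_permutation_pvalue(table[t]["test"], table[t]["control"]) for t in taxa]
qvals = benjamini_hochberg(pvals)

for taxon, p, q in zip(taxa, pvals, qvals):
    print(f"{taxon}: p={p:.3f}, BH-adjusted p={q:.3f}")
```

With four samples per group there are only 70 possible splits, so the smallest attainable two-sided p-value is 2/70. Enumerating every split also becomes impractical for larger groups, where random permutations would be the usual substitute.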