How to transform my proportional (relative abundance) microbiome data before statistical analysis?
4 months ago

Hi, I have analyzed metagenomic (WGS) data with the MetaPhlAn pipeline, which gives the relative abundance (out of 100) of each taxon. I have two groups of data: control and test. I want to compute the mean, standard error (SE), and sample number (N) for the control and test groups. My data are not normally distributed, so I want to log-transform them. For that, I used the following function to transform my dataset:

# Note: despite its name, this applies the natural log, not a logit
mk_logit <- function(x) log(x)

But because my dataset is zero-inflated, all of the zeros (0) were log-transformed into -Inf. When these values were used to calculate the mean and SD, most of the results came out as NaN or Inf, so I am not getting a proper result. Can anyone please suggest a way to get around this problem?


R • 366 views

You have what's called compositional data. Compositional data need specific treatment, as detailed in the book The Statistical Analysis of Compositional Data by John Aitchison. In short, to be able to use standard methods, one needs to preprocess the data with the additive log-ratio transformation. Instead of the standard logarithm, you can use a generalized logarithm function such as the inverse hyperbolic sine (asinh in R) to deal with 0s. You may want to read the paper "Microbiome Datasets Are Compositional: And This Is Not Optional".
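
As a rough illustration of the two ideas above, here is a minimal sketch in base R. The toy data frame, the column names, the helper names alr() and summarise_by_group(), and the pseudocount value are all assumptions made up for the example; they are not part of MetaPhlAn or any package. The additive log-ratio still takes a logarithm, so this sketch replaces zeros with a small pseudocount first, which is one common workaround rather than the only option.

```r
# Hypothetical toy data: rows = samples, columns = taxa, values in percent (0-100).
abund <- data.frame(
  group   = c("control", "control", "control", "test", "test", "test"),
  taxon_a = c(60, 55, 70, 20, 0, 10),
  taxon_b = c(30, 45, 0, 50, 60, 40),
  taxon_c = c(10, 0, 30, 30, 40, 50)
)
taxa <- c("taxon_a", "taxon_b", "taxon_c")

# Option 1: inverse hyperbolic sine. asinh(0) is 0, so zeros stay finite.
asinh_abund <- asinh(as.matrix(abund[, taxa]))

# Option 2: additive log-ratio (alr) against a reference taxon (column index `ref`).
# Zeros are replaced by a pseudocount first; 1e-4 is an arbitrary assumption.
alr <- function(x, ref, pseudocount = 1e-4) {
  x <- as.matrix(x)
  x[x == 0] <- pseudocount
  x <- x / rowSums(x)                      # re-close each sample so it sums to 1
  log(x[, -ref, drop = FALSE] / x[, ref])  # log-ratio of each taxon to the reference
}
alr_abund <- alr(abund[, taxa], ref = 3)

# N, mean and standard error per group for one transformed variable.
summarise_by_group <- function(values, group) {
  data.frame(
    group = sort(unique(group)),
    N     = as.vector(tapply(values, group, length)),
    Mean  = as.vector(tapply(values, group, mean)),
    SE    = as.vector(tapply(values, group, function(v) sd(v) / sqrt(length(v))))
  )
}

summarise_by_group(asinh_abund[, "taxon_a"], abund$group)
summarise_by_group(alr_abund[, "taxon_a"], abund$group)
```

Note that asinh depends on the scale of the input (percent versus proportion), and the alr results depend on which taxon is chosen as the reference and on the pseudocount, so it is worth checking how sensitive the summaries are to those choices.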


Thanks a lot, Jean-Karim Heriche, for your response. I will take a look at the article.

