Question: Could I apply limma to metabolon concentrations?
4 months ago, shangyuan5000 wrote:


I have a Metabolon dataset with multiple factors (Gender, Genotypes, Treatment) and interactions between those factors. I tried MetaboAnalyst, but found it hard to model my question there because it does not support GLM modeling.

I have used DESeq2/limma to analyze RNA-seq data with similar experimental designs, and it was very successful. Could I apply those two packages to my Metabolon analysis? Is there anything I need to pay special attention to before taking those steps?

If I wanted to apply Camera/limma, could I use it in a way similar to RNA-seq?

Thanks & Best regards,

modified 4 months ago by Kevin Blighe • written 4 months ago by shangyuan5000
4 months ago, Kevin Blighe wrote:

The metabolomics data that you received may not be on the scale/distribution expected by limma. Limma fits a linear regression model. Which distribution does your metabolomics data follow? For limma, the data should follow the normal distribution (Edit: for linear regression, the assumption is actually that your residuals are normally distributed).
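To make the residual assumption concrete, here is a minimal sketch in base R. The data are simulated and the names (`df`, `metabolite1`, the factor levels) are hypothetical stand-ins for one log-scale metabolite and the three factors from the question; it is an illustration, not a recommended pipeline.

```r
set.seed(1)

# Hypothetical design: 24 samples, 3 per Gender x Genotype x Treatment cell
df <- data.frame(
  Gender    = factor(rep(c("F", "M"), each = 12)),
  Genotype  = factor(rep(rep(c("WT", "KO"), each = 6), times = 2)),
  Treatment = factor(rep(c("Ctrl", "Drug"), times = 12))
)

# Simulated log-scale metabolite level with a small treatment effect
df$metabolite1 <- rnorm(24, mean = 10) + 0.8 * (df$Treatment == "Drug")

# Multi-factor linear model with a Genotype x Treatment interaction
fit <- lm(metabolite1 ~ Gender + Genotype * Treatment, data = df)

# The normality assumption applies to the residuals, not to the raw values
res <- residuals(fit)
sw  <- shapiro.test(res)
print(sw)
```

A Q-Q plot of the residuals (`qqnorm(res); qqline(res)`) is often more informative than the test alone, especially with small sample sizes.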

Here is the QC that I applied to my metabolomics study data (from metabolon):

**Metabolomics quality control**

 1. Start with instrument-produced relative abundance metabolite levels

 2. Remove metabolites if:
 - Level in QC samples has coefficient of variation (CoV) > 25%
 - Missingness > 10% across test samples
 - No variability across test samples based on interquartile range (IQR)

 3. Remove samples with metabolite missingness > 10%

 4. Filter out unidentified/unknown metabolites and those classified as
    xenobiotic chemicals

 5. Convert NA values to 0
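The filtering steps above could be sketched roughly as follows in base R. Everything here is an assumption for illustration: the matrices `raw` (test samples) and `qc` (QC samples) are simulated, "no variability" is taken to mean an IQR of exactly 0, sample missingness is computed after the metabolite filter, and step 4 is skipped because it needs annotation columns that are not shown in this thread.

```r
set.seed(42)

# Hypothetical relative abundances: rows = samples, columns = metabolites
raw <- matrix(rlnorm(20 * 6, meanlog = 10), nrow = 20,
              dimnames = list(paste0("S", 1:20), paste0("met", 1:6)))
raw[1:5, "met2"] <- NA     # 25% missing across test samples
raw[, "met3"]    <- 1000   # no variability across test samples

# Hypothetical QC-sample measurements of the same metabolites
qc <- matrix(rlnorm(5 * 6, meanlog = 10, sdlog = 0.05), nrow = 5,
             dimnames = list(paste0("QC", 1:5), colnames(raw)))
qc[, "met4"] <- c(100, 1000, 50, 5000, 10)  # very noisy in QC samples

# Step 2: per-metabolite filters
cov_qc   <- apply(qc, 2, function(x) sd(x) / mean(x))  # CoV in QC samples
miss_met <- colMeans(is.na(raw))                       # fraction missing
iqr_met  <- apply(raw, 2, IQR, na.rm = TRUE)           # spread across samples

keep_met <- cov_qc <= 0.25 & miss_met <= 0.10 & iqr_met > 0
filtered <- raw[, keep_met, drop = FALSE]

# Step 3: drop samples with > 10% missing metabolites
keep_smp <- rowMeans(is.na(filtered)) <= 0.10
filtered <- filtered[keep_smp, , drop = FALSE]

# Step 5: convert remaining NA values to 0
filtered[is.na(filtered)] <- 0

print(colnames(filtered))
```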

After that, the data was log-transformed and then converted to Z scale. With the Z-scale data, I performed bootstrapped unbiased clustering (or 'machine learning', if I followed trends). With the Z-scale data, you could perform various other tests. I would prefer to just fit my own linear model via lm().
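As a rough illustration of the log transform and Z-scaling, here is a base R sketch on simulated data. The matrix `abund` is hypothetical, and the `+ 1` offset is my own assumption (not stated above) so that any zeros produced by the NA conversion stay finite after taking logs.

```r
set.seed(7)

# Hypothetical filtered abundances: rows = samples, columns = metabolites
abund <- matrix(rlnorm(10 * 4, meanlog = 8), nrow = 10,
                dimnames = list(paste0("S", 1:10), paste0("met", 1:4)))

# Log-transform; the +1 offset is an assumption to avoid log(0) = -Inf
logged <- log2(abund + 1)

# Z-scale each metabolite (column): subtract its mean, divide by its SD
z <- scale(logged, center = TRUE, scale = TRUE)

print(round(colMeans(z), 10))
```

After this step every metabolite has mean 0 and standard deviation 1 across samples, so coefficients from downstream models are on a comparable scale.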

I have worked with various metabolomics datasets and know a few people associated with Metabolon.


modified 11 weeks ago • written 4 months ago by Kevin Blighe

That's awesome, Kevin! Thanks for your kind suggestion. I'm using a targeted metabolite database (only ~800 metabolites), so I only remove metabolites with missingness > 50%. I also do the "Sample-wise normalization, LogTransformation, and Autoscaling (Z-scale)" steps, and check that the overall data distribution looks "Normal". I followed the tutorials in MetaboAnalyst, which only support the t-test (one factor, 2 levels) or ANOVA (one factor, >3 levels, at least 3 replicates/level).
I tried "MSEA" (Metabolite Set Enrichment Analysis), but it seems that you cannot define your own "metabolite set". Do you have any idea how to do this in the metabolomics field?

Best regards, Raymond

written 4 months ago by shangyuan5000

Cool! With your data (on the Z scale) you can perform your own tests in R or Stata. For example, if you have an outcome variable, like Case-Control, then you can perform a binary logistic regression:

summary(glm(CaseControl ~ metabolite1, family = binomial(link = 'logit')))
summary(glm(CaseControl ~ metabolite2, family = binomial(link = 'logit')))
*et cetera*
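Rather than typing one call per metabolite, the same model can be looped over all columns. This is only a sketch on simulated data: the data frame `df`, the column names, and the effect size are hypothetical, and the Benjamini-Hochberg adjustment at the end is my own addition for multiple testing, not something prescribed above.

```r
set.seed(123)
n <- 60

# Hypothetical Z-scale data: binary outcome plus five metabolites
df <- data.frame(
  CaseControl = factor(rep(c("Control", "Case"), each = n / 2),
                       levels = c("Control", "Case"))
)
mets <- paste0("metabolite", 1:5)
for (m in mets) df[[m]] <- rnorm(n)

# Give metabolite1 a real (simulated) association with the outcome
df$metabolite1 <- df$metabolite1 + 1.0 * (df$CaseControl == "Case")

# One logistic regression per metabolite; collect estimate and p-value
results <- do.call(rbind, lapply(mets, function(m) {
  fit <- glm(reformulate(m, response = "CaseControl"),
             data = df, family = binomial(link = "logit"))
  co <- summary(fit)$coefficients[m, ]
  data.frame(metabolite = m,
             estimate   = co[["Estimate"]],
             p.value    = co[["Pr(>|z|)"]])
}))

# Adjust for testing many metabolites (Benjamini-Hochberg)
results$p.adj <- p.adjust(results$p.value, method = "BH")
print(results)
```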

Have you used regression models in the past?

You can also just use Student's t-test (t.test()), ANOVA (aov()), et cetera. Some tutorials for ANOVA, here:

In MetaboAnalyst, you can also use the KEGG pathway analysis tool, no?

What is the ultimate aim of your project?

written 4 months ago by Kevin Blighe

Thanks, Kevin. I have used regression models in my homework before. I can use the KEGG pathway tool, but fewer than 50% of my metabolites map to a KEGG ID, so the mapping rate is too low. The ultimate aim is to study the potential effects of two drugs, and we want to test whether there are any indications from the metabolites. We had a small sample size and an unbalanced experimental design (with potential confounding effects because of the imbalance); that's why I wanted to use a regression model, to try to separate out the confounding factors.

written 4 months ago by shangyuan5000

I see. So, the models would be:

glm(drug ~ metabolite1)
glm(drug ~ metabolite2)
... ...

When you identify key metabolites with p<0.05, you can then create a final model and derive AUC (from ROC analysis).

final <- glm(drug ~ metabolite1 + metabolite5 + metabolite16)

You should also perform cross-validation on the final model with cv.glm() (from the boot package).
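A sketch of the final model, a simple AUC, and cross-validation might look like this. The data are simulated, `drug` is assumed to be a binary 0/1 outcome (hence `family = binomial`), and the AUC is computed with the rank-based (Mann-Whitney) formula on the fitted probabilities instead of a dedicated ROC package, so treat it as an in-sample estimate only.

```r
library(boot)  # provides cv.glm()
set.seed(2024)
n <- 80

# Hypothetical data: binary drug label and three Z-scale metabolites
df <- data.frame(
  drug         = rep(c(0, 1), each = n / 2),
  metabolite1  = rnorm(n),
  metabolite5  = rnorm(n),
  metabolite16 = rnorm(n)
)
df$metabolite1 <- df$metabolite1 + 1.2 * df$drug
df$metabolite5 <- df$metabolite5 - 0.8 * df$drug

# Final multivariable logistic model (binary outcome is an assumption)
final <- glm(drug ~ metabolite1 + metabolite5 + metabolite16,
             data = df, family = binomial(link = "logit"))

# In-sample AUC via the rank-sum (Mann-Whitney) identity
p   <- fitted(final)
r   <- rank(p)
n1  <- sum(df$drug == 1)
n0  <- sum(df$drug == 0)
auc <- (sum(r[df$drug == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# 10-fold cross-validated misclassification rate at a 0.5 cutoff
cost <- function(y, prob) mean(abs(y - prob) > 0.5)
cv   <- cv.glm(data = df, glmfit = final, cost = cost, K = 10)

print(auc)
print(cv$delta[1])
```

`cv$delta[1]` is the raw cross-validated error and `cv$delta[2]` is a bias-adjusted version. With a small sample, selecting metabolites by p-value and then evaluating on the same data will tend to look optimistic, which is the reason for the cross-validation step.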

If you need help, I have an R package that can run the models:

written 4 months ago by Kevin Blighe

Cool R packages, Kevin! I did not make myself clear, but I think I get your point: final <- glm(metabolite ~ drug1 + drug2 + drug1:drug2). Thanks for the fruitful discussion.
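With the metabolite as the response, that model can be fitted once per metabolite, with a covariate added to adjust for a potential confounder. The sketch below uses simulated data; `dat`, the `Gender` covariate, the effect size, and the BH adjustment are all assumptions for illustration. For what it is worth, limma's `lmFit()` and `eBayes()` take the same kind of design matrix and fit all metabolites at once with moderated statistics, but this version sticks to base R.

```r
set.seed(99)

# Hypothetical 2 x 2 drug design with 5 replicates per combination
design <- expand.grid(drug1 = c(0, 1), drug2 = c(0, 1), rep = 1:5)

# Hypothetical unbalanced covariate to adjust for
design$Gender <- factor(ifelse(design$rep %% 2 == 0, "M", "F"))

# Simulated Z-scale metabolite levels; metabolite1 gets a true interaction
mets <- paste0("metabolite", 1:4)
zmat <- matrix(rnorm(nrow(design) * length(mets)), ncol = length(mets),
               dimnames = list(NULL, mets))
zmat[, "metabolite1"] <- zmat[, "metabolite1"] +
  1.5 * design$drug1 * design$drug2
dat <- cbind(design, zmat)

# Fit metabolite ~ drug1 + drug2 + drug1:drug2 + Gender for each metabolite
res <- do.call(rbind, lapply(mets, function(m) {
  fit <- lm(reformulate(c("drug1", "drug2", "drug1:drug2", "Gender"),
                        response = m), data = dat)
  co <- summary(fit)$coefficients
  data.frame(metabolite      = m,
             interaction_est = co["drug1:drug2", "Estimate"],
             interaction_p   = co["drug1:drug2", "Pr(>|t|)"])
}))

# Multiple-testing adjustment across metabolites (Benjamini-Hochberg)
res$p.adj <- p.adjust(res$interaction_p, method = "BH")
print(res)
```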

written 4 months ago by shangyuan5000