Hi, I’m new to metabolomics analysis.
I am studying differences in the human gut microbiome between control and treatment groups in a clinical trial. Fecal samples were analyzed by LC–MS/MS targeting 700+ metabolites, of which 612 were detected. My data are a matrix of metabolite concentrations (ng/g).
I imported the matrix into MetaboAnalyst 6.0 for statistical analysis. In the data-filtering step I did not set a threshold, even though some metabolites have zero values in more than 50% of samples. The platform did not flag these missing/zero values.
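A minimal sketch of the filtering step described above, assuming the matrix is a NumPy array with samples in rows and metabolites in columns. The 50% cut-off and the replacement of remaining zeros by a fraction (here 1/5, an assumed value) of each metabolite's smallest positive concentration are illustrative choices, not necessarily MetaboAnalyst's exact procedure; `filter_and_impute` is a hypothetical helper name.

```python
import numpy as np

def filter_and_impute(X, max_zero_frac=0.5, lod_fraction=0.2):
    """Drop metabolites with too many zeros, then fill the remaining zeros.

    X: (samples x metabolites) array of concentrations (ng/g).
    max_zero_frac: metabolites with a larger fraction of zeros are removed.
    lod_fraction: assumed LOD-style substitute, as a fraction of the
        metabolite's smallest positive value (hypothetical default 1/5).
    Returns the filtered matrix and a boolean mask of kept metabolites.
    """
    X = np.asarray(X, dtype=float)
    zero_frac = (X == 0).mean(axis=0)      # fraction of zeros per metabolite
    keep = zero_frac <= max_zero_frac
    Xk = X[:, keep].copy()
    for j in range(Xk.shape[1]):
        col = Xk[:, j]                     # a view, so edits update Xk
        pos = col[col > 0]
        if pos.size == 0:
            continue                       # all-zero column: leave unchanged
        col[col == 0] = lod_fraction * pos.min()
    return Xk, keep

# Hypothetical example: the first metabolite is zero in 3 of 4 samples.
Xk, keep = filter_and_impute([[0, 1, 5], [0, 2, 0], [3, 0, 4], [0, 4, 6]])
```

Filtering before normalization matters because sum and median normalization, as well as log2 transformation, are sensitive to how zeros are handled.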
My raw data have a median RSD of 4.9%, and all features (100%) have RSD < 30%. For normalization, I tried multiple combinations of:
- Normalization by sum
- Normalization by median
- Log2 transformation
- Auto scaling
- Mean centering
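The options listed above are typically applied in three stages: sample-wise normalization (sum or median), then transformation (log2), then feature-wise scaling (auto scaling or mean centering). The sketch below assumes that order and a zero-free matrix; it also shows how a per-feature RSD is usually computed, assuming the RSD figures refer to repeated QC injections. `rsd_percent` and `normalize` are hypothetical helper names, not MetaboAnalyst functions.

```python
import numpy as np

def rsd_percent(qc):
    """Per-feature relative standard deviation (%) across QC injections.
    `qc` is assumed to be a (QC samples x metabolites) array."""
    qc = np.asarray(qc, dtype=float)
    return 100.0 * qc.std(axis=0, ddof=1) / qc.mean(axis=0)

def normalize(X, row="none", log2=False, scale="none"):
    """Apply sample normalization, then log2, then feature scaling.

    X: (samples x metabolites) array of strictly positive concentrations.
    row: "none", "sum" or "median" (divide each sample by its total/median).
    scale: "none", "auto" (unit variance) or "mean" (mean centering).
    """
    X = np.asarray(X, dtype=float)
    # 1) Sample-wise normalization: removes per-sample amount/dilution effects.
    if row == "sum":
        X = X / X.sum(axis=1, keepdims=True)
    elif row == "median":
        X = X / np.median(X, axis=1, keepdims=True)
    elif row != "none":
        raise ValueError(f"unknown row normalization: {row}")
    # 2) Transformation: needs strictly positive values (impute zeros first).
    if log2:
        X = np.log2(X)
    # 3) Feature-wise scaling.
    if scale == "auto":
        X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    elif scale == "mean":
        X = X - X.mean(axis=0)
    elif scale != "none":
        raise ValueError(f"unknown scaling: {scale}")
    return X

# Hypothetical example: median normalization followed by log2.
M = normalize([[1, 2, 4], [2, 4, 8]], row="median", log2=True)
```

Because mean centering and auto scaling make feature values signed and unitless, fold changes are normally computed on the unscaled (optionally sample-normalized) data, whereas scaling mainly affects multivariate outputs such as PCA scores and VIP values.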
I then identified differential metabolites using |logFC| > 1 and VIP > 1. However, the set of significant metabolites and their ranking change considerably depending on the normalization method. I also tried to compare my results with those provided by the company that performed the LC–MS/MS analysis, but I cannot fully reproduce them; some metabolites always differ.
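A minimal sketch of the selection rule above, assuming a 0/1 class vector (treatment = 1) and a two-component model. `log2_fold_change`, `pls_vip` and `jaccard` are hypothetical helpers: VIP here comes from a basic PLS1 fitted by NIPALS (plain PLS-DA), which is a stand-in for, not a reproduction of, MetaboAnalyst's or the company's (O)PLS-DA VIP, so the exact values will differ. The Jaccard index is one way to quantify how much two selected sets, for example from two normalization choices, overlap.

```python
import numpy as np

def log2_fold_change(X, groups, case, control):
    """log2(mean of case / mean of control) for each metabolite.
    X should be on the concentration scale (not log-transformed or scaled)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    return np.log2(X[groups == case].mean(axis=0) / X[groups == control].mean(axis=0))

def pls_vip(X, y, n_components=2):
    """VIP scores from a minimal PLS1 (NIPALS) model of a 0/1 class vector.
    X is expected to be already transformed/scaled; it is only centered here."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n_features = X.shape[1]
    weights, ss_y = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)                # unit-length weight vector
        t = X @ w                             # scores for this component
        tt = t @ t
        q = (y @ t) / tt                      # y-loading
        X = X - np.outer(t, X.T @ t / tt)     # deflate X
        y = y - q * t                         # deflate y
        weights.append(w)
        ss_y.append(q ** 2 * tt)              # y variance explained by component
    W = np.column_stack(weights)              # (features x components)
    ss_y = np.array(ss_y)
    return np.sqrt(n_features * (W ** 2 @ ss_y) / ss_y.sum())

def jaccard(a, b):
    """Overlap of two selected-feature sets (1.0 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical usage with simulated data (not the study's data).
rng = np.random.default_rng(0)
X = rng.lognormal(size=(20, 6))
groups = np.array(["control"] * 10 + ["treatment"] * 10)
y = (groups == "treatment").astype(float)

fc = log2_fold_change(X, groups, "treatment", "control")
Z = np.log2(X)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)   # log2 + auto scaling
vip = pls_vip(Z, y)
selected = set(np.flatnonzero((np.abs(fc) > 1) & (vip > 1)))
```

Running the same selection after each normalization variant and comparing the resulting sets with `jaccard` turns the instability into a number; note that VIP depends on the scaling used, while the fold change as written here does not.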
My question is: how should I decide which normalization method to use for targeted metabolomics data? Thanks!
Additionally, no matter which normalization method I use, the PCA shows no clear group separation (PERMANOVA p > 0.7). OPLS-DA shows some visual separation, but the Q² values are low (maximum ~0.25; permutation-test empirical p > 0.05).
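For reference, a minimal one-way PERMANOVA (pseudo-F with a permutation p-value), assuming Euclidean distances on the already normalized and scaled matrix. `permanova` is a hypothetical helper; MetaboAnalyst or other tools may use a different distance metric or number of permutations, so p-values need not match exactly.

```python
import numpy as np

def permanova(X, groups, n_perm=999, seed=0):
    """One-way PERMANOVA on Euclidean distances.
    Returns (pseudo-F, permutation p-value)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=-1)                 # squared Euclidean distances
    labels = np.unique(groups)
    g = len(labels)
    ss_total = D2[np.triu_indices(n, 1)].sum() / n

    def pseudo_f(lab):
        ss_within = 0.0
        for k in labels:
            idx = np.flatnonzero(lab == k)
            sub = D2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
        ss_among = ss_total - ss_within
        return (ss_among / (g - 1)) / (ss_within / (n - g))

    f_obs = pseudo_f(groups)
    rng = np.random.default_rng(seed)
    # Count permuted labelings whose pseudo-F is at least as large as observed.
    hits = sum(pseudo_f(rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)
```

With Euclidean distance and a single variable, the pseudo-F reduces to the ordinary one-way ANOVA F statistic, which is a convenient sanity check.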