Hello everyone!
I used the following code to run a KEGG pathway enrichment analysis on my differential metabolites:
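library(clusterProfiler)
library(KEGGREST)
library(dplyr)    # mutate()
library(stringr)  # str_replace()
# Example input. With my real data the IDs come from
# unique(df_sig_comp$KEGG_COMPOUND_ID).
metabolite_list <- c("C00031", "C00022", "C00186", "C00042", "C00149")
# Named vector: names are compound IDs (cpd:...), values are pathway IDs (path:map...)
kegg_compound2pathway <- KEGGREST::keggLink("pathway", "compound")
# Group the compound IDs by pathway
kegg_pathway2compound <- split(names(kegg_compound2pathway),
                               kegg_compound2pathway)
# Two-column table (pathway, compound) with the "cpd:" prefix stripped
kegg_pathway2compound_stack <- stack(kegg_pathway2compound)[, 2:1] |>
  mutate(values = str_replace(values, "cpd:", ""))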
enrich_res <- clusterProfiler::enricher(
  gene = metabolite_list,
  TERM2GENE = kegg_pathway2compound_stack,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.2
)
head(as.data.frame(enrich_res))
dotplot(enrich_res, showCategory = 10)
The results are as follows:
                         ID   Description GeneRatio BgRatio RichFactor FoldEnrichment   zScore       pvalue     p.adjust       qvalue
path:map04922 path:map04922 path:map04922       5/5 26/6589 0.19230769      253.42308 35.53705 6.365563e-13 5.156106e-11 9.380830e-12
path:map05230 path:map05230 path:map05230       5/5 37/6589 0.13513514      178.08108 29.76480 4.218197e-12 1.708370e-10 3.108145e-11
path:map00620 path:map00620 path:map00620       4/5 32/6589 0.12500000      164.72500 25.58317 2.283698e-09 6.165984e-08 1.121816e-08
path:map02020 path:map02020 path:map02020       4/5 56/6589 0.07142857       94.12857 19.28579 2.325710e-08 4.709563e-07 8.568405e-08
path:map04066 path:map04066 path:map04066       3/5 15/6589 0.20000000      263.56000 28.05280 9.521691e-08 1.542514e-06 2.806393e-07
path:map00020 path:map00020 path:map00020       3/5 20/6589 0.15000000      197.67000 24.27283 2.382935e-07 3.216962e-06 5.852823e-07
                                          geneID Count
path:map04922 C00031/C00022/C00186/C00042/C00149     5
path:map05230 C00031/C00022/C00186/C00042/C00149     5
path:map00620        C00022/C00186/C00042/C00149     4
path:map02020        C00031/C00022/C00042/C00149     4
path:map04066               C00031/C00022/C00186     3
path:map00020               C00022/C00042/C00149     3
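To show why the background matters so much here, the first row can be reproduced by hand. This is only a sketch that plugs the GeneRatio and BgRatio values from the table above into the hypergeometric over-representation test; the variable names (k, n, M, N) are my own labels.

```r
# Numbers taken from the first result row (path:map04922).
k <- 5      # query metabolites found in the pathway (numerator of GeneRatio)
n <- 5      # query metabolites with a pathway annotation (denominator of GeneRatio)
M <- 26     # background compounds in the pathway (numerator of BgRatio)
N <- 6589   # all background compounds (denominator of BgRatio)

# Over-representation p-value: P(X >= k) under the hypergeometric distribution.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
rich_factor <- k / M
fold_enrichment <- (k / n) / (M / N)

print(c(pvalue = p_value, RichFactor = rich_factor, FoldEnrichment = fold_enrichment))
```

The background size N = 6589 enters both the p-value and FoldEnrichment directly, so a smaller, species-specific background would change every number in the table.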
The content of kegg_pathway2compound_stack is as follows:
> head(kegg_pathway2compound_stack)
            ind values
1 path:map00010 C00022
2 path:map00010 C00024
3 path:map00010 C00031
4 path:map00010 C00033
5 path:map00010 C00036
6 path:map00010 C00068
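Side note: this table only carries pathway IDs, which is why the Description column in my results just repeats the ID. A minimal sketch of building a TERM2NAME table for enricher() follows. It assumes that KEGGREST::keggList("pathway") returns a named character vector of pathway names keyed by map ID, possibly with a "path:" prefix depending on the version; make_term2name is a hypothetical helper of mine, and the toy vector stands in for the real download.

```r
# Hypothetical helper: turn a named vector of pathway names into a TERM2NAME
# table whose IDs match the "path:mapXXXXX" style used in TERM2GENE.
make_term2name <- function(pathway_names) {
  ids <- sub("^path:", "", names(pathway_names))  # tolerate both ID styles
  data.frame(
    term = paste0("path:", ids),
    name = unname(pathway_names),
    stringsAsFactors = FALSE
  )
}

# Toy input standing in for KEGGREST::keggList("pathway") (no network needed).
toy_names <- c(
  "map00010" = "Glycolysis / Gluconeogenesis",
  "path:map00020" = "Citrate cycle (TCA cycle)"
)
term2name <- make_term2name(toy_names)
print(term2name)
```

With real data I would try passing the result as the TERM2NAME argument of enricher().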
I noticed that the pathway-compound links behind KEGGREST::keggLink("pathway", "compound") are also available from the KEGG REST API at https://rest.kegg.jp/link/compound/pathway, which returns lines such as:
path:map00010 cpd:C00022
path:map00010 cpd:C00024
path:map00010 cpd:C00031
path:map00010 cpd:C00033
path:map00010 cpd:C00036
In R, the same mapping looks like this:
> head(kegg_compound2pathway)
     cpd:C00022      cpd:C00024      cpd:C00031      cpd:C00033      cpd:C00036      cpd:C00068
"path:map00010" "path:map00010" "path:map00010" "path:map00010" "path:map00010" "path:map00010"
The genes assigned to a pathway such as Glycerophospholipid metabolism differ between species. Are the metabolites assigned to a KEGG pathway species-specific as well?
For differentially expressed genes, a species-specific background gene set can be obtained from the interface below. How should the background set be chosen for metabolites?
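One option I am considering for the metabolite background (an assumption on my part, not a confirmed method) is to keep only the reference pathways that also exist for my organism. The sketch below uses a hypothetical helper, restrict_to_organism, and toy data in place of kegg_pathway2compound_stack and of the pathway IDs that KEGGREST::keggList("pathway", "hsa") would return; the "hsa" organism code is just an example.

```r
# Hypothetical helper: keep only TERM2GENE rows whose reference pathway
# (path:mapXXXXX) also exists for the organism (e.g. hsaXXXXX).
restrict_to_organism <- function(term2gene, org_pathway_ids, org = "hsa") {
  ids <- sub("^path:", "", org_pathway_ids)  # drop an optional "path:" prefix
  ref_ids <- paste0("path:map", sub(paste0("^", org), "", ids))
  term2gene[as.character(term2gene[[1]]) %in% ref_ids, , drop = FALSE]
}

# Toy data standing in for kegg_pathway2compound_stack and for
# names(KEGGREST::keggList("pathway", "hsa")).
toy_term2gene <- data.frame(
  ind = c("path:map00010", "path:map00010", "path:map00195"),
  values = c("C00022", "C00031", "C00001"),
  stringsAsFactors = FALSE
)
toy_org_pathways <- c("hsa00010", "hsa00020")

org_term2gene <- restrict_to_organism(toy_term2gene, toy_org_pathways, org = "hsa")
print(org_term2gene)
```

The filtered table could then be passed as TERM2GENE, and the set of all metabolites I actually measured could be passed through the universe argument of enricher(). Whether either of these is the right background for metabolites is exactly what I am unsure about.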