Question: how to find metabolites
0
gravatar for Learner
3 months ago by
Learner 100
Learner 100 wrote:

I have a list of genes and I want to know which metabolites are assigned to those genes. is there anyone who can tell me what are the options? I prefer to have access to a database rather than web app so feel free to comment on any package or script etc

genomics metabolomics • 261 views
ADD COMMENTlink modified 9 weeks ago • written 3 months ago by Learner 100

Check section 4.5.2 here: https://bioconductor.org/packages/release/bioc/vignettes/piano/inst/doc/piano-vignette.pdf and reference 6 therein. They describe making 'gene sets' where each gene set is a metabolite, and the genes within each set are the enzymes related to metabolism of that metabolite. I'm interested in this as well, let me know how it goes for you.

ADD REPLYlink written 3 months ago by ivanarg240

Apart from what appears to be a good answer by ivanarg2, you may also find the functions of MetaboAnalyst useful. Certain functions can work with genes.

Kevin

ADD REPLYlink written 3 months ago by Kevin Blighe28k
3
gravatar for ivanarg2
3 months ago by
ivanarg240
ivanarg240 wrote:

Based on this, with some modifications:

https://support.bioconductor.org/p/107927/

library(KEGGREST)
library(org.Hs.eg.db)
library(annotate)

## Get enzyme-gene annotations
res1 = keggLink("enzyme", "hsa")
tmpDF1 = data.frame(ec = res1, gene = names(res1))

## Get compound-enzyme annotations
res2 = keggLink("compound", "enzyme")
tmpDF2 = data.frame(cpd = res2, ec = names(res2))

## Merge
df = merge(tmpDF1, tmpDF2, by="ec")
## aggDF = aggregate(df['ec'], by = df['gene'], FUN=paste) ## Optional, see original link

## Convert KEGG gene IDs to Entrez Gene IDs
convs = keggConv("hsa", "ncbi-geneid")
names(convs) = as.character(gsub("ncbi-geneid:", "", names(convs)))
df$ncbi_id = names(convs)[match(df$gene, as.character(convs))]
df$ncbi_name = getSYMBOL(df$ncbi_id, "org.Hs.eg.db")

## Convert compound IDs to compound (metabolite) names
mets = keggList("compound")
mets = setNames(str_split(mets, ";", simplify = T)[,1], names(mets))
df$cpd_name = as.character(mets)[match(df$cpd, names(mets))]

## Make a list of metabolite gene sets. Each metabolite gene set is composed of enzymes/genes involved in their metabolism
metabolites.gs = lapply(1:length(unique(df$cpd_name)), function(x) df$ncbi_name[which(df$cpd_name == unique(df$cpd_name)[x])])
namesmetabolites.gs) = unique(df$cpd_name)
metabolites.gs$'D-Glyceraldehyde 3-phosphate' ## test

## Taking your df from your original post, create ranks.
ranks = setNames(df$logFC, df$X)

## Now you have both the gene sets and the ranks. You can run GSEA
res = fgseametabolites.gs, ranks, nperm = 1000)

Note: depending on the output from your differential expression analysis, I would use (i) the t-statistic, which is simply logFC / sd(logFC); (ii) the logFC, as I show above; or (iii) separate your df into 'up' or 'down' regulated, and use the p-values. I'm not an expert on this so I'm not sure which option is best, but you can play around with different analyses.

EDIT: For some reason my parenthesis are not showing: the proper code is "names(metabolites.gs)" and "fgsea(metabolites.gs)", without the asterisks.

ADD COMMENTlink modified 3 months ago • written 3 months ago by ivanarg240

@ivanarg2 thanks for your help. how then I can use my gene list to find the metabolites?

ADD REPLYlink written 3 months ago by Learner 100

You can then run regular gene set enrichment analysis, as below:

fgseametabolites.gs, ranks, nperm = 1000) ## ranks is from your differential expression analysis, i.e., a t-statistic (logFC/logFC.sd)

The resulting list of 'pathways' (really, metabolites), will tell you the enrichment of metabolites in different conditions. Is this what you were asking?

ADD REPLYlink modified 3 months ago • written 3 months ago by ivanarg240

@ivanarg2 here is what I undestood. You extract the gene and metabolits assigned to them from KEGG. Then I have a list of genes which are upregulated and I already did the differential experission , so I have their names. I want to just see what metabolites are upregulated with the list of the genes that I have . Is that clear?

ADD REPLYlink written 3 months ago by Learner 100

Right. So you already have the two necessary parameters to run GSEA: (i) a list of gene sets (which usually refers to a biological pathway and the genes associated with those pathways, but in this case the gene sets refer to metabolites, and the genes involved in their metabolism); and (ii) a ranking of genes based on their differential expression. Just run fgsea on your list of differentially expressed genes and metabolites.gs.

ADD REPLYlink written 3 months ago by ivanarg240

@ivanarg2 Can you help me to apply it on this list ? "GPX3", "GLRX", "LBP", "CRYAB", "DEFB1", "HCLS1", "SOD2", "HSPA2", "ORM1", "IGFBP1", "PTHLH", "GPC3", "IGFBP3","TOB1", "MITF", "NDRG1", "NR1H4", "FGFR3", "PVR", "IL6", "PTPRM", "ERBB2", "NID2", "LAMB1", "COMP", "PLS3", "MCAM", "SPP1", "LAMC1", "COL4A2", "COL4A1", "MYOC", "ANXA4", "TFPI2", "CST6", "SLPI", "TIMP2", "CPM", "GGT1", "NNMT", "MAL", "EEF1A2", "HGD", "TCN2", "CDA", "PCCA", "CRYM", "PDXK","STC1", "WARS", "HMOX1", "FXYD2", "RBP4", "SLC6A12", "KDELR3", "ITM2B"

ADD REPLYlink modified 3 months ago by genomax55k • written 3 months ago by Learner 100

To run GSEA you need a vector with (i) gene names, as you provide, and (ii) some kind of ranking variable, like a fold change, a p value, or something like that. Do you have this info from your differential expression analysis?

ADD REPLYlink modified 3 months ago • written 3 months ago by ivanarg240

@ivanarg2 Please look at my question. I provided the data there in a R format

ADD REPLYlink modified 3 months ago • written 3 months ago by Learner 100

I edited my answer based on your edit.

ADD REPLYlink written 3 months ago by ivanarg240

@ivanarg2 I did not check it yet but I accepted it. however, you have few df in your code which maybe should be named differently otherwise it will mess up the analysis!

ADD REPLYlink written 3 months ago by Learner 100

@ivanarg2 do you know anyway to plot the metabolic pathways found by your strategy?

ADD REPLYlink written 3 months ago by Learner 100

The fgsea package has a few good functions for plotting the results of fgsea analysis

ADD REPLYlink written 12 weeks ago by ivanarg240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 649 users visited in the last hour