Question

Why does the metabolomics data not merge?

2.5 years ago

Bioinfo ▴ 20

Hello,

I am trying to build a list of metabolite gene sets, but the merge does not seem to work: the code returns an empty list. Where is the mistake, and what am I doing wrong?

    library(KEGGREST)
    library(org.Hs.eg.db)
    library(annotate)
    library(stringr)  # provides str_split()

    ## Get enzyme-gene annotations
    res1 = keggLink("enzyme", "hsa")
    tmpDF1 = data.frame(ec = res1, gene = names(res1))

    ## Get compound-enzyme annotations
    res2 = keggLink("compound", "enzyme")
    tmpDF2 = data.frame(cpd = res2, ec = names(res2))

    ## Merge
    df = merge(tmpDF1, tmpDF2, by="ec")

    ## Convert KEGG gene IDs to Entrez Gene IDs
    convs = keggConv("hsa", "ncbi-geneid")
    names(convs) = as.character(gsub("ncbi-geneid:", "", names(convs)))
    df$ncbi_id = names(convs)[match(df$gene, as.character(convs))]
    df$ncbi_name = getSYMBOL(df$ncbi_id, "org.Hs.eg.db")

    ## Convert compound IDs to compound (metabolite) names
    mets = keggList("compound")
    mets = setNames(str_split(mets, ";", simplify = T)[,1], names(mets))
    df$cpd_name = as.character(mets)[match(df$cpd, names(mets))]

    ## Make a list of metabolite gene sets. Each metabolite gene set is composed of the enzymes/genes involved in its metabolism
    metabolites.gs = lapply(1:length(unique(df$cpd_name)), function(x) df$ncbi_name[which(df$cpd_name == unique(df$cpd_name)[x])])
    names(metabolites.gs) = unique(df$cpd_name)


score 2 · Accepted Answer · 2023-04-21

2.5 years ago

rpolicastro 13k

After the line

    mets = setNames(str_split(mets, ";", simplify = T)[,1], names(mets))

add the code

    names(mets) <- paste0("cpd:", names(mets))

The names of mets are missing the "cpd:" prefix, so they never match df$cpd and the compound names won't merge into the df you're building.
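
If the format returned by keggList() differs between KEGGREST versions, a slightly more defensive variant may help. This is only a sketch: normalize_cpd is a hypothetical helper (not part of KEGGREST), and the vector below is toy data rather than a real KEGG query. It strips any existing "cpd:" prefix before adding it back, so running it on names that are already prefixed does no harm.

```r
# Hypothetical helper: make sure every compound ID carries exactly one "cpd:" prefix
normalize_cpd <- function(ids) paste0("cpd:", sub("^cpd:", "", ids))

# Toy data with mixed formats (illustrative values only, no KEGG query is made)
mets <- c(C00001 = "H2O", "cpd:C00002" = "ATP")

names(mets) <- normalize_cpd(names(mets))
names(mets)
```

In the original script, the same call could replace the paste0() line, assuming df$cpd keeps the prefixed format.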


rpolicastro, how come? I have been busy trying 100 things! Thanks.

ADD REPLY • link 2.5 years ago by Bioinfo ▴ 20

No problem!

keggLink("compound", "enzyme") returns the compound IDs in the format cpd:C00001, while keggList("compound") returns them in the format C00001, so it's just a matter of harmonizing the naming conventions between what the two functions return.
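
A quick way to spot this kind of key mismatch is to check how many IDs find a partner before relying on match(). The sketch below uses small toy stand-ins for the two results (the values are illustrative, and no KEGG query is made), so it runs offline in base R.

```r
# Toy stand-ins for the two lookups (illustrative values only)
df <- data.frame(cpd = c("cpd:C00001", "cpd:C00002"))   # IDs in the keggLink() format
mets <- c(C00001 = "H2O", C00002 = "ATP")               # names in the keggList() format

# Share of IDs that find a partner; a value of 0 points to a key-format mismatch
share_before <- mean(df$cpd %in% names(mets))

# Harmonize the prefix, then look the names up again
names(mets) <- paste0("cpd:", names(mets))
share_after <- mean(df$cpd %in% names(mets))
df$cpd_name <- unname(mets[match(df$cpd, names(mets))])

print(c(before = share_before, after = share_after))
print(df)
```

When match() finds nothing it returns NA for every row, which is why the resulting gene-set list ends up empty rather than raising an error.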

ADD REPLY • link 2.5 years ago by rpolicastro 13k