How to get a mapping between KEGG module and KEGG orthologs?
7 months ago
O.rka ▴ 390

I'm looking for the most simple of tables but it's difficult to find. A table of KEGG orthologs in the following format: [MODULE]\t[KO_1, KO_2, ..., KO_N]

I downloaded a weirdly formatted flat file from KEGG, but some of the KEGG modules have other KEGG modules nested in their hierarchy (yes, I know KEGG is hierarchical), such as https://www.genome.jp/kegg-bin/show_module?M00615

Does anyone know where I can find this? I just need a very simple table for set comprehension.

KEGG Annotation Ortholog Module • 448 views
7 months ago
Elucidata ▴ 240

You can build the table in R with the Bioconductor package KEGGREST. Install and load the package, then use the code below to query the KEGG database, list the modules, retrieve the orthologs for each module, and assemble the table.

#Install package to get relevant information from KEGG database

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("KEGGREST")

#Note: Comment out the above code once the installation is successful

library(KEGGREST)

#Get list of modules in KEGG
mod <- keggList("module")

#Loop through each module and get corresponding orthologs
#Return module ID and the corresponding orthologs as one string
#obj is a list of character vectors c(moduleID,orthologs)
#[[moduleID1,orthologs 1],[moduleID2,orthologs 2],etc.]

obj<-lapply(names(mod),function(x)
{
  #Strip the "md:" prefix, if present, to get the bare module ID
  module<-sub("^md:","",x)
  #Retrieve the module entry, which includes its orthologs
  ko<-keggGet(x)
  #Save list of orthologs as a string separated by ","
  orthologs<-paste(names(ko[[1]]$ORTHOLOGY),collapse = ",")
  #Return a character vector so rbind() builds a character matrix
  c(module,orthologs)
})

#Convert list to dataframe
df<-do.call(rbind,obj)

#Name columns
colnames(df)<-c("Module","KO")

#Display first few entries in the table
head(df)

#Save table to csv file
write.csv(df,"path to file/filename.csv")
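
As a possible shortcut, KEGGREST's keggLink() can fetch module-to-KO links in a single request instead of one keggGet() call per module. The sketch below is an alternative under stated assumptions, not the method above: the helper name collapse_links is hypothetical, and it assumes keggLink("ko", "module") returns a named character vector with module IDs as names and KO IDs as values, so check the orientation on your KEGGREST version before relying on it.

```r
# Hypothetical helper: collapse a named character vector of links
# (names = module IDs, values = KO IDs) into one row per module.
collapse_links <- function(links) {
  # Strip database prefixes such as "md:" and "ko:" if present
  module <- sub("^[a-z]+:", "", names(links))
  ko <- sub("^[a-z]+:", "", unname(links))
  # Group KO IDs by module, dropping duplicates within a module
  grouped <- tapply(ko, module, function(k) paste(unique(k), collapse = ","))
  data.frame(Module = names(grouped), KO = as.vector(grouped),
             stringsAsFactors = FALSE)
}

# Small made-up input so the helper can be tried offline
example_links <- c("md:M00001" = "ko:K00844", "md:M00001" = "ko:K12407",
                   "md:M00002" = "ko:K01803")
example_table <- collapse_links(example_links)

# With network access (assumed orientation: names are modules, values are KOs):
# links <- KEGGREST::keggLink("ko", "module")
# df <- collapse_links(links)
```

Because the links arrive as plain ID pairs, this route avoids parsing the ORTHOLOGY field altogether, at the cost of losing the module definition structure.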


This is amazing! Thank you so much. It pretty much works like a charm. However, I noticed a few weird parts. Do you know why some of the KO descriptions are in there? For example, 11 M00011 K00164,K00658,K00382,K00174,K00175,K00177,K00176,K01902,K01903,K01899,K01900,K18118,K00234,K00235,K00236,K00237,K00239,K00240,K00241,K00242,K18859,K18860,K00244,K00245,K00246,K00247 fumarate reductase [EC:1.3.5.4] [RN:R02164],K01676,K01679,K01677+K01678,K00026,K00025,K00024,K00116. Also, is it possible to output what "version" of KEGG this is for when I store the file? That could be useful for accessing this in the future.
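
One possible explanation, offered as an assumption rather than a confirmed diagnosis: for some modules the ORTHOLOGY field may not split cleanly into ID and description, so description text ends up among the names. A minimal workaround sketch is to keep only tokens that look like KO identifiers (K followed by five digits). The function name extract_ko_ids and the output file name are hypothetical. For the version question, KEGGREST's keggInfo("kegg") returns the database release text, which could be saved next to the table.

```r
# Hypothetical helper: pull every KO identifier (K + 5 digits) out of a string
extract_ko_ids <- function(x) {
  ids <- regmatches(x, gregexpr("K[0-9]{5}", x))[[1]]
  paste(unique(ids), collapse = ",")
}

# A shortened piece of the M00011 row quoted above
messy <- "K00246,K00247 fumarate reductase [EC:1.3.5.4] [RN:R02164],K01676,K01677+K01678"
clean <- extract_ko_ids(messy)

# Requires network access; returns KEGG's release information as text
# release_info <- KEGGREST::keggInfo("kegg")
# writeLines(release_info, "kegg_release.txt")
```

Note that this flattens entries such as K01677+K01678 into separate IDs, which is fine for set membership but discards the complex notation.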


Also, using this method there are only 443 modules. What happened to other modules such as "M00080"? I'm not seeing these on the KEGG website but seeing them in previous publications.