Question: How to retrieve the human metabolome from KEGG?
4.1 years ago, Crabbit wrote:


I am new to retrieving data from KEGG and need your help. I need to retrieve all of the small molecule compounds from all of the human metabolic pathways in the KEGG database. Essentially, I am attempting to capture the human metabolome in a spreadsheet. For each compound, I would like the associated information contained in the compound page (i.e. KEGG ID, formula, MW, pathway(s), CAS#, etc.). How would you go about retrieving this data? I understand that the KEGG API may be able to help.


human kegg metabolome • 2.2k views
modified 3.2 years ago by Biostar • written 4.1 years ago by Crabbit

In principle, bulk data download from KEGG should be done via their FTP site, see here (FTP access requires a subscription to help fund KEGG). With the API, you'll first need to retrieve all pathway IDs, then run one or more queries for each of them.
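The API route described above (pathway IDs first, then one query per pathway) could be sketched roughly as follows. This is only an illustration under a few assumptions: that the REST service lives at `https://rest.kegg.jp`, that `list/pathway/hsa` returns tab-separated lines starting with the pathway ID, and that compound links are attached to the reference maps (`map00010`) rather than to the `hsa` pathways. All function names are made up for this example.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # assumed base URL of the KEGG REST service


def fetch(path):
    """Download one KEGG REST resource as text (needs network access)."""
    with urlopen(f"{KEGG_REST}/{path}") as response:
        return response.read().decode("utf-8")


def parse_two_columns(text):
    """Split tab-separated KEGG output into (first, second) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        first, _, second = line.partition("\t")
        pairs.append((first.strip(), second.strip()))
    return pairs


def strip_prefix(identifier):
    """Drop a database prefix such as 'path:' or 'cpd:' if present."""
    return identifier.split(":", 1)[-1]


def human_pathway_ids(list_text):
    """Extract pathway IDs (e.g. 'hsa00010') from 'list/pathway/hsa' output."""
    return [strip_prefix(first) for first, _ in parse_two_columns(list_text)]


def compounds_by_pathway(pathway_ids, fetch_fn=fetch):
    """Run one 'link/compound/<map>' query per pathway.

    Returns a dict mapping each pathway ID to a set of compound IDs.
    """
    result = {}
    for pid in pathway_ids:
        # Assumption: compounds are linked to the reference map, so
        # 'hsa00010' is looked up as 'map00010'.
        ref_id = "map" + pid[-5:]
        pairs = parse_two_columns(fetch_fn(f"link/compound/{ref_id}"))
        result[pid] = {strip_prefix(cpd) for _, cpd in pairs}
    return result
```

With network access, something like `compounds_by_pathway(human_pathway_ids(fetch("list/pathway/hsa")))` would collect the compounds pathway by pathway. Passing a different `fetch_fn` makes it possible to try the parsing on saved files instead.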

written 4.1 years ago by Jean-Karim Heriche

Thanks for your suggestion. I used the KEGG API to pull out all of the human pathway IDs associated with a human gene. This comes out to 301 unique pathways. Is there a way to query all of the hsa pathways simultaneously to retrieve all the associated compound IDs?

written 4.1 years ago by Crabbit

I think Jean-Karim Heriche's approach is one valid way to go. But keep in mind that KEGG pathways are a collection of reactions from different organisms. For example, consider the Lysine biosynthesis pathway, in which only some reactions are associated with human enzymes. If you select all metabolites from this pathway, you will likely end up with a large number of false positives. Notably, the compounds in the upper part of the pathway map are mostly produced by plants and bacteria (have a look at the enzymes associated with the reactions if you like).

Alternatively, you can collect all human proteins (whether downloaded from the FTP site or accessed via the API) and select their enzymatic reactions. From the list of reactions you can get all associated metabolites. Unfortunately, this will likely leave you with a very sparse metabolome.
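A rough sketch of this gene-to-enzyme-to-reaction-to-compound chain is given here. It assumes the three link tables have already been downloaded as text, for instance from REST queries such as `link/enzyme/hsa`, `link/reaction/enzyme` and `link/compound/reaction` (these query paths are assumptions on my side, and the helper names are invented for the example). Each table is expected to hold one `source<TAB>target` pair per line.

```python
from collections import defaultdict


def parse_links(text):
    """Parse 'source<TAB>target' lines into a dict mapping source -> set of targets."""
    links = defaultdict(set)
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("\t", 1)
        links[source.strip()].add(target.strip())
    return links


def follow(start_ids, links):
    """Return every target reachable in one step from any ID in start_ids."""
    reached = set()
    for identifier in start_ids:
        reached |= links.get(identifier, set())
    return reached


def human_enzymatic_compounds(gene_to_enzyme, enzyme_to_reaction, reaction_to_compound):
    """Chain genes -> EC numbers -> reactions -> compounds."""
    genes = set(gene_to_enzyme)  # every human gene that has an EC annotation
    enzymes = follow(genes, gene_to_enzyme)
    reactions = follow(enzymes, enzyme_to_reaction)
    return follow(reactions, reaction_to_compound)
```

Because only reactions with an annotated human enzyme are followed, compounds that appear solely in spontaneous or unannotated steps will be missing from the result.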

Perhaps you can give some more information on your aim. What are you trying to achieve?

modified 4.1 years ago • written 4.1 years ago by Manuel Landesfeind

I retrieved all of the KEGG compound IDs associated with human genes. The list comes out to 1902 compounds. Does this link search include reactants, products, co-factors, etc.?

To approximate the known human metabolome, I aim to retrieve all of the compounds (reactants, products, co-factors, etc.) from the KEGG human metabolic map. The more information I can retrieve about a given compound the better. We are in the process of "reconstructing" the human metabolome to perform a HTS. Can you recommend other databases that can provide such information? I retrieved a list from HMDB that came out to ~50000 compounds, but sorting out only those involved in human metabolism is proving difficult. Once I have an approximate list of the "human metabolome" I will cross-reference it to see what is commercially available.
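For the per-compound information (KEGG ID, formula, MW, pathways, CAS number), one possible approach is to parse the flat-file text of each compound entry into a spreadsheet row. The sketch below assumes the usual KEGG flat-file layout (a 12-character keyword column, continuation lines indented, entries terminated by `///`) and assumes the field names `FORMULA`, `EXACT_MASS`, `MOL_WEIGHT`, `PATHWAY` and `DBLINKS`. The entry text itself would have to be fetched separately, for example with a REST `get` query; the function names are invented for the example.

```python
import csv
import io

COLUMNS = ["kegg_id", "name", "formula", "exact_mass", "mol_weight", "pathways", "cas"]


def parse_compound_entry(entry_text):
    """Group the lines of one flat-file entry by their keyword (first 12 characters)."""
    fields = {}
    key = None
    for line in entry_text.splitlines():
        if line.startswith("///"):
            break  # end-of-entry marker
        if not line.strip():
            continue
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            key = keyword  # a new field starts; otherwise this is a continuation line
        if key is not None:
            fields.setdefault(key, []).append(value)
    return fields


def first(fields, key):
    """Return the first value of a field, or an empty string if it is absent."""
    values = fields.get(key, [])
    return values[0] if values else ""


def compound_row(fields):
    """Reduce the parsed fields to one spreadsheet row."""
    cas = [v.split(":", 1)[1].strip() for v in fields.get("DBLINKS", []) if v.startswith("CAS:")]
    entry = first(fields, "ENTRY").split()
    return {
        "kegg_id": entry[0] if entry else "",
        "name": first(fields, "NAME").rstrip(";"),
        "formula": first(fields, "FORMULA"),
        "exact_mass": first(fields, "EXACT_MASS"),
        "mol_weight": first(fields, "MOL_WEIGHT"),
        "pathways": ";".join(v.split()[0] for v in fields.get("PATHWAY", []) if v),
        "cas": ";".join(cas),
    }


def rows_to_csv(rows):
    """Serialise the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Only the first name of each compound is kept here; the remaining synonyms are still available in the parsed `NAME` list if they are needed.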

modified 8 weeks ago by RamRS • written 4.1 years ago by Crabbit

As mentioned above, there is no "human metabolic map" in KEGG. In fact, the KEGG pathways are more like an ontology; e.g., the Lysine biosynthesis pathway contains many reactions (and therefore enzymes) involved in Lysine metabolism.

I guess that your query gives the same result I obtained with multiple queries. However, I just wanted to make you aware that the selection of compounds might not be exact, regardless of the method you choose. Using Jean-Karim Heriche's approach you will clearly retrieve too many compounds, while my approach is too conservative; e.g., it does not include spontaneous reactions in between two enzymatic reactions. Depending on your research targets, you have to keep that in mind.

That's why I was also asking for your overall goal. If you can elaborate on your research, somebody may also "recommend other databases that can provide such information".

For example, if you want to analyse metabolite fingerprinting (mass spectrometry) data, you have to be aware that some of the compounds do not have correct monoisotopic masses or, even more problematic, are not completely specified. For example, fatty acids are frequently abbreviated in the sum formulas. For FA analyses you might be better off using LipidMaps.
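Before relying on masses, it may help to flag compounds whose sum formula is not fully specified. The check below is only a heuristic and rests on an assumption: that abbreviated formulas show up as a generic `R` group or as a repeat unit such as `(C6H10O5)n`. The function name is invented for this example.

```python
import re

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
PLAIN_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def is_fully_specified(formula):
    """Return True if the formula is a plain list of element counts.

    Formulas with parentheses, repeat units, or a generic 'R' group are
    rejected, since no exact mass can be derived from them.
    """
    if not formula or not PLAIN_FORMULA.fullmatch(formula):
        return False  # empty, or contains parentheses, 'n' repeats, or other symbols
    # A lone 'R' token is treated as a generic residue; 'Ru', 'Rb' and other
    # two-letter symbols are tokenised as real elements and pass.
    return all(symbol != "R" for symbol, _ in TOKEN.findall(formula))
```

Compounds that fail the check could be set aside and looked up in a lipid-specific resource instead of being matched by mass.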

PS: What do you mean by "We are in the process of "reconstructing" the human metabolome to perform a HTS."? I know the term "HTS" as high throughput sequencing...

modified 8 weeks ago by RamRS • written 4.1 years ago by Manuel Landesfeind