Question: Gene Pathway Association File For Kegg
8.1 years ago by
Sequencegeek740 wrote:

I am trying to manually calculate the pathway enrichment for some genes. I have downloaded the PANTHER and Reactome data files, which contain the pathway annotations for genes. However, I couldn't find similar files for the KEGG pathway database.

The KEGG file I downloaded consists only of the description of each pathway, without the gene list. Could someone point me to a gene pathway association file for KEGG?

I have also tried the KEGG Python API, but I couldn't get a list returned with the following code:

from SOAPpy import WSDL
wsdl = ''  
serv = WSDL.Proxy(wsdl)  

It did return a pathway list when I used gene names like:

serv.get_pathways_by_genes(['eco:b0077' , 'eco:b0078'])

But my gene names are either Ensembl gene IDs or official gene symbols. Is there a way to find the correspondence between this kind of eco gene ID and Ensembl gene IDs?


pathway kegg • 11k views
ADD COMMENTlink modified 5.6 years ago by Biostar ♦♦ 20 • written 8.1 years ago by Sequencegeek740

Note: it seems you have posted two questions: how to parse KGML files and how to convert KEGG IDs to Ensembl IDs. It would be easier to answer you if you split the questions.

ADD REPLYlink written 8.1 years ago by Giovanni M Dall'Olio26k

Note that KEGG's FTP now requires a commercial subscription to be accessed. Some of the answers in this thread may not be available without that subscription.

ADD REPLYlink written 7.8 years ago by Giovanni M Dall'Olio26k
8.1 years ago by
Chris Evelo10.0k
Maastricht, The Netherlands
Chris Evelo10.0k wrote:

You could try the KEGG files at the PathVisio download page. You can export them (using PathVisio itself) as a gene list, which should allow you to do the enrichment calculations in whatever way you prefer. But you should in fact be able to use the PathVisio plugins to do the enrichment calculations. If you don't use that it is probably easier to follow one of the other more straightforward answers.

Fred's answer explains how we got them there in the first place. The KEGG XML was not trivial to translate though, since they do not always seem to follow their own documentation. So the translation from KGML (KEGG) to GPML could still be improved. But for enrichment analysis it should be good enough.
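Once you have a gene list per pathway, one common way to do the enrichment calculation by hand is a one-sided hypergeometric test. The following is only a minimal sketch of that idea, not what PathVisio or its plugins do internally; the function name and the toy gene names are made up for illustration.

```python
from math import comb


def enrichment_p_value(pathway_genes, selected_genes, background_genes):
    """One-sided hypergeometric p-value for over-representation.

    Genes outside the background are ignored. This helper is hypothetical
    and only illustrates the calculation.
    """
    background = set(background_genes)
    pathway = set(pathway_genes) & background
    selected = set(selected_genes) & background

    N = len(background)          # genes in the background
    K = len(pathway)             # background genes annotated to the pathway
    n = len(selected)            # genes in the list of interest
    k = len(pathway & selected)  # genes of interest that are in the pathway

    # P(X >= k): sum the upper tail of the hypergeometric distribution.
    upper_tail = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return upper_tail / comb(N, n)


# Toy data with placeholder gene names (not real annotations).
background = [f"gene{i}" for i in range(10)]
pathway = ["gene0", "gene1", "gene2"]
selected = ["gene0", "gene1", "gene5"]
print(f"p = {enrichment_p_value(pathway, selected, background):.4f}")
```

When testing many pathways this way, the p-values would normally need a multiple-testing correction afterwards.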

ADD COMMENTlink modified 5.4 years ago by Egon Willighagen5.2k • written 8.1 years ago by Chris Evelo10.0k
8.1 years ago by
Manhattan, NY
Khader Shameer18k wrote:

I recently played with the KEGG API to retrieve pathways associated with genes using the same method. I noticed that the method you used, get_pathways_by_genes, is a tricky one. It takes a list of genes and retrieves the list of pathways common to all of the input genes. That's why it worked for

serv.get_pathways_by_genes(['eco:b0077' , 'eco:b0078'])

but did not work for the individual gene IDs you provided. For example, your 3 IDs map to 2 KEGG genes, "hsa:4976" and "hsa:56124". These two genes don't have any associated pathways.

If you try with two genes (IL8 and IL5) which participate in a common pathway (Chemokine receptor defect),

serv.get_pathways_by_genes(["hsa:3568", "hsa:3577"])

you will get the result path:hsa04060
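To make the behaviour described above concrete, here is a minimal local sketch of the difference between "pathways shared by all genes" (an intersection) and "pathways containing any of the genes" (a union). It does not call the KEGG API; the helper names and the toy mapping are placeholders I made up, not real KEGG data.

```python
def common_pathways(gene_ids, gene_to_pathways):
    """Pathways shared by every gene in gene_ids (set intersection)."""
    pathway_sets = [set(gene_to_pathways.get(g, ())) for g in gene_ids]
    if not pathway_sets:
        return set()
    return set.intersection(*pathway_sets)


def any_pathways(gene_ids, gene_to_pathways):
    """Pathways containing at least one gene in gene_ids (set union)."""
    return set().union(*(gene_to_pathways.get(g, ()) for g in gene_ids))


# Placeholder mapping for illustration only (not real KEGG annotations).
toy_mapping = {
    "org:g1": {"path:A", "path:B"},
    "org:g2": {"path:B"},
    "org:g3": set(),  # a gene with no annotated pathway
}

print(common_pathways(["org:g1", "org:g2"], toy_mapping))
print(common_pathways(["org:g1", "org:g2", "org:g3"], toy_mapping))
print(any_pathways(["org:g1", "org:g2", "org:g3"], toy_mapping))
```

With intersection semantics, a single gene without any pathway annotation empties the whole result, which would explain an empty list for a mixed set of input genes. For enrichment you probably want the per-gene or union view instead.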

If you need some mapping help with KEGG identifiers, you can use KEGG Mapper. Another option is to try the InterMine-based TargetMine to calculate the enrichment of your lists using GO, OMIM and KEGG and get interesting insights. A manuscript describing TargetMine is available here.

ADD COMMENTlink written 8.1 years ago by Khader Shameer18k

That is a great resource! I'm trying to use the Perl API. I installed Webservice::InterMine, but I get this error when I execute a template script. Any idea? <TITLE>Error</TITLE> <BODY> <H1>Error</H1> FW-1 at tornado: Access denied.</BODY>

ADD REPLYlink written 7.4 years ago by Abdel150

Sorry to hear you had problems with the Perl client. Feel free to send us more details of your issue or reply to this thread and we would love to help. Without seeing your script, it sounds like a URL issue. The recent version of the client has a fix for private IP addresses, so updating may solve things. Alex

ADD REPLYlink written 7.4 years ago by Alex Kalderimis0

Hi Khader Shameer, I am using the function serv.get_pathways_by_genes(genes) to get the pathways, and as output I am getting only the pathway number. If I need the name, is there a function for that, or do you know how to do it? For example, if I call get_pathways_by_genes(hsa:1431) I get hsa:00020 and another 2, but I want the name as well, like this: hsa:00020 Citrate cycle (TCA cycle)

ADD REPLYlink written 6.9 years ago by dinesh.prabakaran0

AFAIK the API returns only the pathway ID, not the name. You can use a local tab-delimited file of KEGG IDs and pathway names and parse it. Unfortunately I cannot point you to an FTP site due to licensing restrictions on KEGG.
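Parsing such a file could look roughly like the sketch below. The two-column layout (pathway ID, tab, pathway name) is an assumption about your local file, the function name is made up, and the in-memory example only contains the row quoted in this thread; "hsa:99999" is a deliberately non-existent ID to show the fallback.

```python
import csv
import io


def load_pathway_names(handle):
    """Read 'pathway_id<TAB>pathway_name' rows into a dict.

    The column layout is an assumption; adjust it to your own file.
    """
    names = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        names[row[0].strip()] = row[1].strip()
    return names


# In-memory stand-in for a local file. For a real file, use something like:
#   with open("your_pathway_names.tsv", newline="") as handle: ...
example = io.StringIO("hsa:00020\tCitrate cycle (TCA cycle)\n")
pathway_names = load_pathway_names(example)

# Annotate the IDs returned by the API with their names.
for pathway_id in ["hsa:00020", "hsa:99999"]:
    print(pathway_id, pathway_names.get(pathway_id, "(name not found)"))
```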

ADD REPLYlink written 6.9 years ago by Khader Shameer18k
8.1 years ago by
San Francisco, California
Joachim2.8k wrote:

Have a look at the KEGG FTP server and the respective sub-directories of the species you are interested in. For example, in /pub/kegg/genes/organisms/hsa you will find H.sapiens.ent, which is probably the kind of file you are looking for.

ADD COMMENTlink written 8.1 years ago by Joachim2.8k

This is how I always get the pathway genes too. All of the APIs are annoying.

ADD REPLYlink written 8.1 years ago by Will4.5k
8.1 years ago by
Fred Fleche4.3k
Paris, France
Fred Fleche4.3k wrote:


Once you have extracted all the Entrez Gene IDs for a given pathway, you can get the corresponding Ensembl IDs or HUGO gene names by querying the UCSC or BioMart databases.

By the way, there is a lot of stuff in the KEGG Download section. Feel free to explore it; maybe you will find a nicer file format that fits your needs.

ADD COMMENTlink written 8.1 years ago by Fred Fleche4.3k
8.1 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

You should be able to parse the KGML files in R with the KEGGgraph R library. Once you have done that, you can use the biomaRt Bioconductor library to get the corresponding HGNC or Ensembl IDs.

Alternatively, you can read the KGML file in Cytoscape with the KGML parser plugin, and then use another Cytoscape plugin to get the gene IDs.

If you absolutely need to do this with Python, I am afraid I don't know any library to interrogate BioMart with Python. However, you can always download the whole list of KEGG gene IDs and Ensembl IDs as a tabular file and get the correspondences from there; this will also eliminate the need to be connected to the Internet.
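As a rough sketch of that last option: assuming the tabular file has one KEGG gene ID and one Ensembl ID per line, separated by a tab, it can be loaded into a dictionary with the standard library only. The function name, the column order and the IDs in the example are placeholders, not real identifiers.

```python
import csv
import io
from collections import defaultdict


def load_id_mapping(handle, key_column=0, value_column=1):
    """Build {key ID: set of mapped IDs} from a tab-delimited table.

    Sets are used because one ID can map to several IDs in the other
    namespace. The column positions are assumptions about the file layout.
    """
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) <= max(key_column, value_column):
            continue  # skip blank or short lines
        key, value = row[key_column].strip(), row[value_column].strip()
        if key and value:
            mapping[key].add(value)
    return dict(mapping)


# Placeholder table (made-up IDs) standing in for the downloaded file.
table = "kegg:geneA\tENS_PLACEHOLDER_1\nkegg:geneA\tENS_PLACEHOLDER_2\nkegg:geneB\tENS_PLACEHOLDER_3\n"

kegg_to_ensembl = load_id_mapping(io.StringIO(table))
# Swap the columns to look up KEGG IDs from Ensembl IDs instead.
ensembl_to_kegg = load_id_mapping(io.StringIO(table), key_column=1, value_column=0)

print(kegg_to_ensembl.get("kegg:geneA", set()))
print(ensembl_to_kegg.get("ENS_PLACEHOLDER_3", set()))
```

Building the reverse dictionary once is usually more convenient than scanning the table for every Ensembl ID in your gene list.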

ADD COMMENTlink modified 8.1 years ago • written 8.1 years ago by Giovanni M Dall'Olio26k