Gene Pathway Association File For Kegg
5
5
10.7 years ago
Sequencegeek ▴ 740

I am trying to manually calculate the pathway enrichment for some genes. I have downloaded the panther and reactome data files which have the pathway annotations for genes. However, I couldn't find similar files for the KEGG pathway database.

The one I downloaded from ftp://ftp.genome.jp/pub/kegg/pathway/pathway consists of only the descriptions for each pathway without the gene list. Could someone point me to the link of a gene pathway association file for KEGG?

I have also tried the KEGG Python API, but I couldn't get a list returned with the following code:

from SOAPpy import WSDL
wsdl = 'http://soap.genome.jp/KEGG.wsdl'
serv = WSDL.Proxy(wsdl)
serv.get_pathways_by_genes(['ENSG00000120328'])
serv.get_pathways_by_genes(['ENST00000361510'])
serv.get_pathways_by_genes(['OPA1'])


It did return a pathway list when I used KEGG gene IDs like these:

serv.get_pathways_by_genes(['eco:b0077' , 'eco:b0078'])


But my gene names are either Ensembl gene IDs or official gene symbols. Is there a way to find the correspondence between this kind of eco gene ID and Ensembl gene IDs?

Thanks

kegg pathway • 13k views
0

Note: it seems you have posted two questions: how to parse KGML files and how to convert KEGG IDs to Ensembl IDs. It would be easier to answer if you split them into separate questions.

0

note that KEGG's FTP now requires a commercial subscription to be accessed (http://www.genome.jp/kegg/docs/plea.html). Some of the answers in this thread may not be available without that subscription.

5
10.7 years ago

You could try the KEGG files at the PathVisio download page. You can export them (using PathVisio itself) as a gene list, which should allow you to do the enrichment calculations in whatever way you prefer. But you should in fact be able to use the PathVisio plugins to do the enrichment calculations. If you don't use that it is probably easier to follow one of the other more straightforward answers.

Fred's answer explains how we got them there in the first place. The KEGG XML was not trivial to translate, though, since the files do not always seem to follow their own documentation. So the translation from KGML (KEGG) to GPML could still be improved, but for enrichment analysis it should be good enough.
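Once a pathway has been exported as a gene list, one common way to do the enrichment calculation by hand is a one-sided hypergeometric test. The following is only a minimal sketch of that idea, not what PathVisio or its plugins do internally; the function names and the toy gene IDs are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, n_pathway of which belong to the pathway."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrichment(gene_list, pathway_genes, universe):
    """Return (overlap size, p-value) for one pathway.

    Genes outside the universe are ignored, so all three inputs must use
    the same identifier system (e.g. all Ensembl gene IDs).
    """
    universe = set(universe)
    selected = set(gene_list) & universe
    members = set(pathway_genes) & universe
    overlap = selected & members
    p = hypergeom_enrichment_p(
        len(universe), len(members), len(selected), len(overlap)
    )
    return len(overlap), p


# Toy example with made-up gene IDs g0..g9 (not real data)
toy_universe = [f"g{i}" for i in range(10)]
print(enrichment(["g0", "g1", "g2"], ["g0", "g1", "g2"], toy_universe))
```

If you test many pathways this way, remember to correct the resulting p-values for multiple testing.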

[Edited May 29, 2020 by Kevin Blighe: update links]

4
10.7 years ago

I recently played with the KEGG API to retrieve pathways associated with genes using the same method. I noticed that the method you used, get_pathways_by_genes, is a tricky one: it takes a list of genes and retrieves only the pathways common to all of the input genes. That's why it worked for

serv.get_pathways_by_genes(['eco:b0077' , 'eco:b0078'])


But it did not work for the individual gene IDs you provided. For example, your 3 IDs map to 2 KEGG genes, "hsa:4976" and "hsa:56124". These two genes don't have any associated pathways.

If you try with two genes that participate in a common pathway (hsa:3568 and hsa:3577, the IL5 and IL8 receptor genes, which share the cytokine-cytokine receptor interaction pathway),

serv.get_pathways_by_genes(["hsa:3568", "hsa:3577"])


You will get the result path:hsa04060
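To make the "common pathways" behaviour concrete, here is a small sketch that mimics it with a set intersection. It only illustrates the behaviour described above; it is not the KEGG implementation, and the gene and pathway IDs in the toy dictionary are hypothetical.

```python
def common_pathways(gene_ids, gene_to_pathways):
    """Return the pathways shared by *all* genes in gene_ids.

    gene_to_pathways maps a gene ID to the set of pathways it belongs to.
    Genes missing from the mapping count as having no pathways.
    """
    sets = [set(gene_to_pathways.get(g, ())) for g in gene_ids]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical annotations, for illustration only
toy = {
    "gene:A": {"path:P1", "path:P2"},
    "gene:B": {"path:P2", "path:P3"},
    "gene:C": set(),
}

print(common_pathways(["gene:A", "gene:B"], toy))            # shared pathway only
print(common_pathways(["gene:A", "gene:B", "gene:C"], toy))  # empty set
```

So a single gene without annotations (or a pair of genes with no pathway in common) is enough to give an empty result. To get every pathway for every gene, query the genes one at a time and take the union instead.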

If you need help mapping KEGG identifiers, you can use KEGG Mapper. Another option is to try the InterMine-based TargetMine, which can calculate enrichment of your lists using GO, OMIM and KEGG and may give you interesting insights. A manuscript describing TargetMine is available here.

0

That is a great resource! I'm trying to use the Perl API. I installed Webservice::InterMine, but I get this error when I execute a template script. Any idea?

<TITLE>Error</TITLE>
<BODY>
<H1>Error</H1>

0

Sorry to hear you had problems with the Perl client. Feel free to send us more details of your issue (dev@intermine.org) or reply to this thread and we would love to help. Without seeing your script, it sounds like a URL issue. If you update the client, the most recent version has a fix for private IP addresses, so that may solve things. Alex

0

Hi Khader Shameer, I am using the function serv.get_pathways_by_genes(genes) to get the pathways, but the output contains only the pathway IDs. Is there a function that also returns the names, or do you know another way to get them? For example, if I call get_pathways_by_genes(hsa:1431) I get hsa:00020 and two others, but I want the name as well, like this: hsa:00020 Citrate cycle (TCA cycle)

0

AFAIK the API returns only the pathway ID, not the name (See: http://www.kegg.jp/kegg/soap/doc/keggapi_manual.html#label:95). You can keep a local tab-delimited file of KEGG pathway IDs and names and parse it. Unfortunately I cannot point you to an FTP site due to licensing restrictions on KEGG (See: http://www.kegg.jp/kegg/download/)
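Parsing such a local file could look roughly like the sketch below. The two-column layout (pathway ID, tab, pathway name) and the "path:hsa..." ID style are assumptions about your local file, not a documented KEGG format; adjust the column handling to whatever you actually have.

```python
import csv
import io


def load_pathway_names(handle):
    """Read 'pathway_id<TAB>pathway_name' lines into a dict.

    Lines with fewer than two columns and lines starting with '#'
    are skipped.
    """
    names = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or row[0].startswith("#"):
            continue
        names[row[0].strip()] = row[1].strip()
    return names


# In-memory stand-in for a local file (assumed layout)
example = io.StringIO(
    "# pathway_id\tname\n"
    "path:hsa00020\tCitrate cycle (TCA cycle)\n"
    "path:hsa04060\tCytokine-cytokine receptor interaction\n"
)
pathway_names = load_pathway_names(example)

# Attach names to the IDs returned by an API call
for pathway_id in ["path:hsa00020"]:
    print(pathway_id, pathway_names.get(pathway_id, "(name not found)"))
```

For a real file, replace the StringIO object with `open("your_file.tsv", newline="")`.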

3
10.7 years ago
Joachim ★ 2.9k

Have a look at ftp://ftp.genome.jp/pub/kegg/genes/organisms/ and the respective sub-directories of the species you are interested in. For example, in /pub/kegg/genes/organisms/hsa you will find H.sapiens.ent, which is probably the kind of file you are looking for.

0

This is how I always get the pathway genes too. All of the APIs are annoying.

1
10.7 years ago


ftp://ftp.genome.jp/pub/kegg/pathway/organisms/hsa/

Once you have extracted all the Entrez Gene IDs for a given pathway, you can get the corresponding Ensembl IDs or HUGO gene names by querying the UCSC or BioMart databases.

By the way, there is a lot of material in the KEGG Download section. Feel free to explore it; maybe you will find a nicer file format that fits your needs: http://www.genome.jp/kegg/download/

1
10.7 years ago

You should be able to parse the KGML files in R with the KEGGgraph library. Once you have done that, you can use the biomaRt Bioconductor library to convert the gene IDs to HGNC symbols or Ensembl IDs.

Alternatively, you can read the KGML file in Cytoscape with the KGML parser plugin, and then use another Cytoscape plugin to get the gene IDs.
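If you would rather stay in Python than use R or Cytoscape, the gene IDs can also be pulled out of a KGML file with the standard library XML parser. This is a rough sketch of that alternative, not a replacement for KEGGgraph: it only reads the `type` and `name` attributes of `entry` elements and ignores relations and reactions. The KGML snippet below is a hand-written toy, not a real KEGG file.

```python
import xml.etree.ElementTree as ET


def genes_in_kgml(kgml_text):
    """Return (pathway name, sorted list of gene IDs) from KGML text.

    A gene entry can list several IDs in its 'name' attribute,
    separated by spaces, so the attribute is split on whitespace.
    """
    root = ET.fromstring(kgml_text)
    genes = set()
    for entry in root.iter("entry"):
        if entry.get("type") == "gene":
            genes.update(entry.get("name", "").split())
    return root.get("name"), sorted(genes)


# Hand-written toy document for illustration (not a real KEGG file)
toy_kgml = """<pathway name="path:hsa04060" org="hsa" number="04060">
  <entry id="1" name="hsa:3568" type="gene"/>
  <entry id="2" name="hsa:3577 hsa:3568" type="gene"/>
  <entry id="3" name="cpd:C00001" type="compound"/>
</pathway>"""

pathway_id, gene_ids = genes_in_kgml(toy_kgml)
print(pathway_id, gene_ids)
```

For a file on disk, read it first (for example `open(path).read()`) or adapt the function to use `ET.parse(path).getroot()`.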

If you absolutely need to do this with Python, I am afraid I don't know of any library to query BioMart from Python. However, you can always download the whole list of KEGG gene IDs and Ensembl IDs as a tabular file and get the correspondences from there; this also removes the need for an Internet connection.
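The offline lookup could be sketched as follows. The file layout (KEGG gene ID in the first column, Ensembl gene ID in the second, tab-separated) is an assumption, and the IDs in the example are placeholders rather than real mappings.

```python
import csv
import io
from collections import defaultdict


def load_id_map(handle, key_col=0, value_col=1):
    """Build a one-to-many mapping from a tab-delimited table.

    One ID can map to several IDs in the other system, so each key
    points to a set. Swap key_col and value_col to reverse the lookup.
    """
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= max(key_col, value_col):
            continue  # skip short or empty lines
        key, value = row[key_col].strip(), row[value_col].strip()
        if key and value:
            mapping[key].add(value)
    return mapping


# Placeholder table (hypothetical IDs, assumed column order)
table_text = (
    "hsa:0000001\tENSG_PLACEHOLDER_A\n"
    "hsa:0000001\tENSG_PLACEHOLDER_B\n"
    "hsa:0000002\tENSG_PLACEHOLDER_C\n"
)

kegg_to_ensembl = load_id_map(io.StringIO(table_text))
ensembl_to_kegg = load_id_map(io.StringIO(table_text), key_col=1, value_col=0)

print(sorted(kegg_to_ensembl["hsa:0000001"]))
print(sorted(ensembl_to_kegg["ENSG_PLACEHOLDER_C"]))
```

Keeping sets as values matters because the mapping between the two ID systems is not always one-to-one.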