Question: KEGG Data Download
7.3 years ago by
Siva Kumar30
Siva Kumar30 wrote:

I am using the KEGG API to download information about enzymes and pathways. Some methods in the API expect specific identifiers as input. For example, the call get_enzymes_by_compound(string:compound_id) expects a compound ID; the sample compound ID given in the KEGG API reference manual is cpd:C00345. However, there is no dedicated function that returns all compounds together with their corresponding internal IDs. The same is true for glycan IDs, reaction IDs, and so on. Has anyone used the KEGG API to download this information and, if so, where do input parameters such as compound IDs, enzyme IDs, and reaction IDs come from? Please help me with this. Thank you in advance.

pathway api kegg • 3.8k views
modified 7.2 years ago by Joachim2.8k • written 7.3 years ago by Siva Kumar30

I don't know if you have had the same experience as I have, but retrieving information (sequences especially) using the KEGG API is very slow. If someone has access to a mapping file, or has generated one, I would be very interested in it.

written 7.3 years ago by Manu Prestat3.9k
7.3 years ago by
Joachim2.8k
San Francisco, California
Joachim2.8k wrote:

Hi!

If I understand you correctly, you are having trouble obtaining some of the parameters for KEGG API calls. In particular, you do not know how to retrieve all compound IDs in KEGG.

Here is how you get all compound IDs in human pathways (with BioRuby):

sudo gem install bio
sudo gem install soap4r-ruby1.9    # if you are using Ruby 1.9

Now let's write a little Ruby program, 'compounds.rb', that outputs the pathways and the compounds appearing in them:

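#!/usr/bin/ruby

require 'bio'

# Connect to the KEGG API web service.
serv = Bio::KEGG::API.new

# List all human ('hsa') pathways, then print each pathway ID
# alongside every compound ID that appears in it (tab-separated).
pathways = serv.list_pathways('hsa')
pathways.each do |pathway|
    compounds = serv.get_compounds_by_pathway(pathway.entry_id)
    compounds.each do |compound|
        puts "#{pathway.entry_id}\t#{compound}"
    end
end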

The output of the program, 'ruby compounds.rb', looks like this (tab-separated):

path:hsa00010   cpd:C00022
path:hsa00010   cpd:C00024
path:hsa00010   cpd:C00031
path:hsa00010   cpd:C00033
path:hsa00010   cpd:C00036
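
If you save that output to a file, it can be read back and grouped per pathway. Here is a minimal sketch, assuming each line holds a pathway ID and a compound ID separated by whitespace, as in the output above. The helper name `group_by_pathway` and the inline sample are made up for illustration:

```ruby
# Groups "pathway<TAB>compound" lines into a Hash of pathway => [compounds].
# `lines` can be any enumerable of strings, e.g. File.foreach('compounds.tsv').
def group_by_pathway(lines)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    pathway, compound = line.split # splits on any whitespace, tabs included
    next if pathway.nil? || compound.nil? # skip blank or malformed lines
    grouped[pathway] << compound
  end
  grouped
end

# Inline sample standing in for a saved output file.
sample = "path:hsa00010\tcpd:C00022\npath:hsa00010\tcpd:C00024\n"
group_by_pathway(sample.each_line).each do |pathway, compounds|
  puts "#{pathway}: #{compounds.join(', ')}"
end
```

Working from a saved file also means you only have to query the web service once.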

Now you can modify and extend the program to get just a unique set of compounds for further processing. If you only need a list of compound IDs, you can simply run:

ruby compounds.rb | cut -f 2 | sort | uniq
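
If you would rather do the de-duplication inside Ruby, something like the following could work. This is only a sketch: it assumes the service object responds to get_compounds_by_pathway and returns an array of ID strings, as in the program above. `unique_compounds` and `FakeService` are hypothetical names, and the fake data is made up so the example runs without contacting KEGG:

```ruby
require 'set'

# Collects the unique compound IDs across a list of pathway IDs.
# `service` is any object responding to get_compounds_by_pathway(id)
# and returning an array of ID strings.
def unique_compounds(service, pathway_ids)
  seen = Set.new
  pathway_ids.each do |pathway_id|
    service.get_compounds_by_pathway(pathway_id).each { |compound| seen << compound }
  end
  seen.to_a.sort
end

# Hypothetical stand-in for the KEGG service (made-up data, offline use only).
class FakeService
  DATA = {
    'path:hsa00010' => ['cpd:C00022', 'cpd:C00024', 'cpd:C00031'],
    'path:hsa00020' => ['cpd:C00022', 'cpd:C00036']
  }.freeze

  def get_compounds_by_pathway(pathway_id)
    DATA.fetch(pathway_id, [])
  end
end

puts unique_compounds(FakeService.new, ['path:hsa00010', 'path:hsa00020'])
```

With the real service you would pass `serv` and the entry IDs from list_pathways instead of the fake object. The result should match what the shell pipeline above gives you.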

Hope this helps,

Joachim

written 7.3 years ago by Joachim2.8k

That is a good point, Hamish. However, people need to be aware that bulk downloads of KEGG are not free anymore: http://www.bioinformatics.jp/docs/subscription_schedule.pdf

written 7.3 years ago by Joachim2.8k

Yes, KEGG FTP downloads are now subscription-only (as noted on the page I linked, along with details of why this option was chosen). However, attempting to use the web services for bulk downloads can lead to you or your organisation being blacklisted by KEGG, so it is worth considering a subscription if you need to do this. If you and your colleagues use KEGG a fair amount, your organisation may already have a subscription in place, and you just have to ask around to get hold of the data files.

written 7.2 years ago by Hamish3.1k

It is worth noting that if you need a large chunk of the data, it can be more efficient to download the required data sets and process them locally instead of using the web services. For KEGG, details of how to download the data can be found at http://www.kegg.jp/kegg/download/.

written 7.3 years ago by Hamish3.1k