Retrieve FASTA Sequences From KEGG By Keyword
11.5 years ago
marcelolaia ▴ 10

Hi, I would like to search the KEGG database for organism 'xac' (T00084) using the keyword 'hypothetical' and retrieve all matching sequences in FASTA format.

Here is a search example

I need tab-delimited text for downstream analysis. Thanks!

kegg fasta dna protein search • 4.1k views

What do you mean by "retrieve all fasta printed out"? Fasta is a sequence format. And then you say "I need a tab delimited text." Please give an example of the output that you want.


Sorry! I need all the FASTA sequences in a flat text file.


(1) see http://www.genome.jp/kegg/catalog/org_list.html (2) download all sequences from Xanthomonas axonopodis and then (3) use your favorite programming language to retrieve all sequences annotated as hypothetical.


Which organism? That link lists all organisms.


xac Xanthomonas axonopodis pv. citri 306

11.5 years ago
Neilfws 49k

I find interacting with KEGG using dbget via the Web extremely painful. So I'd go for a different approach.

Approach 1

Based on an answer to a previous question, "Is There Any Way To Retrieve Genes' Sequences In Fasta Format Using The Kegg Orthology Code?", you could use the BioRuby Bio::KEGG::API to search and retrieve, something like this:

#!/usr/bin/ruby
require 'rubygems'
require 'bio'

serv = Bio::KEGG::API.new

# search for xac + hypothetical
xac = serv.bfind("T00084 hypothetical")
# get the IDs (the text before the first whitespace on each line) into an array
ids = xac.map { |gene| $1 if gene =~/^(.*?)\s+/ }
# retrieve fasta and print
ids.each { |id| puts serv.bget("-f -n 1 #{id}") }

This retrieves protein sequences; you'd need to adjust the parameters to bget for other options.

Approach 2

Download the FASTA files from NCBI (e.g. the *.faa files for protein sequences) and search the headers for the word "hypothetical" using one of the many tools available for parsing FASTA files.
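
As a rough illustration of Approach 2, here is a minimal sketch that uses only the Ruby standard library. It assumes plain FASTA input in which each header starts with ">" and the annotation text (e.g. "hypothetical protein") follows the ID on the header line. The function names (parse_fasta, select_by_keyword, to_tsv) and the sample records are made up for this example. The output is tab-delimited (ID, description, sequence), since that is what the question asks for.

```ruby
# Parse FASTA text into an array of [header, sequence] pairs.
# Assumption: headers start with ">" and sequences may span several lines.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?(">")
      records << [line[1..-1], []]   # new record: header without ">"
    elsif !records.empty?
      records.last[1] << line        # sequence line of the current record
    end
  end
  records.map { |header, parts| [header, parts.join] }
end

# Keep records whose header contains the keyword (case-insensitive).
def select_by_keyword(records, keyword)
  records.select { |header, _seq| header.downcase.include?(keyword.downcase) }
end

# Render records as tab-delimited lines: ID, description, sequence.
def to_tsv(records)
  records.map do |header, seq|
    id, description = header.split(/\s+/, 2)
    [id, description.to_s, seq].join("\t")
  end
end

# Illustrative input only; these records are invented, not real data.
sample = <<FASTA
>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2 DNA polymerase
MSEQNNTE
FASTA

puts to_tsv(select_by_keyword(parse_fasta(sample), "hypothetical"))
```

To run it on a downloaded file, replace the sample string with something like File.read("proteins.faa") (a hypothetical filename) and redirect the output to a text file.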


Thank you very much! I love your approach 1. It gives me the chance to learn more. I am a biologist. However, it printed out an error:

> $ get_fasta4  
/usr/lib/ruby/vendor_ruby/bio/io/soapwsdl.rb:63:in `create_driver': uninitialized constant Bio::SOAPWSDL::SOAP (NameError)
from /usr/lib/ruby/vendor_ruby/bio/io/keggapi.rb:201:in `initialize'  
from /home/marcelo/bin/scripts/get_fasta4:5:in `new'  
from /home/marcelo/bin/scripts/get_fasta4:5:in `<main>'

I'm impressed that you tried this solution. I do not see that error; I'm using Ruby 1.8.7. Perhaps you are using Ruby 1.9? Try "ruby -v" to find out. If so, you may need to "gem install soap4r-ruby1.9".


ruby 1.9.3p194 (2012-04-20 revision 35410) [i486-linux]

# gem install soap4r-ruby1.9
Fetching: soap4r-ruby1.9-2.0.5.gem (100%)
Successfully installed soap4r-ruby1.9-2.0.5
1 gem installed
Installing ri documentation for soap4r-ruby1.9-2.0.5...
Installing RDoc documentation for soap4r-ruby1.9-2.0.5...

$ get_fasta4
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.
/home/marcelo/bin/scripts/get_fasta4:10:in `<main>': undefined method `map' for #<String:0x99bd1fc> (NoMethodError)

OK, the installation was successful, but for some reason map is not working as expected. I'm afraid that, as I do not use Ruby 1.9, I don't have time to troubleshoot this. My best suggestion is to use 1.8.7 if possible (perhaps under RVM - https://rvm.io/), since I know the code works in that case.
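
One likely cause of the NoMethodError: from Ruby 1.9 onwards, String no longer includes Enumerable, so map cannot be called on a String directly. The error message suggests the search result is returned as a single String. Assuming it holds one "ID description" entry per line (an assumption, not checked against the KEGG service), splitting it into lines first should work on both 1.8.7 and 1.9. The helper name extract_ids and the sample input are made up for this sketch.

```ruby
# Extract the leading ID from each line of a newline-separated result string.
# Assumption: the result is one String with an "ID description" entry per line.
def extract_ids(result)
  result.split("\n").map { |line| line[/^(\S+)\s+/, 1] }.compact
end

# Illustrative input only; these IDs are invented, not real KEGG output.
sample = "xac:GENE_A  hypothetical protein\nxac:GENE_B  hypothetical protein\n"
p extract_ids(sample)
```

In the script above, this would correspond to replacing the xac.map line with ids = extract_ids(xac).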
