How To Download All EST Sequences For Organism XX From NCBI?
14.2 years ago
Yannick Wurm ★ 2.5k

I want to download all EST sequences from Genbank that are in the Order Hymenoptera.

This is easy by pointing and clicking: http://www.ncbi.nlm.nih.gov/sites/entrez?term=Hymenoptera&cmd=Search&db=nucest

However, I cannot get the efetch/eutils syntax right to do this in the commandline.

Note that, contrary to the following question, I do not know the identifiers for these ESTs: How To Retrive The Dna Sequence From A List Of Embl And Geneid

Thanks!

yannick

eutils genbank • 10k views

or you can put more than one id with the same parameter 'id':

curl 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=281381068,281381069,281381067&rettype=fasta'
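For anyone who would rather script this than type the URL by hand, here is a minimal Ruby sketch of the same request using only the standard library. The helper names `efetch_url` and `fetch_records` are made up for illustration, and adding `retmode=text` is my assumption about what is wanted here (plain text rather than XML). Note that `URI.encode_www_form` percent-encodes the commas as `%2C`; these decode back to commas on the server side.

```ruby
require 'uri'
require 'net/http'

EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# Build an efetch URL for several IDs at once (joined with commas).
# efetch_url is a hypothetical helper, not part of any library.
def efetch_url(db, ids, rettype: 'fasta', retmode: 'text')
  query = URI.encode_www_form(
    'db'      => db,
    'id'      => ids.join(','),
    'rettype' => rettype,
    'retmode' => retmode
  )
  "#{EUTILS_BASE}/efetch.fcgi?#{query}"
end

# Download the records as one string (this performs a network request).
def fetch_records(db, ids)
  Net::HTTP.get(URI(efetch_url(db, ids)))
end

# Only builds and prints the URL; no request is sent here.
url = efetch_url('nuccore', %w[281381068 281381069 281381067])
puts url
```

Calling `fetch_records('nuccore', %w[281381068 281381069 281381067])` should then return the three records as a single FASTA string.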

Merci, Pierre! Your response implies that one cannot avoid calling efetch 300,000 times? I'm a bit scared NCBI will think I'm running a denial-of-service attack!!

Also, is it possible to get eutils output in .txt rather than xml?


No, you can also use the 'usehistory' parameter in esearch (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#History); this will give you a 'WebEnv' that you can later use with a single efetch query.
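A rough sketch of that history workflow in Ruby, in case it helps: run esearch with `usehistory=y`, read `WebEnv` and `QueryKey` from the XML reply, then page through the stored result set with efetch via `retstart`/`retmax`. The helper names, the batch size of 500 and the sample XML are all assumptions for illustration (the XML is a made-up reply in the shape esearch returns, not real output). The fields are pulled out with a regex to keep the example dependency-free; a real script would probably use an XML parser.

```ruby
require 'uri'

EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# Step 1: esearch URL that stores the result set on NCBI's history server.
def esearch_history_url(db, term)
  query = URI.encode_www_form('db' => db, 'term' => term, 'usehistory' => 'y')
  "#{EUTILS}/esearch.fcgi?#{query}"
end

# Step 2: pull WebEnv, QueryKey and the total Count out of the esearch XML.
def parse_history(xml)
  {
    webenv:    xml[%r{<WebEnv>([^<]+)</WebEnv>}, 1],
    query_key: xml[%r{<QueryKey>([^<]+)</QueryKey>}, 1],
    count:     xml[%r{<Count>(\d+)</Count>}, 1].to_i
  }
end

# Step 3: one efetch URL per batch of the stored result set.
def efetch_batch_urls(db, history, batch_size: 500)
  (0...history[:count]).step(batch_size).map do |start|
    query = URI.encode_www_form(
      'db'        => db,
      'WebEnv'    => history[:webenv],
      'query_key' => history[:query_key],
      'retstart'  => start,
      'retmax'    => batch_size,
      'rettype'   => 'fasta',
      'retmode'   => 'text'
    )
    "#{EUTILS}/efetch.fcgi?#{query}"
  end
end

# Made-up esearch reply, used only to show the parsing and paging logic.
sample_xml = '<eSearchResult><Count>1200</Count><RetMax>20</RetMax>' \
             '<RetStart>0</RetStart><QueryKey>1</QueryKey>' \
             '<WebEnv>MCID_example</WebEnv></eSearchResult>'

search_url = esearch_history_url('nucest', 'Hymenoptera[Organism]')
history    = parse_history(sample_xml)
batch_urls = efetch_batch_urls('nucest', history)
puts search_url
puts batch_urls
```

In a real run you would download `search_url`, pass the response body to `parse_history`, then fetch each batch URL in turn with a short pause between requests. NCBI asks for no more than three requests per second without an API key, so a few hundred thousand records become a modest number of polite calls instead of one call per ID.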


excellent, thanks!


And bioruby provides a great wrapper. http://bioruby.org/rdoc/classes/Bio/NCBI/REST.html

14.2 years ago
Darked89 4.6k

You can start with TaxonomyBrowser:


Thanks, but I was looking for a "non-clicking" approach to this (see Pierre's answer).

14.2 years ago
Yannick Wurm ★ 2.5k

Following up on Pierre's answer & my comment, here's how to do it in Ruby with BioRuby, using NCBI's eutils. The example below queries the protein database; set "db" to the database you need.

#!/usr/bin/env ruby
require 'bio'

# NCBI requires a contact email; set it before sending any query.
Bio::NCBI.default_email = "your@email.address"
ncbi = Bio::NCBI::REST.new

# esearch returns the IDs of all records matching the query.
sequenceIDs = ncbi.esearch("Hymenoptera[organism]",
                           { "db"=>"protein", "rettype"=>"gb", "retmax"=> 10000000})
# efetch downloads the records for those IDs in FASTA format.
sequences   = ncbi.efetch(sequenceIDs,
                          {"db"=>"protein", "rettype"=>"fasta", "retmax"=> 10000000})

# ncbi returns a single big string with records separated by two newlines
sequences.gsub!("\n\n", "\n")
File.open('genbankHymenopteranProts.fasta', 'w') {|f| f.write(sequences +"\n") }

Error:

/usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:151:in `ncbi_check_parameters': Set email parameter for the query, or set Bio::NCBI.default_email = "(your email address)" (RuntimeError)
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:128:in `ncbi_post_form'
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:268:in `esearch'
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:265:in `step'
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:265:in `esearch'
        from eftech.ry:7

Indeed you need to supply your email (I didn't want to put my own; added a blank one above):

Bio::NCBI.default_email = "your email"