Question: How To Download All EST Sequences For Organism XX From NCBI?
8
8.6 years ago by
Yannick Wurm2.3k
Queen Mary University London
Yannick Wurm2.3k wrote:

I want to download all EST sequences from Genbank that are in the Order Hymenoptera.

This is easy by pointing and clicking: http://www.ncbi.nlm.nih.gov/sites/entrez?term=Hymenoptera&cmd=Search&db=nucest

However, I cannot get the efetch/eutils syntax right to do this on the command line.

Note that, unlike in the following question, I do not know the identifiers for these ESTs: http://biostar.stackexchange.com/questions/375/how-to-retrive-the-dna-sequence-from-a-list-of-embl-and-geneid

Thanks!

yannick

sequence eutils genbank retrieval • 7.5k views
ADD COMMENTlink modified 7.1 years ago • written 8.6 years ago by Yannick Wurm2.3k
5
8.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum113k wrote:

try db=nucest, retmax=300000 and term=Hymenoptera[Organism]

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucest&retmax=300000&term=Hymenoptera[Organism]

and then call efetch with the list of IDs, e.g.:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=281381068&rettype=fasta

ADD COMMENTlink modified 8.6 years ago • written 8.6 years ago by Pierre Lindenbaum113k
1

or you can put more than one id with the same parameter 'id': curl 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=281381068,281381069,281381067&rettype=fasta'

ADD REPLYlink written 8.6 years ago by Pierre Lindenbaum113k
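
A minimal sketch of the comma-separated 'id' approach, assuming you already have the ID list from esearch. The helper name efetch_batch_urls and the batch size of 200 are my own choices for illustration, not part of eutils. It uses https and adds retmode=text so that efetch returns plain FASTA text. It only builds the URLs; fetching them (with a pause between requests) is left to you.

```ruby
require 'uri'

EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# Hypothetical helper: split a list of IDs into batches and build one
# efetch URL per batch, with the IDs joined by commas in the 'id' parameter.
def efetch_batch_urls(ids, db: 'nuccore', batch_size: 200)
  ids.each_slice(batch_size).map do |batch|
    query = URI.encode_www_form(
      'db'      => db,
      'id'      => batch.join(','),
      'rettype' => 'fasta',
      'retmode' => 'text'   # plain text rather than XML
    )
    "#{EFETCH_URL}?#{query}"
  end
end

# Usage (no network access happens here, the URLs are only printed):
# efetch_batch_urls(%w[281381068 281381069 281381067]).each { |url| puts url }
```

With 300,000 IDs and batches of 200 this is 1,500 requests instead of 300,000.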

Merci, Pierre! Your response implies that one cannot avoid calling efetch 300,000 times? I'm a bit scared NCBI will think I'm running a denial-of-service attack!!

Also, is it possible to get eutils output in .txt rather than xml?

ADD REPLYlink written 8.6 years ago by Yannick Wurm2.3k

no, you can also use the 'usehistory' parameter in esearch (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#History); this will give you a 'WebEnv' that you can later use with a single efetch query

ADD REPLYlink written 8.6 years ago by Pierre Lindenbaum113k
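
A rough sketch of the 'usehistory' route, under a few assumptions: the helper names (esearch_history_url, parse_history, efetch_history_urls) and the page size are hypothetical, the esearch XML is read with simple regexes that take the first Count, QueryKey and WebEnv elements, and db defaults to nucest as in the answer above. For a very large result set you may still want to page through it with retstart/retmax rather than fetch everything in one response, which is what the last helper does. Nothing here touches the network; it only builds URLs and parses a response string you pass in.

```ruby
require 'uri'

EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# esearch URL asking NCBI to keep the result set on its history server.
def esearch_history_url(term, db: 'nucest')
  query = URI.encode_www_form(
    'db' => db, 'term' => term, 'usehistory' => 'y', 'retmax' => 0
  )
  "#{EUTILS_BASE}/esearch.fcgi?#{query}"
end

# Extract the total count, QueryKey and WebEnv from the esearch XML.
# Assumes the first <Count> element is the total number of hits.
def parse_history(xml)
  {
    count:     xml[%r{<Count>(\d+)</Count>}, 1].to_i,
    query_key: xml[%r{<QueryKey>(\d+)</QueryKey>}, 1],
    web_env:   xml[%r{<WebEnv>([^<]+)</WebEnv>}, 1]
  }
end

# One efetch URL per page of the stored result set; no ID list is needed.
def efetch_history_urls(history, db: 'nucest', page_size: 500)
  (0...history[:count]).step(page_size).map do |start|
    query = URI.encode_www_form(
      'db' => db,
      'query_key' => history[:query_key],
      'WebEnv'    => history[:web_env],
      'retstart'  => start,
      'retmax'    => page_size,
      'rettype'   => 'fasta',
      'retmode'   => 'text'
    )
    "#{EUTILS_BASE}/efetch.fcgi?#{query}"
  end
end

# Usage: fetch esearch_history_url('Hymenoptera[Organism]') with any HTTP
# client, pass the response body to parse_history, then fetch each URL
# returned by efetch_history_urls in turn.
```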

excellent, thanks!

ADD REPLYlink written 8.6 years ago by Yannick Wurm2.3k

And bioruby provides a great wrapper. http://bioruby.org/rdoc/classes/Bio/NCBI/REST.html

ADD REPLYlink written 8.6 years ago by Yannick Wurm2.3k
5
8.6 years ago by
Darked894.2k
Barcelona, Spain
Darked894.2k wrote:

You can start with TaxonomyBrowser:

ADD COMMENTlink written 8.6 years ago by Darked894.2k

thanks - but I was looking for a "non-clicking" approach to this (see Pierre's answer)

ADD REPLYlink written 8.6 years ago by Yannick Wurm2.3k
4
8.6 years ago by
Yannick Wurm2.3k
Queen Mary University London
Yannick Wurm2.3k wrote:

Following up on Pierre's answer & my comment, here's how to do it in ruby, using NCBI's eutils.

#!/usr/bin/env ruby
require 'bio'

ncbi = Bio::NCBI::REST.new
# NCBI requires an email address with every eutils request
Bio::NCBI.default_email = "your@email.address"

# This example queries the protein database; use "db"=>"nucest" for ESTs
sequenceIDs = ncbi.esearch("Hymenoptera[organism]",
                           { "db"=>"protein", "rettype"=>"gb", "retmax"=> 10000000})
sequences   = ncbi.efetch(sequenceIDs,
                          {"db"=>"protein", "rettype"=>"fasta", "retmax"=> 10000000})

# ncbi returns a single big string with records separated by two newlines
sequences.gsub!("\n\n", "\n")
File.open('genbankHymenopteranProts.fasta', 'w') {|f| f.write(sequences +"\n") }
ADD COMMENTlink modified 5 weeks ago by RamRS18k • written 8.6 years ago by Yannick Wurm2.3k

Error:

/usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:151:in `ncbi_check_parameters': Set email parameter for the query, or set Bio::NCBI.default_email = "(your email address)" (RuntimeError)
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:128:in `ncbi_post_form'
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:268:in `esearch'
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:265:in `step'
        from /usr/lib/ruby/gems/1.8/gems/bio-1.4.3.0001/lib/bio/io/ncbirest.rb:265:in `esearch'
        from eftech.ry:7
ADD REPLYlink modified 5 weeks ago by RamRS18k • written 5.2 years ago by Rm7.8k

Indeed you need to supply your email (I didn't want to put my own; added a blank one above):

Bio::NCBI.default_email = "your email"
ADD REPLYlink modified 5 weeks ago by RamRS18k • written 4.4 years ago by Yannick Wurm2.3k