Question: How can I efetch a GenPept record to get an Entrez ID using an SGD identifier?
8.2 years ago by
Mohawkjohn30
Austin, Texas, US
Mohawkjohn30 wrote:

When I do a search on NCBI's website for S000000001, I get two records.

When I try to efetch it in the protein database, I get no records. Apparently, this is not a primary ID.

Is there a way to get the records via BioRuby using the yeast identifier?

I do realize BioRuby/efetch is not ideal for this task. Unfortunately, this yeast identifier is not in Ensembl BioMart. It also appears to lack any Entrez ID or GenPept ID in SGDfeatures.tab (which is the same table given when I do a batch download). There's another file that gives an acceptable protein ID, but it's not an NP accession, so there is no dbxref for GeneID if I efetch it. So how do I convert, within a pipeline, from SGD ID to Entrez ID?

ncbi eutils • 2.9k views
ADD COMMENTlink modified 7.9 years ago by Neilfws48k • written 8.2 years ago by Mohawkjohn30
8.2 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

It is possible, but not ideal, to use NCBI Entrez and BioRuby for this purpose. As you note, S000000001 is not a primary ID. It's found in the db_xref section of the record (see this example), which cannot be used as a search term, so you would just have to run a "general" search, without term qualifiers.
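To make the idea of a "general" search concrete, here is a minimal sketch that only builds the E-utilities esearch URL for an unqualified term and does not send any request. The helper name general_search_url is made up for illustration; the endpoint and the db and term parameters are the standard esearch ones.

```ruby
require "uri"

# Hypothetical helper: build an esearch URL for a term with no field qualifier,
# so that NCBI matches it against all indexed fields of the chosen database.
def general_search_url(term, db = "protein")
  base  = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  query = URI.encode_www_form("db" => db, "term" => term)
  "#{base}?#{query}"
end

puts general_search_url("S000000001")
```

Opening the printed URL in a browser should return XML listing the UIDs of any matching protein records.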

Sometimes it is better to use an organism-specific resource. The Saccharomyces Genome Database has a batch download page. You can upload a file of identifiers (including SGD IDs) and retrieve a variety of records, such as protein sequences in FASTA format. You could also retrieve other identifiers if, for example, you need to go back to NCBI for GenPept records.

ADD COMMENTlink written 8.2 years ago by Neilfws48k

I edited the question. How would a "general" search be run in BioRuby? I can't really find any documentation.

ADD REPLYlink written 8.2 years ago by Mohawkjohn30

See answer below.

ADD REPLYlink written 8.2 years ago by Neilfws48k
8.2 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

OK, here is how you would run esearch and efetch using BioRuby:

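#!/usr/bin/ruby

require "rubygems"
require "bio"

# NCBI asks E-utilities clients to identify themselves; use your own address.
Bio::NCBI.default_email = "me@me.com"
sgd    = "S000000001"
ncbi   = Bio::NCBI::REST.new

# Unqualified search of the protein database; returns an array of UIDs.
search = ncbi.esearch(sgd, {"db" => "protein"})

# Fetch each hit in GenPept format and write it to <UID>.gp
search.each do |result|
  record = ncbi.efetch(result, { "db" => "protein", "rettype" => "gp" })
  File.open("#{result}.gp", 'w') {|f| f.write(record) }
end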

First, you run esearch using the search term (e.g. S000000001). This returns an array (search) containing just the UIDs of any matching records; it is empty if nothing matched. Then you loop through the array, pass each UID to efetch and specify that the format (rettype) is GenPept ("gp"). Finally, the code opens a file named after the UID and writes out the GenPept record.
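Once you have the GenPept text, one way to get at the Entrez Gene ID is to look for GeneID entries among the db_xref qualifiers. This is only a sketch: gene_ids is a made-up helper name, and the sample fragment below is invented for illustration (the gene name and GeneID are placeholders, not real values). As noted in the question, records that carry no GeneID cross-reference will simply give an empty array.

```ruby
# Hypothetical helper: collect the numeric part of every
# /db_xref="GeneID:..." qualifier found in GenPept flat-file text.
def gene_ids(genpept_text)
  genpept_text.scan(%r{/db_xref="GeneID:(\d+)"}).flatten.uniq
end

# Invented fragment in the style of a GenPept feature table.
# The gene name and GeneID here are placeholders.
sample = <<~GP
  CDS             1..100
                  /gene="EXAMPLE1"
                  /db_xref="GeneID:12345"
                  /db_xref="SGD:S000000001"
GP

puts gene_ids(sample).inspect
```

In the loop above you could call gene_ids(record) on each fetched record instead of, or as well as, writing it to a file.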

Hopefully that's enough to get you started. I agree the Bio::NCBI::REST documentation is not great; in particular, I always forget to specify a default email. Have a look at the API documentation for Bio::NCBI::REST::ESearch::Methods and Bio::NCBI::REST::EFetch::Methods.

ADD COMMENTlink written 8.2 years ago by Neilfws48k