Question: Vector Base gene ID to NCBI ID
Eai wrote:

I've got a large number of gene IDs from Vector Base (e.g. AAEL006343-PA and AAEL001710-PA). Some of these IDs have records in the NCBI Protein database.

I'm trying to use Biopython to get the gene description and other info from NCBI with the following code (for simplicity I've put in a single ID, but I would normally pass a list):

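from Bio import Entrez, SeqIO

Entrez.email = "emailhere"  # placeholder: your own email address, as a string

handle = Entrez.efetch(db="protein", id="AAEL006343-PA", rettype="gb", retmode="text")

# rettype="gb" returns GenBank flat text, which SeqIO parses (Entrez.read expects XML)
record = SeqIO.read(handle, "genbank")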

efetch fails with an HTTP error (Bad Request). I know the data exists, because with the ID 108877864 I get the result I want. However, 108877864 is NCBI's own ID for this protein. The only way I have found to convert AAEL006343-PA to 108877864 is via esearch, but I don't want to spam NCBI with hundreds of esearch queries.

Is there a way to do this ID conversion as a batch and without esearch?

written 11 months ago by Eai (0)

You would not be spamming NCBI as long as you sign up for an NCBI API key (NCBI_API_KEY) and build an appropriate delay into your queries.
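As a rough illustration of that advice, here is a minimal sketch of one esearch per Vector Base ID with a pause between requests. It assumes Biopython is installed and that a plain-text search on the ID finds the protein record, as the question describes. The helper names (`request_delay`, `entrez_search`, `map_ids`) are made up for this example. The delays reflect NCBI's published limits of 3 requests per second without an API key and 10 per second with one.

```python
import time


def request_delay(api_key=None):
    """Seconds to wait between requests: 10/s with an API key, 3/s without."""
    return 0.1 if api_key else 0.34


def entrez_search(term, email, api_key=None):
    """Run one esearch against the protein database and return the UIDs found."""
    # Imported here so the helpers below can be exercised without Biopython.
    from Bio import Entrez

    Entrez.email = email
    if api_key:
        Entrez.api_key = api_key
    handle = Entrez.esearch(db="protein", term=term)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])


def map_ids(vb_ids, search, delay):
    """Map each Vector Base ID to the list of NCBI UIDs returned by `search`."""
    mapping = {}
    for vb_id in vb_ids:
        mapping[vb_id] = search(vb_id)
        time.sleep(delay)  # stay under the request-rate limit
    return mapping


if __name__ == "__main__":
    key = None  # assumption: replace with your own NCBI API key if you have one
    uids = map_ids(
        ["AAEL006343-PA", "AAEL001710-PA"],
        lambda term: entrez_search(term, "you@example.org", key),
        request_delay(key),
    )
    print(uids)
```

Passing the search function in as an argument keeps the loop independent of the network call, so it can be tried with a stub before touching NCBI.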

modified 11 months ago • written 11 months ago by genomax (84k)

I can do that and loop over 1000+ search calls, but surely there is a better and also quicker way to do this?

written 11 months ago by Eai (0)

Perhaps you could download one of the annotation files from Vector Base and grep the info you need from it?
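A Python equivalent of that grep might look like the sketch below. It assumes a GFF3-style file with nine tab-separated columns, and that the attribute column carries `ID` and `description` keys; both key names are assumptions and may not match what the Vector Base files actually contain. The function names are hypothetical.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def descriptions_from_gff(lines, wanted_ids, id_key="ID", desc_key="description"):
    """Return {feature ID: description or None} for the wanted IDs."""
    wanted = set(wanted_ids)
    found = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get(id_key)
        if feature_id in wanted:
            found[feature_id] = attrs.get(desc_key)
    return found
```

It could be called with an open file handle, for example `descriptions_from_gff(open("annotation.gff"), ids)`, where `annotation.gff` is a placeholder filename. IDs whose features lack the description attribute come back as `None`.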

written 11 months ago by genomax (84k)

The IDs I'm using come from the Vector Base basefeatures GFF. I've had another look at their website, but I can't find a file that provides an actual description of each gene, apart from GO terms for some features.

written 11 months ago by Eai (0)

You can combine esearch and efetch through the "history server"; see the esearch and efetch documentation. The examples there are in Perl and use WebEnv=<webenv string>&usehistory=y, so the Python equivalent should be similar. As far as I remember, esearch returns the WebEnv string, which you then pass to efetch.
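In Biopython the same idea could look roughly like the sketch below: one esearch with `usehistory="y"`, then efetch calls that reference the stored result set through `webenv` and `query_key`. It assumes Biopython is installed and that joining the IDs with `OR` is an acceptable search term; a very long list may need to be split across several searches. The function names and the batch size of 200 are choices made for this example.

```python
def build_term(ids):
    """Join the IDs into a single Entrez query term."""
    return " OR ".join(ids)


def batch_starts(total, batch_size):
    """Offsets (retstart values) needed to page through `total` records."""
    return list(range(0, total, batch_size))


def fetch_via_history(ids, email, batch_size=200):
    """Search once, then download the GenBank records in batches."""
    # Imported here so the helpers above can be exercised without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="protein", term=build_term(ids), usehistory="y")
    search = Entrez.read(handle)
    handle.close()

    chunks = []
    for start in batch_starts(int(search["Count"]), batch_size):
        handle = Entrez.efetch(
            db="protein",
            rettype="gb",
            retmode="text",
            retstart=start,
            retmax=batch_size,
            webenv=search["WebEnv"],       # session returned by esearch
            query_key=search["QueryKey"],  # identifies the stored result set
        )
        chunks.append(handle.read())
        handle.close()
    return "".join(chunks)
```

Note that the combined search returns a pool of records rather than a one-to-one mapping, so each Vector Base ID still has to be matched against the fetched records afterwards.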

Edit: genomax's solution seems much simpler for a one-off run.

modified 11 months ago • written 11 months ago by Carambakaracho (2.2k)