Question

Vector Base gene ID to NCBI ID


6.1 years ago

BioPerson • 0

I have a large number of gene IDs from Vector Base (e.g. AAEL006343-PA and AAEL001710-PA). Some of these IDs have a record in the NCBI Protein database.

I'm trying to use Biopython to get the gene description and other information from NCBI with the following code (for simplicity I've used a single ID here, but normally I would pass a list):

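from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.com"  # use your own address

handle = Entrez.efetch(db="protein", id="AAEL006343-PA", rettype="gb", retmode="text")

# Entrez.read() only parses XML; GenBank flat-file text needs SeqIO
record = SeqIO.read(handle, "genbank")
handle.close()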

efetch fails with an HTTP 400 "Bad Request" error. I know the data exists, because with ID 108877864 I get the result I want. However, 108877864 is NCBI's own numeric ID (UID) for this protein. The only way I've found to convert AAEL006343-PA to 108877864 is via esearch, but I don't want to spam NCBI with hundreds of esearch queries.

Is there a way to do this ID conversion as a batch and without esearch?
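A minimal sketch of doing the conversion in batches, assuming the Vector Base IDs are searchable terms in NCBI Protein (the question reports that esearch resolves a single ID): join a chunk of IDs with OR so that one esearch call covers many IDs. The chunk size of 100, quoting each ID, and the helper names `build_or_query`, `chunks` and `batch_esearch` are illustrative choices, not anything NCBI or Biopython prescribes. Note that the combined result is a flat list of UIDs, so the mapping from each Vector Base ID to its UID is lost.

```python
def build_or_query(ids):
    """Join IDs into a single Entrez query: "A" OR "B" OR ..."""
    # Quoting each ID is an assumption; compare against an unquoted single-ID search.
    return " OR ".join(f'"{vb_id}"' for vb_id in ids)


def chunks(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_esearch(vb_ids, email, chunk_size=100):
    """Return NCBI Protein UIDs for a list of Vector Base IDs, one esearch per chunk."""
    from Bio import Entrez  # imported here so the helpers above run without Biopython

    Entrez.email = email
    uids = []
    for batch in chunks(list(vb_ids), chunk_size):
        # retmax is raised because one ID may match more than one record
        handle = Entrez.esearch(db="protein", term=build_or_query(batch), retmax=10000)
        result = Entrez.read(handle)
        handle.close()
        uids.extend(result["IdList"])
    return uids
```

For example, `batch_esearch(["AAEL006343-PA", "AAEL001710-PA"], "you@example.com")` should return a list of numeric UIDs that can be joined with commas and passed to efetch. With a chunk size of 100, 1000 IDs become 10 requests instead of 1000.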

software error • 1.9k views

6.1 years ago by BioPerson • 0


You would not be spamming NCBI as long as you sign up for an NCBI API key (NCBI_API_KEY) and build an appropriate delay into your queries.
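A minimal sketch of the "API key plus delay" approach, in Python to match the question. It assumes NCBI's published E-utilities limits of 3 requests per second without an API key and 10 with one. `throttled_map` and `esearch_uids` are hypothetical helper names; `Entrez.api_key` is the Biopython attribute for the key in recent releases.

```python
import time


def throttled_map(func, items, requests_per_second=3.0):
    """Call func(item) for each item, never starting calls faster than the given rate."""
    min_interval = 1.0 / requests_per_second
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Sleep only for whatever part of the interval has not already elapsed.
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(func(item))
    return results


def esearch_uids(vb_id, email, api_key=None):
    """Return the NCBI Protein UIDs matching a single Vector Base ID."""
    from Bio import Entrez  # imported here so throttled_map runs without Biopython

    Entrez.email = email
    if api_key:
        Entrez.api_key = api_key
    handle = Entrez.esearch(db="protein", term=vb_id)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])
```

Usage would look like `throttled_map(lambda i: esearch_uids(i, "you@example.com", "YOUR_API_KEY"), ids, requests_per_second=10)`. At 10 requests per second, 1000 IDs take at least 100 seconds.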

6.1 years ago by GenoMax 152k


I can do that and loop over 1000+ search calls, but surely there is a better and also quicker way to do this?

6.1 years ago by BioPerson • 0


Perhaps you could download one of the annotation files from Vector Base and grep the info you need from it?
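If you go the download route, the "grep" step can be done for all IDs in one pass over the file instead of one grep per ID. A minimal sketch, assuming a tab-delimited text annotation file in which the IDs appear as separate tokens (e.g. `ID=AAEL006343-PA` in a GFF line); the delimiter set and the name `grep_ids` are illustrative.

```python
import re

# Characters assumed to separate IDs from neighbouring text in an annotation line.
TOKEN_SPLIT = re.compile(r"[\t;=,:\s]+")


def grep_ids(lines, wanted_ids):
    """Return {id: [matching lines]} for every wanted ID found as a whole token."""
    wanted = set(wanted_ids)
    hits = {}
    for line in lines:
        tokens = set(TOKEN_SPLIT.split(line.strip()))
        # Set intersection avoids recording the same line twice for one ID.
        for token in tokens & wanted:
            hits.setdefault(token, []).append(line.rstrip("\n"))
    return hits
```

Typical use would be `with open("annotation.gff") as fh: hits = grep_ids(fh, ids)`, where the file name is a placeholder. Matching whole tokens rather than substrings keeps AAEL006343-PA from also matching a longer ID that happens to contain it.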

6.1 years ago by GenoMax 152k


The IDs I'm using come from the Vector Base basefeatures GFF. I've had another look at their website, but I can't find a file that provides an actual description of the gene, apart from GO terms for some features.
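Since the IDs come from a GFF, whatever annotation it does carry (such as the GO terms mentioned) can be pulled out by parsing column 9. A minimal sketch following the GFF3 conventions (`tag=value` pairs separated by `;`, multiple values separated by `,`, URL-escaped characters). `Ontology_term` is a reserved GFF3 attribute, while the `description` tag is an assumption that may not exist in the Vector Base file; `parse_gff3_attributes` and `feature_annotations` are hypothetical names.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse a GFF3 attribute column into {tag: [values]}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


def feature_annotations(gff_lines, wanted_ids):
    """Return {ID: {type, description, go_terms}} for features whose ID is wanted."""
    wanted = set(wanted_ids)
    out = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_gff3_attributes(cols[8])
        for feature_id in attrs.get("ID", []):
            if feature_id in wanted:
                out[feature_id] = {
                    "type": cols[2],
                    "description": attrs.get("description", []),  # assumed tag name
                    "go_terms": attrs.get("Ontology_term", []),
                }
    return out
```

If the basefeatures file has no free-text description tag, the `description` lists will simply come back empty and only the GO terms will be populated.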

6.1 years ago by BioPerson • 0


You can combine esearch and efetch using the Entrez "history server" (see the esearch and efetch documentation). In Perl this is done with usehistory=y and WebEnv=<webenv string>, so the Python equivalent should be similar. If I remember correctly, esearch returns the WebEnv string that you then need to pass to efetch.
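In Biopython the history-server pattern looks roughly like this: one esearch with `usehistory="y"`, then efetch in pages using the returned `WebEnv` and `QueryKey`. This is a sketch that has not been run against the Vector Base IDs; it assumes they are searchable in NCBI Protein, and a very long OR query may need to be split into several searches. `fetch_descriptions`, `retstart_offsets` and `build_or_query` are hypothetical names.

```python
def build_or_query(ids):
    """Join IDs into one Entrez query: "A" OR "B" OR ..."""
    return " OR ".join(f'"{vb_id}"' for vb_id in ids)


def retstart_offsets(count, batch_size):
    """Offsets for paging through `count` results, `batch_size` at a time."""
    return list(range(0, count, batch_size))


def fetch_descriptions(vb_ids, email, batch_size=200):
    """Map NCBI accession.version -> DEFINITION line via the Entrez history server."""
    from Bio import Entrez, SeqIO  # imported here so the helpers above run without Biopython

    Entrez.email = email
    # One search stores the whole result set on NCBI's side.
    handle = Entrez.esearch(db="protein", term=build_or_query(vb_ids), usehistory="y")
    search = Entrez.read(handle)
    handle.close()

    count = int(search["Count"])
    webenv, query_key = search["WebEnv"], search["QueryKey"]

    descriptions = {}
    for start in retstart_offsets(count, batch_size):
        # Fetch one page of GenBank records from the stored result set.
        handle = Entrez.efetch(
            db="protein",
            rettype="gb",
            retmode="text",
            retstart=start,
            retmax=batch_size,
            webenv=webenv,
            query_key=query_key,
        )
        for record in SeqIO.parse(handle, "genbank"):
            descriptions[record.id] = record.description
        handle.close()
    return descriptions
```

The dictionary keys are NCBI accessions rather than Vector Base IDs, so mapping results back to the original IDs would mean inspecting each record's contents.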

Edit: GenoMax's solution seems much simpler for a one-off run.

6.1 years ago by Carambakaracho ★ 3.3k