Efetch And Biopython
9.9 years ago
Sabrewolfy ▴ 80

I'm using Biopython 1.53 with Python 2.6. The following code was working until the recent EFetch updates:

handle = Entrez.efetch(db="nucleotide", rettype="gb", id=seq)


where 'seq' is simply a string with an accession number. Now, however, nothing is being returned. I'm not sure what I need to fix as none of the changes seem to affect what I've coded above.

biopython ncbi python • 5.9k views

Thanks for the report. This is due to some changes at NCBI and is fixed in the current codebase (http://lists.open-bio.org/pipermail/biopython/2012-February/007743.html). There will be a new release in the next week or so with these changes included.

The NCBI changes report that "EFetch URLs with multiple IDs must be entered as: id=1,2,3" and "EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3". However, if only one sequence is requested, the URL would be the same ... it would end with id=1.
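
To make the quoted rule concrete, here is a minimal sketch (Python 3, standard library only) that builds an EFetch URL with all IDs joined into one `id=` parameter. The helper name `build_efetch_url` is hypothetical, and the base URL is assumed to be the public E-utilities endpoint; the code only constructs the string and never contacts NCBI.

```python
from urllib.parse import urlencode

# Assumed public E-utilities endpoint for EFetch.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="nucleotide", rettype="gb"):
    """Hypothetical helper: build an EFetch URL with a single id= parameter."""
    # Accept either one ID or a sequence of IDs.
    if isinstance(ids, (str, int)):
        ids = [ids]
    params = {
        "db": db,
        "rettype": rettype,
        "id": ",".join(str(i) for i in ids),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EFETCH_BASE + "?" + urlencode(params, safe=",")


print(build_efetch_url([1, 2, 3]))   # URL ends with id=1,2,3
print(build_efetch_url("76096369"))  # URL ends with id=76096369
```

With a single ID the URL comes out the same under the old and new rules, which matches the observation above that this particular change should not affect a one-sequence request.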

Can you please open a bug report on biopython's bug tracker? https://redmine.open-bio.org/projects/biopython/issues?set_filter=1&tracker_id=1

@Brad Chapman: Thanks for the link. Didn't come across that during my searching.

I'm still not clear on why this is not working for me. In the example, 'seq' contains just a string for one sequence -- I am not passing in a list for multiple sequences. However, it still does not work. I've tried the various solutions mentioned in the link above (nuccore/fasta and the join solution to create a comma-separated list), but neither works. My understanding is that the URL change should have no effect if only ONE sequence is requested.

As per Brad's full answer below, the problem is that NCBI changed the default retmode.

9.9 years ago

In addition to the change in multiple-ID behavior, NCBI also changed some database names and the default return modes. Here is a working query with Biopython 1.58:

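from Bio import Entrez
import urllib2

# NCBI asks for a contact address; replace this placeholder with your own.
Entrez.email = "test@test.com"

# Ask for GenBank text explicitly, since the default retmode changed.
handle = Entrez.efetch(db="nuccore", rettype="gb", retmode="text", id=76096369)
print handle.readline().strip()

try:
    handle = Entrez.efetch(db="nuccore", rettype="gb", retmode="text",
                           id='wrong')
except urllib2.HTTPError:
    # HTTPError is a subclass of IOError, so it has to be caught first.
    print "Bad or unknown ID"
except IOError:
    print "Problem connecting to NCBI"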


Should give:

LOCUS       NM_007726               5807 bp    mRNA    linear ROD 19-FEB-2012
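
On Python 3, `urllib2` no longer exists and `HTTPError` lives in `urllib.error`. A rough equivalent of the error handling above, under that assumption, might look like the sketch below. `fetch_first_line` and its `fetch` parameter are hypothetical names, introduced so the logic can be exercised without a network connection; when `fetch` is omitted, the function falls back to `Entrez.efetch`.

```python
from urllib.error import HTTPError


def fetch_first_line(accession, fetch=None):
    """Hypothetical helper: return the first line of a GenBank record,
    or None if the ID is rejected or NCBI cannot be reached."""
    if fetch is None:
        # Imported lazily so the function can be tested without Biopython.
        from Bio import Entrez
        Entrez.email = "test@test.com"  # placeholder; use your own address
        fetch = Entrez.efetch
    try:
        handle = fetch(db="nuccore", rettype="gb", retmode="text", id=accession)
    except HTTPError as err:
        # HTTPError is a subclass of OSError (IOError), so catch it first.
        print("NCBI rejected the request (HTTP %s)" % err.code)
        return None
    except IOError:
        print("Problem connecting to NCBI")
        return None
    try:
        return handle.readline().strip()
    finally:
        handle.close()
```

The order of the `except` clauses matters for the same reason as in the original: swapping them would report every bad ID as a connection problem.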

Thanks. I have tried changing the db and rettype details, but it still does not work with Biopython 1.53. I will try with your specific example.

Tested again with "nuccore" but it does not work with Biopython 1.53. Some change NCBI has made has broken this in 1.53 completely. None of the work-arounds have solved it.

1.53 is quite old now so there may have been bug fixes over the past several releases. Could you upgrade to 1.58 and retry? If that doesn't work, knowing the ID that is failing for you could help us reproduce the problem.

@Brad Chapman: Thanks, I have manually upgraded to 1.59 and adjusted my code as per your example above. I notice the handle object no longer has peekline which I was using, so I'll have to fix that, but I think the fetching problem is solved now.

I was using the length of peekline to determine if a sequence had in fact been returned or not (to check if an invalid accession number had been provided, for example). I'm fetching only one sequence.
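
Without `peekline`, one way to make the same check is to read the returned text and test whether it starts with a `LOCUS` line, which is how a GenBank flat-file record begins. A minimal sketch (Python 3); `looks_like_genbank` is a hypothetical helper, not part of Biopython:

```python
def looks_like_genbank(text):
    """Hypothetical helper: True if the text begins with a GenBank LOCUS line."""
    for line in text.splitlines():
        if line.strip():
            # The first non-blank line of a GenBank record starts with LOCUS.
            return line.startswith("LOCUS")
    # Empty or whitespace-only response: nothing was returned.
    return False


record_text = "LOCUS       NM_007726               5807 bp    mRNA    linear ROD 19-FEB-2012\n"
print(looks_like_genbank(record_text))  # True
print(looks_like_genbank(""))           # False
```

In practice `record_text` would be the result of `handle.read()` on the handle returned by `Entrez.efetch`.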

Although I see now that Entrez.efetch raises an HTTPError if an invalid accession number is given.

But there is no HTTPError to catch in a try clause.

Yes, Peter has put in a lot of work to make the error handling more transparent so you don't have to manually check for problems. I added example code for catching and identifying bad records and network errors. Hope this helps.

Thanks for the example. I tried the urllib2.HTTPError, but forgot to import urllib2 :)