Question: HTTP Error 502 - Biopython - Entrez Files
Posted 4.8 years ago by jasminebro20 (United States):

Hi, I am using Biopython to pull files from NCBI with Entrez. The program works for small downloads, but on larger ones I get an error. I would really appreciate some insight or help figuring out what went wrong.

Here is the program:

from Bio import Entrez
Entrez.email = "jbro262@lsu.edu"
search_handle = Entrez.esearch(db="nucleotide",term="Saimiri",usehistory="n")
search_results = Entrez.read(search_handle)
search_handle.close()

gi_list = search_results["IdList"]
count = int(search_results["Count"])
a = open("Numfile.txt", "a+")
a.write("The number of Saimiri files are :")
a.write(str(count))
a.write("\n")
a.close()

webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]

batch_size = 25
out_handle = open("SaimiriDNA.fasta", "w")

for start in range(0,count,batch_size):
    
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)
    data=fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()

Here is the error:

Traceback (most recent call last):
  File "Entrezfiles_Saimiri.py", line 53, in <module>
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", retstart=start, retmax=batch_size, webenv=webenv, query_key=query_key)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 149, in efetch
    return _open(cgi, variables, post)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 464, in _open
    raise exception
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 462, in _open
    handle = _urlopen(cgi)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Bad Gateway

Does this mean something is going wrong with my server while the files are downloading?

Help is greatly appreciated.

(Question last edited 4.8 years ago by Peter.)

A 502 usually has nothing to do with the client. Could you try again in half an hour or so and see whether it still happens?

Reply by RamRS, 4.8 years ago.

Thank you. Okay, I will try again in a few minutes. However, it took an hour or so for the error to occur last time. What exactly does error 502 mean, and how does it relate to urllib.error in Python?

Reply by jasminebro20, 4.8 years ago.

Under Python 3, you would import the HTTPError class with: from urllib.error import HTTPError

Having done that, you can use it to catch the exception; see also: http://stackoverflow.com/questions/3193060/catch-specific-http-error-in-python
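As a minimal sketch of catching the exception by status code: the helper fetch_or_none and the idea of passing it a zero-argument callable (for example a lambda wrapping the Entrez.efetch(...).read() call) are hypothetical, for illustration only. It swallows only a 502 and re-raises anything else.

```python
from urllib.error import HTTPError


def fetch_or_none(fetch):
    """Call fetch() and return its result, or None if the server answered 502.

    `fetch` is a hypothetical zero-argument callable, e.g. a lambda that
    wraps Entrez.efetch(...).read(). Any other HTTP error is re-raised.
    """
    try:
        return fetch()
    except HTTPError as err:
        # HTTPError exposes the numeric HTTP status code as err.code
        if err.code == 502:
            print("Got %s, skipping this request for now" % err)
            return None
        raise  # not a 502: let the caller see it unchanged
```

Returning None lets a download loop skip a failed batch and carry on; if you would rather not lose that batch, a retry (as Peter describes in his answer) is the better choice.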

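HTTP status code 502 (Bad Gateway) means that a server acting as a gateway or proxy got an invalid response from the upstream server behind it, so it is a server-side problem (in this case, an NCBI problem), not a bug in your script. urllib reports error statuses like this by raising urllib.error.HTTPError. See http://en.wikipedia.org/wiki/List_of_HTTP_status_codes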
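To see the client/server split that status codes encode, here is a small sketch using the standard-library http.HTTPStatus enum. Note that HTTPStatus needs Python 3.5 or later, which is newer than the Python 3.4 in the traceback; describe_status is a hypothetical helper.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase and which side a status code blames."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if 500 <= code <= 599:
        kind = "server error"
    elif 400 <= code <= 499:
        kind = "client error"
    else:
        kind = "not an error"
    return "%i %s (%s)" % (code, phrase, kind)


print(describe_status(502))  # 502 Bad Gateway (server error)
print(describe_status(404))  # 404 Not Found (client error)
```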

Reply by Peter, 4.8 years ago.

Thanks! I'll use this information to help edit my code.

Reply by jasminebro20, 4.8 years ago.

Hey. The try/except around Entrez.efetch fixed my program. Works great now. Thanks!

Reply by jasminebro20, 4.8 years ago.
Answer posted 4.8 years ago by Peter (Scotland, UK):

When making heavy use of an online service like NCBI Entrez, you should expect to get intermittent network errors like HTTP Error 502: Bad Gateway from time to time. The standard approach would be to wrap the call in a try/except block and retry it (e.g. three retries, with a pause between each).

Or just wait and retry when the NCBI is less busy (i.e. avoid USA working hours); that is often easier ;)
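A minimal sketch of the try/except-and-retry pattern described above. The helper call_with_retries, its default of three retries and the 10-second pause are assumptions, not Biopython API. It only retries 5xx (server) errors, because a 4xx error means the request itself is wrong and repeating it will not help.

```python
import time
from urllib.error import HTTPError


def call_with_retries(func, retries=3, pause=10.0):
    """Call func(), retrying up to `retries` times on 5xx HTTP errors.

    `func` is any zero-argument callable (hypothetical here), e.g. a lambda
    wrapping Entrez.efetch(...).read(). The pause length is an assumption;
    choose something polite for the server you are calling.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except HTTPError as err:
            # Give up on client errors (4xx), or once the last retry has failed
            if err.code < 500 or attempt == retries:
                raise
            print("Attempt %i failed (%s); retrying in %g s" % (attempt + 1, err, pause))
            time.sleep(pause)
```

You would pass it a function that wraps the Entrez.efetch(...) call from the question, for example via a lambda, so that each call makes a fresh request.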

See also Ncbi Entrez Server Issues


Thanks Peter! That makes a lot of sense. Would you mind giving me an example of what you mean by wrapping the call in a try/except block? I am fairly new to programming, so I'm learning as I go. Do you mean that, for each batch, a block of code tries to pull the data, and if an error occurs the program moves on from there?

I'll also take a look at the NCBI server issues link.

Reply by jasminebro20, 4.8 years ago.

I don't have an example to hand, but another Biopython contributor might: http://lists.open-bio.org/pipermail/biopython-dev/2014-November/020773.html

In the first instance, I would put the try/except around the Entrez.efetch(...) call to allow a pause and retry, but that only works as long as the history session has not expired.
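Applied to the batch loop from the question, that suggestion might look like the sketch below. Assumptions: download_batches and the inner fetch_batch wrapper are hypothetical names, the email address is a placeholder, and the retry count and pause are arbitrary. It uses usehistory="y", because esearch only returns WebEnv and QueryKey when the history server is used. If retries keep failing long enough for the history session to expire, you would need to rerun the esearch to get a fresh WebEnv.

```python
import time
from urllib.error import HTTPError


def download_batches(count, batch_size, fetch_batch, out_handle, retries=3, pause=10.0):
    """Write records 0..count-1 to out_handle, one batch at a time.

    fetch_batch(start, size) is a hypothetical callback returning the text
    of one batch. Each batch is retried up to `retries` times on 5xx errors.
    """
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        print("Going to download record %i to %i" % (start + 1, end))
        for attempt in range(retries + 1):
            try:
                data = fetch_batch(start, batch_size)
                break  # this batch succeeded, stop retrying
            except HTTPError as err:
                # Give up on client errors (4xx) or once retries are used up
                if err.code < 500 or attempt == retries:
                    raise
                print("Attempt %i failed (%s); retrying in %g s" % (attempt + 1, err, pause))
                time.sleep(pause)
        out_handle.write(data)


def main():
    # Biopython is imported here so download_batches can be used without it.
    from Bio import Entrez

    Entrez.email = "your.name@example.org"  # placeholder: use your own address
    search_handle = Entrez.esearch(db="nucleotide", term="Saimiri", usehistory="y")
    search_results = Entrez.read(search_handle)
    search_handle.close()

    count = int(search_results["Count"])
    webenv = search_results["WebEnv"]
    query_key = search_results["QueryKey"]

    def fetch_batch(start, size):
        # The same efetch call as in the question, using the history session
        handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text",
                               retstart=start, retmax=size,
                               webenv=webenv, query_key=query_key)
        try:
            return handle.read()
        finally:
            handle.close()

    with open("SaimiriDNA.fasta", "w") as out_handle:
        download_batches(count, 25, fetch_batch, out_handle)
```

main() is not called automatically; calling it needs Biopython and network access. Passing fetch_batch in as a callback keeps the retry loop independent of Entrez, so it can be exercised offline with a fake fetcher.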

Reply by Peter, 4.8 years ago.

Okay I will look into other examples. Thanks so much.

As for the history option: when I tried to use it, not all of the records I can see online would download; it only fetched a portion of them. So I opted for "n" instead of "y" for usehistory. Could that be an issue?

Reply by jasminebro20, 4.8 years ago.