Biopython - AttributeError when downloading from GenBank
3.5 years ago
Wilber0x ▴ 50

I have a list of around 200 plastid genomes that I want to download from GenBank using Biopython and then combine into a single .gb file.

Here is the code I am using to do this:

from Bio import Entrez

Entrez.email = "you@example.com"  # NCBI requires a contact address

out_handle = open(filename, "w")
for genbank_id in genbankIDs:
    net_handle = Entrez.efetch(
        db="nucleotide", id=genbank_id, rettype="gbwithparts", retmode="text"
    )
    out_handle.write(net_handle.read())
    net_handle.close()  # close each response handle inside the loop
out_handle.close()
print("Saved")

where genbankIDs is the list of accession numbers of the sequences I want to download from GenBank.

However, this only works for the first 20 accessions. I get this error message:

Traceback (most recent call last):
  File "fetchFromGenbank.py", line 25, in <module>
    db="nucleotide", id=genbankIDs[i], rettype="gbwithparts", retmode="text"
  File "/opt/anaconda2/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 195, in efetch
    return _open(cgi, variables, post=post)
  File "/opt/anaconda2/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 564, in _open
    and exception.status // 100 == 4:
AttributeError: 'HTTPError' object has no attribute 'status'

How can I solve this? Is it NCBI rate-limiting or timing out, or a problem caused by my using Python 2.7 rather than a more recent version?

biopython software error

Wilber0x : Biostars' built-in SPAM protection (we need to have this in place, sorry) does not allow HTTP links in titles (I think it was interpreting "HTTPError" in your post title as a link). I have edited that out, so hopefully your post will not be automatically flagged/deleted as SPAM now. I have also reinstated your account, so you should be able to respond.


Thank you for your help

3.5 years ago
GenoMax 141k

Have you signed up for an NCBI API key? If not, you should do that first. Since you are doing this via a script, build in a delay between your queries to ensure you don't get flagged by the NCBI server for sending too many queries in a short time.
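A minimal sketch of that pattern. The `fetch_one` callable and the `delay` value are illustrative stand-ins, not part of the original script; in practice `fetch_one` would wrap the `Entrez.efetch` call, with `Entrez.api_key` set beforehand:

```python
import time

def fetch_with_delay(accessions, fetch_one, delay=1.0):
    """Fetch each accession in turn, sleeping between requests so the
    script stays under NCBI's rate limit (10 requests/second with an
    API key, 3 requests/second without one)."""
    records = []
    for i, acc in enumerate(accessions):
        records.append(fetch_one(acc))
        if i < len(accessions) - 1:
            time.sleep(delay)  # pause between queries, not after the last one
    return records
```

For example, `fetch_one = lambda acc: Entrez.efetch(db="nucleotide", id=acc, rettype="gbwithparts", retmode="text").read()`.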


I have built in a time delay following your suggestion, so only one query is submitted per second. This improves the number of sequences downloaded to 140, but I still get the same error message.


Can you try increasing the delay? Since you are requesting gbwithparts, the server has to assemble the full record to return the right sections. I would say try one query every 15 or 30 seconds (or longer).


I have tried increasing the delay to 60s, and got the same number downloaded as when I used a 1.5s delay. Perhaps I should just do it in batches of 140 rather than all at once.
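The batching idea can be sketched as below. `Entrez.efetch` accepts a comma-separated list of IDs, so each batch could in principle go out in a single request; the batch size and function name here are illustrative:

```python
def batches(ids, size):
    """Split the accession list into consecutive chunks of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

# Each batch could then be fetched with one request, e.g. (sketch only):
# Entrez.efetch(db="nucleotide", id=",".join(batch),
#               rettype="gbwithparts", retmode="text")
```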

3.5 years ago
Shred ★ 1.4k

Use an Entrez API key and a time.sleep() delay in your script. Usually 10 seconds between requests is enough.


I used an Entrez API key, and have incorporated delays into my script. Whether the delay is 1.5s or 60s between requests, I still get the same error after 150 plastid genomes.


Identify whether a specific accession is causing the problem and remove it.
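A sketch of how one might pinpoint the offending accession. The `fetch` argument stands in for the `Entrez.efetch` call and the function name is illustrative; the idea is simply to try each accession on its own and record the ones that raise HTTPError:

```python
from urllib.error import HTTPError

def find_bad_accessions(accessions, fetch):
    """Try each accession individually and collect the ones whose
    fetch raises HTTPError, so they can be inspected or removed."""
    bad = []
    for acc in accessions:
        try:
            fetch(acc)
        except HTTPError as err:
            bad.append((acc, err.code))
    return bad
```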


This was the problem, thanks for the help!
