Question: Biopython - AttributeError: when downloading from genbank
Wilber0x wrote (4 weeks ago):

I have a list of around 200 plastid genomes which I want to download from GenBank using Biopython and then put into one .gb file.

Here is the code I am using to do this:

from Bio import Entrez

Entrez.email = "my.email@example.com"  # NCBI requires a contact address

out_handle = open(filename, "w")
for genbank_id in genbankIDs:
    net_handle = Entrez.efetch(
        db="nucleotide", id=genbank_id, rettype="gbwithparts", retmode="text"
    )
    out_handle.write(net_handle.read())
    net_handle.close()  # close each response inside the loop, not just the last one
out_handle.close()
print("Saved")

where genbankIDs is the list of accession numbers of the sequences I want to download from GenBank.

However, this only works for the first 20 accessions. I get this error message:

Traceback (most recent call last):
  File "fetchFromGenbank.py", line 25, in <module>
    db="nucleotide", id=genbankIDs[i], rettype="gbwithparts", retmode="text"
  File "/opt/anaconda2/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 195, in efetch
    return _open(cgi, variables, post=post)
  File "/opt/anaconda2/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 564, in _open
    and exception.status // 100 == 4:
AttributeError: 'HTTPError' object has no attribute 'status'

How can I solve this? Is it an instance of NCBI timing out, or a problem caused by me using Python 2.7 rather than a more recent version?

biopython software error
genomax replied:

Wilber0x: Biostars' built-in spam protection (we need to have this in place, sorry) does not allow HTTP links in titles; I think it was interpreting "HTTPError" in your post title as a link. I have edited that out, so hopefully your post will not be automatically flagged or deleted as spam now. I have also reinstated your account, so you should be able to respond.


Wilber0x replied:

Thank you for your help.

genomax (United States) wrote:

Have you signed up for an NCBI API key? If not, you should do that first. Since you are doing this via a script, build in a delay between your queries to ensure you don't get flagged by the NCBI server for sending too many queries in a short time.
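The delay suggestion above can be sketched as a small throttling wrapper (pure standard library). The `Entrez.efetch` usage at the end is illustrative only; the 0.34 s interval reflects NCBI's ~3 requests/second limit without an API key (10/second with one):

```python
import time

def throttle(min_interval):
    """Wrap a function so successive calls are at least min_interval seconds apart."""
    def wrap(func):
        last_call = [0.0]  # mutable cell so the inner function can update it
        def inner(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)  # sleep off the remainder of the interval
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return inner
    return wrap

# Hypothetical usage with Biopython (not run here):
#   from Bio import Entrez
#   Entrez.api_key = "YOUR_NCBI_API_KEY"   # raises the limit to 10 requests/sec
#   fetch = throttle(0.34)(Entrez.efetch)  # ~3 requests/sec without a key
```

Newer Biopython versions throttle for you once `Entrez.api_key` is set, so a wrapper like this mainly helps on older installs or when you want a longer, explicit gap.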


Wilber0x replied:

I have built in a time delay per your suggestion, so only one query is submitted per second. This improves the number of sequences downloaded to 140, but it still results in the same error message.

genomax replied:

Can you try increasing the delay? Since you are requesting gbwithparts, that request has to be executed server-side to assemble the full record. I would say try one query every 15 or 30 seconds (or longer).


Wilber0x replied:

I have tried increasing the delay to 60 s and got the same number downloaded as when I used a 1.5 s delay. Perhaps I should just do it in batches of 140 rather than all at once.
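The batching idea could be as simple as slicing the accession list into fixed-size chunks. A minimal sketch; the chunk size and the pause between chunks are assumptions, not something NCBI prescribes:

```python
def batches(items, size):
    """Yield successive chunks of at most `size` items from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical usage (not run here): fetch each batch, pause between batches.
#   import time
#   for batch in batches(genbankIDs, 50):
#       ...fetch every accession in batch...
#       time.sleep(30)
```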

Shred wrote:

Use an Entrez API key and a time.sleep in your script. Usually 10 s between each request is enough (read more here).


Wilber0x replied:

I used an Entrez API key and have incorporated delays into my script. Whether the delay is 1.5 s or 60 s between requests, I still get the same error after 150 plastid genomes.

genomax replied:

Identify whether a specific accession is causing the problem and remove it.
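One way to pin down a bad accession is to wrap each fetch in a try/except, so the loop reports failures instead of aborting on the first one. A sketch, where `fetch_one` is a hypothetical stand-in for whatever function performs the actual `Entrez.efetch` call and returns the record text:

```python
def fetch_all(accessions, fetch_one, out_handle):
    """Try each accession; collect (accession, error) pairs instead of crashing."""
    failures = []
    for acc in accessions:
        try:
            out_handle.write(fetch_one(acc))
        except Exception as exc:  # efetch errors (e.g. HTTPError) surface here
            failures.append((acc, str(exc)))
    return failures
```

After the run, printing the returned list shows exactly which accessions triggered the error, so they can be fixed or dropped.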


Wilber0x replied:

This was the problem. Thanks for the help!

Powered by Biostar version 2.3.0