Question

HTTP 400: Bad request error in Biopython Entrez.efetch

0

4.9 years ago

Solowars ▴ 70

Dear all,

I wrote a script that retrieves the corresponding nucleotide CDS sequences for a list of NCBI protein identifiers, using Entrez.efetch (Python 3.7, Anaconda 3). The script worked well a few weeks ago, but now for some reason it doesn't. Here is the code:

import re
from Bio import Entrez

ids=['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']
Entrez.email= '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
record = handle.read()
# collapse the blank lines between records
record=re.sub('\\n\\n', '\\n', record)

While this used to work, now it gives me the following error:

Entrez.email= '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
Traceback (most recent call last):

  File "<ipython-input-14-a939b978098e>", line 2, in <module>
    handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 184, in efetch
    return _open(cgi, variables, post=post)

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 545, in _open
    raise exception

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 543, in _open
    handle = _urlopen(cgi)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Bad Request

I tried different combinations (e.g. other db and id parameters, to test whether the failure is general), and some of them worked, but none of those are useful for me. I also updated Biopython (to version 1.73) in case that was the cause, but the result is the same.

I'd really appreciate your thoughts.

Best,

python ncbi entrez biopython sequences • 5.4k views

4.9 years ago by Solowars ▴ 70

0

When running programmatic queries against NCBI, please build in a sleep interval. Have you also signed up for an NCBI API key? Without one, your queries are limited to 3 requests per second.

4.9 years ago by GenoMax 141k
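For anyone landing here later, a minimal sketch of the sleep-interval idea in plain Python. The names `chunks`, `fetch_in_batches`, `batch_size` and `min_interval` are made up for illustration, and the 0.34 s default is simply an assumption that keeps you at or below 3 requests per second. The `fetch` argument is whatever function actually talks to NCBI.

```python
import time


def chunks(ids, size):
    """Yield successive batches of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(ids, fetch, batch_size=200, min_interval=0.34):
    """Call fetch(batch) once per batch, spacing calls by at least min_interval seconds."""
    results = []
    last_call = None
    for batch in chunks(ids, batch_size):
        if last_call is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(fetch(batch))
    return results
```

With Biopython, `fetch` could be something like `lambda batch: Entrez.efetch(db='protein', id=batch, rettype='fasta_cds_na', retmode='text').read()`; treat that as a sketch rather than a tested recipe.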

0

Dear genomax, I already signed up for an API key, and I run other scripts (in R, though) that take your point into account. However, my example contains no for loops, so as far as I know it should count as a single request, right? If that is the case, there must be something else going on...

4.9 years ago by Solowars ▴ 70

1

Queries seem to be working:

$ efetch -db protein -id "XP_004781165.1" -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT
GGTGTGGCCTTCCTGGCCCGGAAAAAATGA

$ esearch -db protein -query "XP_004781165.1" | efetch -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT
GGTGTGGCCTTCCTGGCCCGGAAAAAATGA

4.9 years ago by GenoMax 141k
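The fasta_cds_na headers above carry bracketed key=value fields (gene, protein_id, location, ...). If you need those in Python, here is a small sketch using only the standard library. `parse_cds_header` is a hypothetical helper name, and the regex assumes that values never contain a closing bracket, which holds for the header shown here but is not guaranteed for every record.

```python
import re

# Matches fields such as [gene=CEND1] or [location=213..662].
FIELD_RE = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_cds_header(header):
    """Split a fasta_cds_na header into its sequence id and bracketed fields."""
    header = header.lstrip(">").strip()
    fields = dict(FIELD_RE.findall(header))
    # The sequence id is everything up to the first whitespace.
    fields["id"] = header.split(None, 1)[0]
    return fields


header = (
    ">lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] "
    "[db_xref=GeneID:101688671] "
    "[protein=cell cycle exit and neuronal differentiation protein 1] "
    "[protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]"
)
info = parse_cds_header(header)
print(info["protein_id"], info["gene"], info["location"])
```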

0

Can you print the URL your script sends and try it in your browser?
Is it an http address rather than https? (Unlikely, though, since you wrote that it worked a few weeks ago.)

4.9 years ago by Carambakaracho ★ 3.2k
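One way to get a URL to paste into the browser is to build it by hand from the same parameters. This is only a sketch: `build_efetch_url` is a hypothetical helper, and the URL Biopython really sends will likely differ slightly (as far as I know it appends its own parameters such as tool and email). The base address is the public E-utilities efetch endpoint.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db, rettype, retmode="text", email=None):
    """Return an efetch URL for a list of ids (joined with commas)."""
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    if email:
        params["email"] = email
    # safe="," keeps the commas between ids readable instead of %2C.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


ids = ["XP_021798999.1", "XP_003909393.1", "XP_004781165.1"]
# Print one URL per database so both can be tried in a browser.
for db in ("nuccore", "protein"):
    print(build_efetch_url(ids, db, "fasta_cds_na"))
```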

0

Hi Carambakaracho, I tried it via the browser, following several examples in the docs, and they worked. However, the same request with my example ids did not work. I suspect the problem is related to using 'nuccore' in combination with XP/NP ids... It shouldn't be that, because it worked fine a short while ago, but I'm starting to think that perhaps something changed on NCBI's side :/

4.9 years ago by Solowars ▴ 70

score 3 · Accepted Answer · 2019-06-13

3

4.9 years ago

Solowars ▴ 70

OK, I found a solution (kudos to genomax for inspiring it).

Apparently the problem was indeed the db parameter 'nuccore'. XP_ accessions are protein records, so I switched db to 'protein' (as in genomax's efetch command) and it worked fine!

Thank you all for your time and attention.

4.9 years ago by Solowars ▴ 70
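For completeness, a sketch of what the corrected call could look like. The helper names `cds_request_params` and `fetch_cds` are made up for this example, and retmode='text' is an assumption on my part (the original script passed 'xml'). Biopython is imported inside `fetch_cds` so that the parameter helper can be inspected without it installed.

```python
def cds_request_params(protein_ids):
    """Build efetch keyword arguments for CDS nucleotide FASTA of protein accessions."""
    return {
        "db": "protein",  # XP_/NP_ accessions live in the protein database
        "id": ",".join(protein_ids),
        "rettype": "fasta_cds_na",
        "retmode": "text",  # assumption: plain-text FASTA rather than 'xml'
    }


def fetch_cds(protein_ids, email):
    """Fetch the CDS records from NCBI (needs Biopython and network access)."""
    from Bio import Entrez  # imported here so the helper above has no dependency

    Entrez.email = email
    handle = Entrez.efetch(**cds_request_params(protein_ids))
    try:
        return handle.read()
    finally:
        handle.close()
```

Calling `fetch_cds(['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1'], email='<your address>')` should then return the FASTA text for all three ids in a single request.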