HTTP 400: Bad request error in Biopython Entrez.efetch
22 months ago
Solowars ▴ 60

Dear all,

I wrote a script that retrieves the nucleotide CDS sequences corresponding to a list of NCBI protein identifiers, using Entrez.efetch in Python 3.7 (Anaconda 3). The script worked well a few weeks ago, but now, for some reason, it doesn't. Here is the code:

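from Bio import Entrez
import re
ids = ['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']
Entrez.email = '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
record = handle.read()
handle.close()
record = re.sub(r'\n\n', '\n', record)  # collapse blank lines between records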


While this used to work, now it gives me the following error:

Traceback (most recent call last):
  File "<ipython-input-14-a939b978098e>", line 2, in <module>
    handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 184, in efetch
    return _open(cgi, variables, post=post)
  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 545, in _open
    raise exception
  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 543, in _open
    handle = _urlopen(cgi)
  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)



I tried different combinations (e.g. other db and id values, just to check whether the problem was general), and some of them worked, but unfortunately none of them is useful for me. I also updated Biopython (to version 1.73) in case that was the cause, but got the same result.
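The traceback only reports the status code; when a request is rejected, NCBI's server often puts a short explanation in the body of the error response, which urllib keeps on the exception object. A minimal sketch for printing it, assuming the urllib.error.HTTPError propagates out of Entrez.efetch as it does in the traceback above. The helper name `describe_http_error` is hypothetical, and the demo uses a hand-built error with a fake URL and body so it runs offline.

```python
import io
import urllib.error


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Summarise an HTTPError: status code, reason, URL and the response body."""
    try:
        body = err.read().decode("utf-8", errors="replace")
    except Exception:  # the body may be missing or already consumed
        body = ""
    return f"HTTP {err.code} ({err.reason}) for {err.url}\n{body.strip()}"


# Hypothetical usage around the failing call (needs Biopython and network access):
# from Bio import Entrez
# try:
#     handle = Entrez.efetch(db="nuccore", id=ids, rettype="fasta_cds_na", retmode="xml")
# except urllib.error.HTTPError as err:
#     print(describe_http_error(err))

# Offline check with a hand-built error object (fake URL and body):
demo = urllib.error.HTTPError(
    "https://example.org/efetch", 400, "Bad Request", None, io.BytesIO(b"example body")
)
print(describe_http_error(demo))
```

Reading the body this way is often enough to tell a malformed request apart from an identifier the chosen database does not recognise.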

Best,

When doing programmatic queries against NCBI, please build in a sleep interval. Have you also signed up for an NCBI API key? If you are not using one, your queries are further limited to 3 per second.
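A minimal sketch of the sleep-interval advice, assuming NCBI's published limits of 3 requests per second without an API key and 10 with one. `Entrez.email` and `Entrez.api_key` are Biopython module attributes; the helper names `min_interval`, `batched` and `fetch_in_batches` and the batch size of 200 are illustrative assumptions. The fetch function is passed in as a parameter so the throttling logic can be tried without network access.

```python
import time
from typing import Callable, Iterator, List, Optional, Sequence


def min_interval(api_key: Optional[str]) -> float:
    """Seconds between E-utilities calls: 3 req/s without a key, 10 req/s with one."""
    return 1 / 10 if api_key else 1 / 3


def batched(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_in_batches(
    ids: Sequence[str],
    fetch: Callable[[List[str]], str],
    api_key: Optional[str] = None,
    batch_size: int = 200,
) -> List[str]:
    """Call `fetch` once per batch, sleeping between calls to respect the rate limit."""
    results = []
    delay = min_interval(api_key)
    for i, batch in enumerate(batched(ids, batch_size)):
        if i > 0:
            time.sleep(delay)  # pause before every request after the first
        results.append(fetch(batch))
    return results


# Hypothetical usage with Biopython (requires network access):
# from Bio import Entrez
# Entrez.email = "you@example.org"   # placeholder address
# Entrez.api_key = "YOUR_KEY"        # placeholder key
# def fetch(batch):
#     handle = Entrez.efetch(db="protein", id=batch,
#                            rettype="fasta_cds_na", retmode="text")
#     text = handle.read()
#     handle.close()
#     return text
# texts = fetch_in_batches(ids, fetch, api_key=Entrez.api_key)
```

Each batch is one HTTP request, so a single efetch call with a short list of IDs, as in the question, counts as one request.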

Dear genomax, I have already signed up for an API key, and I take your point into account in my other scripts (in R, though). However, my example has no loops, and as far as I know it counts as a single request, right? If so, there must be something else going on...

Queries seem to be working:

$ efetch -db protein -id "XP_004781165.1" -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT
$ esearch -db protein -query "XP_004781165.1" | efetch -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT
GGTGTGGCCTTCCTGGCCCGGAAAAAATGA

• Can you print the URL you are sending and try it in your browser?
• Is it an http address rather than https? (Unlikely, though, since you wrote it worked a few weeks ago.)
Hi Carambakaracho, I tried it in the browser, following several examples from the docs, and those worked. With my own IDs, however, it didn't work. I suspect it has something to do with using 'nuccore' together with XP/NP IDs... It shouldn't be that, since it worked fine not long ago, but I'm starting to think NCBI may have changed something on their side :/
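To test a request in the browser, it can be rebuilt as a plain E-utilities URL. A minimal sketch using only the standard library, assuming NCBI's public efetch endpoint and retmode='text' for FASTA output; the helper name `efetch_url` is hypothetical. Biopython's own request also carries extra parameters (such as the tool name and your email address), which are left out here. Note that several IDs travel as one comma-separated `id` value, i.e. a single request.

```python
from typing import Sequence
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, ids: Sequence[str], rettype: str = "fasta_cds_na",
               retmode: str = "text") -> str:
    """Build an efetch URL roughly equivalent to an Entrez.efetch call."""
    params = {
        "db": db,
        "id": ",".join(ids),  # all IDs go into one comma-separated value
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH + "?" + urlencode(params)


ids = ["XP_021798999.1", "XP_003909393.1", "XP_004781165.1"]
# Paste both into a browser to compare the two databases for the same IDs.
print(efetch_url("nuccore", ids))
print(efetch_url("protein", ids))
```

`urlencode` percent-encodes the commas as `%2C`, which the server decodes back to commas.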

22 months ago
Solowars ▴ 60

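OK, I found a solution (kudos to genomax for inspiring it).

Apparently the problem was indeed the db parameter 'nuccore'. XP_ accessions are RefSeq protein records, so I switched db to 'protein' (keeping rettype='fasta_cds_na' to get the nucleotide CDS) and it worked just fine!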
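For reference, a minimal sketch of the working call plus a small parser for its output. It assumes Biopython is installed for `fetch_cds` and uses retmode='text' (an assumption, since FASTA is a text format, whereas the original call asked for 'xml'). Records are keyed by the `[protein_id=...]` tag visible in the EDirect output earlier in the thread. The helper names and the placeholder email are hypothetical; only the parser runs offline.

```python
import re
from typing import Dict, List, Optional

# Matches the [protein_id=...] tag in NCBI fasta_cds_na headers.
PROTEIN_ID = re.compile(r"\[protein_id=([^\]]+)\]")


def parse_cds_fasta(text: str) -> Dict[str, str]:
    """Map each protein_id in fasta_cds_na output to its CDS nucleotide sequence."""
    records: Dict[str, str] = {}
    key: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skips blank lines, like the re.sub in the question
        if line.startswith(">"):
            if key is not None:
                records[key] = "".join(chunks)
            match = PROTEIN_ID.search(line)
            # fall back to the first header word if there is no protein_id tag
            key = match.group(1) if match else line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if key is not None:
        records[key] = "".join(chunks)
    return records


def fetch_cds(ids: List[str]) -> Dict[str, str]:
    """Fetch CDS sequences for protein accessions (needs Biopython and network)."""
    from Bio import Entrez  # imported here so the parser works without Biopython

    Entrez.email = "you@example.org"  # placeholder: use your own address
    handle = Entrez.efetch(db="protein", id=ids, rettype="fasta_cds_na", retmode="text")
    text = handle.read()
    handle.close()
    return parse_cds_fasta(text)
```

If one protein ID maps to more than one CDS record, the later record overwrites the earlier one in this sketch; collect lists of sequences instead if that matters.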

Thank you all for your time and attention.