HTTP 400: Bad request error in Biopython Entrez.efetch
4.9 years ago
Solowars ▴ 70

Dear all,

I wrote a script that retrieves the nucleotide CDS sequences for a list of NCBI protein identifiers, using Entrez.efetch (Biopython) in Python 3.7 with Anaconda 3. The script worked well a few weeks ago, but now for some reason it doesn't. Here is the code:

import re
from Bio import Entrez

ids = ['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']
Entrez.email = '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
record = handle.read()
# collapse blank lines between FASTA records
record = re.sub('\\n\\n', '\\n', record)

While this used to work, now it gives me the following error:

Entrez.email= '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
Traceback (most recent call last):

  File "<ipython-input-14-a939b978098e>", line 2, in <module>
    handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 184, in efetch
    return _open(cgi, variables, post=post)

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 545, in _open
    raise exception

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 543, in _open
    handle = _urlopen(cgi)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Bad Request

I tried different combinations (e.g. other db and id parameters, just to test whether the problem is general), and some of them worked, but unfortunately none of those are useful to me. I also updated Biopython (to version 1.73) in case that was the cause, but got the same result.
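One way to get more detail than the bare "Bad Request" is to catch the HTTPError and read the response body, which often carries the server's own explanation. A minimal sketch using only the standard library; `describe_http_error` is a hypothetical helper name, and the Entrez call in the usage comment assumes the setup from the question:

```python
from urllib.error import HTTPError

def describe_http_error(call):
    """Run a zero-argument callable; if it raises HTTPError, return a
    readable summary (status, URL, response body). Otherwise return None."""
    try:
        call()
    except HTTPError as err:
        # The body of an error response can be read like a normal response.
        body = err.read().decode("utf-8", errors="replace")
        return f"{err.code} {err.reason}\nURL: {err.url}\n{body}"
    return None

# Usage (assumes Biopython and the ids from the question):
# print(describe_http_error(lambda: Entrez.efetch(
#     db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')))
```

Whether the body contains anything useful depends on the server, so treat this as a diagnostic aid rather than a guaranteed explanation.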

I'd really appreciate your thoughts.

Best,

python ncbi entrez biopython sequences • 5.3k views

When running programmatic queries against NCBI, please build in a sleep interval. Have you also signed up for an NCBI API key? Without one, your queries are limited to 3 requests per second.
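A sleep interval can be sketched as a small wrapper that spaces out consecutive calls. This is only an illustration: `throttled` is a hypothetical helper, and the default of 0.34 s is an assumption chosen to stay under the 3 requests per second limit without an API key.

```python
import time

def throttled(calls, min_interval=0.34, sleep=time.sleep, clock=time.monotonic):
    """Run each zero-argument callable in order, waiting so that consecutive
    calls start at least `min_interval` seconds apart. Returns the results."""
    results = []
    last_start = None
    for call in calls:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(call())
    return results

# Usage (assumes Biopython; one request per identifier):
# handles = throttled([lambda i=i: Entrez.efetch(db='protein', id=i,
#                      rettype='fasta_cds_na', retmode='text') for i in ids])
```

Passing all identifiers in a single efetch call, as in the question, counts as one request, so throttling matters mainly when looping over many separate calls.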


Dear genomax, I have already signed up for an API key, and I take your point into account in my other scripts (in R, though). However, there are no for loops in my example, and as far as I know it counts as a single request, right? If so, there must be something else going on...


Queries seem to be working:

$ efetch -db protein -id "XP_004781165.1" -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT

$ esearch -db protein -query "XP_004781165.1" | efetch -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT
GGTGTGGCCTTCCTGGCCCGGAAAAAATGA

  • Can you print the URL you are sending and try it in your browser?
  • Is it an http address (not https)? (Though that is unlikely, since you wrote that it worked weeks ago.)
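To see what address would be requested, the efetch URL can be rebuilt by hand and pasted into a browser. A rough sketch with the standard library; `efetch_url` is a hypothetical helper, and the URL it builds may differ in detail from what Biopython actually sends (for example, it omits the email, tool and api_key parameters):

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for efetch
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(db, ids, rettype, retmode="text"):
    """Build an efetch URL for a list of identifiers (joined with commas)."""
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(params)

ids = ['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']
print(efetch_url("nuccore", ids, "fasta_cds_na"))
print(efetch_url("protein", ids, "fasta_cds_na"))
```

Opening both printed URLs side by side makes it easy to check whether the db value alone changes the response.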

Hi Carambakaracho, I tried it in the browser with several examples from the docs, and they worked. However, my example ids did not. I suspect it has something to do with using 'nuccore' in combination with XP/NP ids. It shouldn't be that, because it worked fine until recently, but I'm starting to think that something may have changed on NCBI's side :/

4.9 years ago
Solowars ▴ 70

OK, I found a solution (kudos to genomax for inspiring it).

Apparently the problem was indeed the db parameter 'nuccore'. I switched it to 'protein' and it worked!
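For anyone landing here later, the fix amounts to something like the sketch below. It assumes Biopython is installed; `fetch_cds` and `collapse_blank_lines` are hypothetical helper names, and retmode='text' is my choice here rather than the 'xml' used in the question, since the result is FASTA text either way.

```python
import re

def collapse_blank_lines(text):
    """Replace any run of two or more newlines with a single newline."""
    return re.sub(r"\n{2,}", "\n", text)

def fetch_cds(protein_ids, email):
    """Fetch nucleotide CDS FASTA records for protein accessions.
    Uses db='protein', which is the change that resolved the HTTP 400."""
    # Imported here so collapse_blank_lines can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="protein", id=protein_ids,
                           rettype="fasta_cds_na", retmode="text")
    try:
        return collapse_blank_lines(handle.read())
    finally:
        handle.close()

# Usage (requires network access and a real email address):
# print(fetch_cds(['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1'],
#                 'you@example.org'))
```

Note that the regex differs slightly from the one in the question: it collapses any run of blank lines, not just pairs of newlines.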

Thank you all for your time and attention.
