Question: HTTP 400: Bad request error in Biopython Entrez.efetch
5 weeks ago by
Solowars50
Brazil/Porto Alegre/UFRGS
Solowars50 wrote:

Dear all,

I wrote a script to retrieve the corresponding nucleotide CDS sequences for a list of NCBI protein identifiers, using Entrez.efetch in Python 3.7 (Anaconda 3). This script worked well a few weeks ago, but now for some reason it doesn't. Here is the code:

import re
from Bio import Entrez

ids=['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']
Entrez.email= '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
record = handle.read()
# collapse the blank lines between records
record=re.sub('\\n\\n', '\\n', record)

While this used to work, now it gives me the following error:

Entrez.email= '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
Traceback (most recent call last):

  File "<ipython-input-14-a939b978098e>", line 2, in <module>
    handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 184, in efetch
    return _open(cgi, variables, post=post)

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 545, in _open
    raise exception

  File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 543, in _open
    handle = _urlopen(cgi)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)

  File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Bad Request

I tried different combinations (e.g. other db and id parameters, just to test whether the failure is general), and some of them worked, but unfortunately none of those are useful to me. I also updated Biopython (to version 1.73) in case that was the cause, with the same result.
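When narrowing down a failure like this, it can help to look at the body of the error response, since the server may explain there why it rejected the request. The following is only a sketch: `describe_http_error` is a hypothetical helper, and the error object is built by hand (with a made-up body) so the example runs offline.

```python
import io
from email.message import Message
from urllib.error import HTTPError


def describe_http_error(err, max_chars=500):
    """Return the status line of an HTTPError plus the start of its body."""
    body = err.read().decode("utf-8", errors="replace").strip()
    return f"HTTP {err.code} {err.reason}: {body[:max_chars]}"


# Offline demonstration with a hand-built error; the URL and body are made up.
fake = HTTPError(
    "https://example.invalid/efetch",
    400,
    "Bad Request",
    Message(),
    io.BytesIO(b"illustrative error body"),
)
summary = describe_http_error(fake)
print(summary)
```

In a real script the same helper would be called inside an `except HTTPError as err:` block wrapped around the Entrez.efetch call.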

I'd really appreciate your thoughts.

Best,

modified 5 weeks ago • written 5 weeks ago by Solowars50

When running programmatic queries against NCBI, please build in a sleep interval. Have you also signed up for an NCBI API key? Without one, your queries are limited to 3 requests per second.
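A minimal sketch of such a sleep interval, assuming the ids are sent in batches. `fetch_in_batches`, `batch_size` and `min_interval` are hypothetical names and not part of Biopython; 0.34 s between calls keeps the rate just under 3 requests per second.

```python
import time


def fetch_in_batches(ids, fetch, batch_size=100, min_interval=0.34):
    """Call fetch(batch) per batch, leaving at least min_interval seconds between calls."""
    results = []
    last_call = None
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        if last_call is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(fetch(batch))
    return results


# Hypothetical use with Biopython (requires network access, not run here):
#   from Bio import Entrez
#   Entrez.email = "you@example.org"
#   Entrez.api_key = "..."  # optional, set it if you have a key
#   fetch = lambda batch: Entrez.efetch(db="protein", id=batch,
#                                       rettype="fasta_cds_na", retmode="text").read()
#   texts = fetch_in_batches(ids, fetch)
```

A single efetch call with three ids, as in the question, is one request, so the interval only matters once the id list is split over several calls.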

modified 5 weeks ago • written 5 weeks ago by genomax69k

Dear genomax, I have already signed up for an API key, and I run other scripts (in R, though) with your point in mind. However, my example contains no for loops, and as far as I know it should count as a single request, right? If that is the case, there must be something else...

written 5 weeks ago by Solowars50

Queries seem to be working:

$ efetch -db protein -id "XP_004781165.1" -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT

$ esearch -db protein -query "XP_004781165.1" | efetch -format fasta_cds_na
>lcl|XM_004781108.2_cds_XP_004781165.1_1 [gene=CEND1] [db_xref=GeneID:101688671] [protein=cell cycle exit and neuronal differentiation protein 1] [protein_id=XP_004781165.1] [location=213..662] [gbkey=CDS]
ATGGAGTCCAGGGGAAAGGCGACCAGCAGCCCCAAGCCCGACACCAAGGCTCCACAGGCCACTGCTGAGG
CCAGAGCCCCACCAGCTGCAGATGGAAAGGCCCCTTCAGCTAAGCCTGGGAAGAAGGAGGCCCAAGCAGA
GAAGCAGGAGCCTCCCGCAGCCCCCACACCACCAGCGGCCAAGAAGACCCCGGCCAAAGCAGACCCTACC
CTTCTCAATAACCACAGTAACCTGAAGCCAGCCCCTGCGGCCCCCAGCAGCCCTGATGCCGCCACCGAGC
CCAAGGGCCCTGGGGATGGGGCTGAGGAGGGTGAAGCCCCCAGCGGGACCCCAGGGGGCCGAGGCCCTTG
CCCCTTTGAGAACTTGACCCCCCTGCTCGTGGCTGGGAGTGTGGCCGTGGCCGCTGTAGCCCTAATTCTT
GGTGTGGCCTTCCTGGCCCGGAAAAAATGA
written 5 weeks ago by genomax69k
  • Can you print the address you send and try via your browser?
  • Is it an http address (not https)? (though unlikely as you wrote it worked weeks ago)
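One way to get an address to paste into a browser is to build it by hand. The sketch below assumes the public E-utilities efetch endpoint and the parameters from the question; `build_efetch_url` is a hypothetical helper, and the request Biopython actually sends may carry extra fields (such as tool and email), so treat this as an approximation of it.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint (https, not http).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype, retmode="text"):
    """Return an efetch URL for the given database and list of ids."""
    params = {
        "db": db,
        "id": ",".join(ids),  # several ids go in one comma-separated field
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(params)


url = build_efetch_url(
    "nuccore",
    ["XP_021798999.1", "XP_003909393.1", "XP_004781165.1"],
    "fasta_cds_na",
    retmode="xml",
)
print(url)
```

Changing the first argument to "protein" gives a second address to compare in the browser.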
written 5 weeks ago by Carambakaracho1.4k

Hi Carambakaracho, I tried it via the browser, following several examples in the docs, and they worked. However, it didn't work with my example ids. I suspect the problem is related to using 'nuccore' in combination with XP/NP ids. It shouldn't be that, because it worked fine until recently, but I'm starting to think that perhaps something changed on NCBI's side :/

written 5 weeks ago by Solowars50
Solowars50 wrote:

OK, I found a solution (kudos to genomax for inspiring it).

Apparently the problem was indeed the db parameter 'nuccore'. I switched it to 'protein' and it worked!
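For reference, a sketch of the corrected call. XP_ accessions are RefSeq protein records, which fits the fix of querying the 'protein' database. The function names are hypothetical, and retmode="text" is an assumption here (the original script used 'xml'). The small parser relies only on the [protein_id=...] tag visible in the FASTA headers shown earlier in this thread.

```python
import re


def fetch_cds_fasta(ids, email):
    """Fetch CDS nucleotide FASTA for protein accessions (needs Biopython and network)."""
    from Bio import Entrez  # imported here so the parser below works without Biopython

    Entrez.email = email
    handle = Entrez.efetch(db="protein", id=ids, rettype="fasta_cds_na", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def cds_by_protein_id(fasta_text):
    """Map each protein_id found in a FASTA header to its joined sequence."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            match = re.search(r"\[protein_id=([^\]]+)\]", line)
            current = match.group(1) if match else None
            if current is not None:
                records[current] = []
        elif line and current is not None:
            records[current].append(line)
    return {pid: "".join(parts) for pid, parts in records.items()}


# Hypothetical usage (requires network access, not run here):
#   text = fetch_cds_fasta(["XP_004781165.1"], "you@example.org")
#   cds = cds_by_protein_id(text)
```

Keying the result by protein_id makes it easy to check that every input accession came back with a CDS.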

Thank you all for your time and attention.

written 5 weeks ago by Solowars50