Question: ID query with Entrez -- invalid ID - urllib.error.HTTPError: HTTP Error 400: Bad Request
7 weeks ago, marcos.a.godoy.f wrote:

Hi,

I'm trying to fetch FASTA sequences for a list of IDs, but the list contains many invalid IDs.

When the query reaches an invalid ID, it raises an error and stops: "urllib.error.HTTPError: HTTP Error 400: Bad Request"

How can I ignore the error and continue with the remaining IDs?

This example stops at the second ID:

from urllib.request import urlopen                                          
from urllib.error import HTTPError 
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        time.sleep(20)
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    records = Entrez.read(handle)
    #print(records)
    print ("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
    time.sleep(1) # to make sure not many requests go per second to ncbi
7 weeks ago, a.zielezinski wrote:

You can modify your script to try downloading each sequence record up to three times. If all three attempts fail, the record is skipped.

from urllib.error import HTTPError
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
max_attempts = 3

for i in IDs:
    handle = None
    # Try each ID up to max_attempts times before giving up.
    for n in range(max_attempts):
        try:
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
            break
        except HTTPError:
            # Catch only HTTP errors so Ctrl-C and real bugs are not swallowed.
            time.sleep(1)
    if handle:
        records = Entrez.read(handle)
        print("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
        time.sleep(1) # to make sure not many requests go per second to ncbi
    else:
        print('Could not download: {}'.format(i))

Output:

> AY851612 Opuntia subulata rpl16 gene, intron; chloroplast Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae; Austrocylindropuntia
cattaaagaagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaatctaaatgatatacgattccactatgtaaggtctttgaatcatatcataaaagacaatgtaataaagcatgaatacagattcacacataattatctgatatgaatctattcatagaaaaaagaaaaaagtaagagcctccggccaataaagactaagagggttggctcaagaacaaagttcattaagagctccattgtagaattcagacctaatcattaatcaagaagcgatgggaacgatgtaatccatgaatacagaagattcaattgaaaaagatcctaatgatcattgggaaggatggcggaacgaaccagagaccaattcatctattctgaaaagtgataaactaatcctataaaactaaaatagatattgaaagagtaaatattcgcccgcgaaaattccttttttattaaattgctcatattttattttagcaatgcaatctaataaaatatatctatacaaaaaaatatagacaaactatatatatataatatatttcaaatttccttatatacccaaatataaaaatatctaataaattagatgaatatcaaagaatctattgatttagtgtattattaaatgtatatcttaattcaatattattattctattcatttttattcattttcaaatttataatatattaatctatatattaatttataattctattctaattcgaattcaatttttaaatattcatattcaattaaaattgaaattttttcattcgcgaggagccggatgagaagaaactctcatgtccggttctgtagtagagatggaattaagaaaaaaccatcaactataaccccaagagaaccaga
Could not download: hahdshjhdasdhas
> AY851612 Opuntia subulata rpl16 gene, intron; chloroplast Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae; Austrocylindropuntia
cattaaagaagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaatctaaatgatatacgattccactatgtaaggtctttgaatcatatcataaaagacaatgtaataaagcatgaatacagattcacacataattatctgatatgaatctattcatagaaaaaagaaaaaagtaagagcctccggccaataaagactaagagggttggctcaagaacaaagttcattaagagctccattgtagaattcagacctaatcattaatcaagaagcgatgggaacgatgtaatccatgaatacagaagattcaattgaaaaagatcctaatgatcattgggaaggatggcggaacgaaccagagaccaattcatctattctgaaaagtgataaactaatcctataaaactaaaatagatattgaaagagtaaatattcgcccgcgaaaattccttttttattaaattgctcatattttattttagcaatgcaatctaataaaatatatctatacaaaaaaatatagacaaactatatatatataatatatttcaaatttccttatatacccaaatataaaaatatctaataaattagatgaatatcaaagaatctattgatttagtgtattattaaatgtatatcttaattcaatattattattctattcatttttattcattttcaaatttataatatattaatctatatattaatttataattctattctaattcgaattcaatttttaaatattcatattcaattaaaattgaaattttttcattcgcgaggagccggatgagaagaaactctcatgtccggttctgtagtagagatggaattaagaaaaaaccatcaactataaccccaagagaaccaga
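
Since the question reports that invalid IDs come back as HTTP 400, retrying them only costs time. Below is a minimal sketch, assuming a 400 always means a bad ID and any other HTTP error is transient. `fetch_with_retry` is a hypothetical helper, not part of Biopython; it takes the fetch call as a parameter, gives up immediately on 400, and retries other HTTP errors.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, seq_id, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Return fetch(seq_id), or None if the ID is rejected or all attempts fail.

    `fetch` is any callable that takes an ID and may raise HTTPError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(seq_id)
        except HTTPError as err:
            # Assumption: HTTP 400 means the ID itself is invalid, so a retry cannot help.
            if err.code == 400:
                return None
            # Other HTTP errors are treated as transient; pause before the next attempt.
            if attempt < max_attempts:
                sleep(delay)
    return None
```

With Biopython this could be called as `fetch_with_retry(lambda i: Entrez.efetch(db="nucleotide", id=i, retmode="xml"), "AY851612")`; a `None` result means the ID should be skipped.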

I also solved it in another way:

from urllib.error import HTTPError
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hjshdaskdhsakjdhaskj', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        try:
            # Wait, then retry once before giving up on this ID.
            time.sleep(30)
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
        except HTTPError:
            print('Could not download: {}'.format(i))
            continue  # skip this ID only when the retry also fails
    records = Entrez.read(handle)

    print("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
    time.sleep(1) # to make sure not many requests go per second to ncbi
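
The print calls in this thread build each FASTA entry by string concatenation and put a space after ">". FASTA headers conventionally start with ">" followed directly by the identifier, and long sequences are usually wrapped. The sketch below shows one way to do that; `to_fasta` is a hypothetical helper, and it assumes `record` is a dict-like object with the `GBSeq_*` keys used in this thread.

```python
def to_fasta(seq_id, record, width=70):
    """Format one parsed GBSeq record as a FASTA entry with wrapped sequence lines."""
    header = ">{} {} {}".format(
        seq_id.strip(),
        record["GBSeq_definition"],
        record["GBSeq_taxonomy"],
    )
    sequence = record["GBSeq_sequence"]
    # Split the sequence into chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)
```

In the loops above, `print(to_fasta(i, records[0]))` would replace the long concatenated print call.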
written 7 weeks ago by marcos.a.godoy.f

Thank you for sharing!

written 7 weeks ago by a.zielezinski

It works perfectly. Thank you.

written 7 weeks ago by marcos.a.godoy.f

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

written 7 weeks ago by genomax