Question: ID query with Entrez -- invalid ID - urllib.error.HTTPError: HTTP Error 400: Bad Request
marcos.a.godoy.f wrote (5 months ago):

Hi,

I'm trying to fetch FASTA sequences for a list of IDs, but the list contains a lot of invalid IDs.

When the query reaches an invalid ID, I get an error and the query is interrupted: "urllib.error.HTTPError: HTTP Error 400: Bad Request"

How can I ignore the error and continue the query?

This example stops the query on the second ID:

from urllib.error import HTTPError
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        time.sleep(20)
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    records = Entrez.read(handle)
    #print(records)
    print ("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
    time.sleep(1) # to make sure not many requests go per second to ncbi
a.zielezinski wrote (5 months ago):

You can modify your script to try downloading each sequence record up to three times. If all three attempts fail, skip the record and move on to the next ID.

from urllib.error import HTTPError
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
max_attempts = 3

for i in IDs:
    handle = None
    # Try each ID up to max_attempts times before giving up on it.
    for n in range(max_attempts):
        try:
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
            break
        except HTTPError:
            # Catch only the HTTP error from the question, so Ctrl+C still stops the script.
            time.sleep(1)
    if handle:
        records = Entrez.read(handle)
        print("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
        time.sleep(1) # to make sure not many requests go per second to ncbi
    else:
        print('Could not download: {}'.format(i))

Output:

> AY851612 Opuntia subulata rpl16 gene, intron; chloroplast Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae; Austrocylindropuntia
cattaaagaagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaatctaaatgatatacgattccactatgtaaggtctttgaatcatatcataaaagacaatgtaataaagcatgaatacagattcacacataattatctgatatgaatctattcatagaaaaaagaaaaaagtaagagcctccggccaataaagactaagagggttggctcaagaacaaagttcattaagagctccattgtagaattcagacctaatcattaatcaagaagcgatgggaacgatgtaatccatgaatacagaagattcaattgaaaaagatcctaatgatcattgggaaggatggcggaacgaaccagagaccaattcatctattctgaaaagtgataaactaatcctataaaactaaaatagatattgaaagagtaaatattcgcccgcgaaaattccttttttattaaattgctcatattttattttagcaatgcaatctaataaaatatatctatacaaaaaaatatagacaaactatatatatataatatatttcaaatttccttatatacccaaatataaaaatatctaataaattagatgaatatcaaagaatctattgatttagtgtattattaaatgtatatcttaattcaatattattattctattcatttttattcattttcaaatttataatatattaatctatatattaatttataattctattctaattcgaattcaatttttaaatattcatattcaattaaaattgaaattttttcattcgcgaggagccggatgagaagaaactctcatgtccggttctgtagtagagatggaattaagaaaaaaccatcaactataaccccaagagaaccaga
Could not download: hahdshjhdasdhas
> AY851612 Opuntia subulata rpl16 gene, intron; chloroplast Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae; Austrocylindropuntia
cattaaagaagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaatctaaatgatatacgattccactatgtaaggtctttgaatcatatcataaaagacaatgtaataaagcatgaatacagattcacacataattatctgatatgaatctattcatagaaaaaagaaaaaagtaagagcctccggccaataaagactaagagggttggctcaagaacaaagttcattaagagctccattgtagaattcagacctaatcattaatcaagaagcgatgggaacgatgtaatccatgaatacagaagattcaattgaaaaagatcctaatgatcattgggaaggatggcggaacgaaccagagaccaattcatctattctgaaaagtgataaactaatcctataaaactaaaatagatattgaaagagtaaatattcgcccgcgaaaattccttttttattaaattgctcatattttattttagcaatgcaatctaataaaatatatctatacaaaaaaatatagacaaactatatatatataatatatttcaaatttccttatatacccaaatataaaaatatctaataaattagatgaatatcaaagaatctattgatttagtgtattattaaatgtatatcttaattcaatattattattctattcatttttattcattttcaaatttataatatattaatctatatattaatttataattctattctaattcgaattcaatttttaaatattcatattcaattaaaattgaaattttttcattcgcgaggagccggatgagaagaaactctcatgtccggttctgtagtagagatggaattaagaaaaaaccatcaactataaccccaagagaaccaga
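
A possible refinement of the retry idea, offered as a sketch rather than a tested recipe: HTTPError carries the status code in its `code` attribute, so the loop can give up immediately on a 400 and retry only on other HTTP errors. This assumes that a 400 from efetch means the ID itself is invalid, as in the question, and will not succeed on a second attempt. The names `fetch_with_retry` and `efetch_xml` are hypothetical helpers, not part of Biopython.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, record_id, max_attempts=3, delay=1.0):
    """Return fetch(record_id), or None if the record cannot be downloaded.

    `fetch` is any callable that takes a single ID (hypothetical helper).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(record_id)
        except HTTPError as err:
            # Assumption: a 400 means the ID itself is bad, so retrying is pointless.
            if err.code == 400 or attempt == max_attempts:
                return None
            time.sleep(delay)  # other HTTP errors may be temporary: wait, then retry
    return None


def efetch_xml(record_id):
    """Call Entrez.efetch with the same arguments used in this thread."""
    from Bio import Entrez  # imported here so fetch_with_retry itself needs no Biopython
    return Entrez.efetch(db="nucleotide", id=record_id, retmode="xml")
```

With Entrez.email set as in the scripts above, `handle = fetch_with_retry(efetch_xml, i)` would replace the inner retry loop, and `if handle:` decides whether to read the record or report the ID as failed. For a list with many invalid IDs this avoids sleeping between attempts that cannot succeed.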

I also solved it in another way:

from urllib.error import HTTPError
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hjshdaskdhsakjdhaskj', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        try:
            time.sleep(30)
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
        except HTTPError:
            print('Could not download: {}'.format(i))
            # Skip this ID only when the retry has failed too; a successful retry falls through.
            continue
    records = Entrez.read(handle)

    print("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
    time.sleep(1) # to make sure not many requests go per second to ncbi
Reply by marcos.a.godoy.f, 5 months ago
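
The long print call used in the scripts in this thread could be moved into a small helper, which keeps the download loop readable. The sketch below assumes the parsed record exposes the same GBSeq_definition, GBSeq_taxonomy and GBSeq_sequence keys used above. The name `to_fasta` and the `width` parameter are hypothetical, and the header follows the usual FASTA convention of placing the identifier directly after ">", which differs slightly from the "> " prefix printed by the scripts.

```python
def to_fasta(record_id, record, width=70):
    """Format one parsed GBSeq record as FASTA text (hypothetical helper).

    `record` is expected to behave like a dict with the GBSeq_* keys
    used in the scripts in this thread.
    """
    header = ">{} {} {}".format(
        record_id.strip(),
        record["GBSeq_definition"],
        record["GBSeq_taxonomy"],
    )
    sequence = record["GBSeq_sequence"]
    # Split the sequence into fixed-width lines, as is customary for FASTA files.
    lines = [sequence[start:start + width] for start in range(0, len(sequence), width)]
    return "\n".join([header] + lines)
```

Inside the loop this would be used as `print(to_fasta(i, records[0]))`.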

Thank you for sharing!

Reply by a.zielezinski, 5 months ago

Thank you for sharing! Really helpful.

Reply by npeng, 10 weeks ago

It works perfectly. Thank you.

Reply by marcos.a.godoy.f, 5 months ago

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

Reply by genomax, 5 months ago