Question: PubMed connection error
written 20 months ago by agata88 (Poland):

Hi all,

I need to annotate variants with the PubMed database. To do that I've written a Python program using Biopython's Entrez module. It sends about 10,000 queries to the database, one by one.

Unfortunately it raises an error:

Traceback (most recent call last):
  File "part3_PubMedSearch.py", line 51, in <module>
    pubmedData = getDataFromPubmed(row[20])
  File "part3_PubMedSearch.py", line 25, in getDataFromPubmed
    handle = Entrez.esearch("pubmed", term=search)
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 189, in esearch
    return _open(cgi, variables)
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 466, in _open
    handle = _urlopen(cgi)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 469, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 656, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1240, in https_open
    context=self._context)
  File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:590)>

Can I do something about this, or is it a connection issue or a problem on the database side?

Tags: pubmed

This could have any number of causes. One, discussed elsewhere in this thread, is that the server is getting overloaded. Other causes are possible, such as a mismatch in SSL versions (in which case, try updating OpenSSL).
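
If the failures turn out to be transient (overload rather than a genuine SSL mismatch), a retry wrapper around the Entrez call is a cheap safeguard. A minimal sketch under that assumption; the function name, retry count, and delay are arbitrary choices, not part of the original script:

    import time
    from urllib2 import URLError  # Python 2; on Python 3 use urllib.error
    from Bio import Entrez

    def esearch_with_retry(term, retries=3, delay=5):
        # Retry transient network/SSL failures a few times before giving up.
        for attempt in range(retries):
            try:
                return Entrez.esearch(db="pubmed", term=term)
            except URLError:
                if attempt == retries - 1:
                    raise
                time.sleep(delay)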

written 20 months ago by Jean-Karim Heriche

It seems to be only a warning: my variants are getting annotated, but I am not sure what the cause is...

written 20 months ago by agata88

If you could show your code, we could have a look at it... I hope you're not making those requests in a very short timeframe?

See also Biopython Entrez Guidelines

written 20 months ago by WouterDeCoster

This is part of the script:


import sys
import os
from Bio import Entrez
from directories import *

inputDir = sys.argv[1]
outputDir = sys.argv[2]

def addDatafile(database, filename):
    # Read a tab-separated file and collect every data row,
    # skipping header rows (second column equal to 'CHROM').
    with open(filename, 'r') as handle:
        for line in handle:
            line = line.rstrip()
            row = line.split("\t")
            if row[1] != 'CHROM':
                database.append(row)
    return database

def getDataFromPubmed(rs):
    # NCBI expects a real contact address here, so they can reach you
    # before blocking your IP if something goes wrong.
    Entrez.email = "anonymous@gmail.com"
    # Search PubMed for the rs ID restricted to "Polish" and return
    # the matching PubMed IDs as a comma-separated string.
    search = rs + " AND Polish"
    handle = Entrez.esearch(db="pubmed", term=search)
    record = Entrez.read(handle)
    handle.close()
    return ",".join(record['IdList'])
written 20 months ago by agata88

I am looking at NCBI's session history feature, following the Entrez guidelines:

For large queries, the NCBI also recommend using their session history feature (the WebEnv session cookie string, see Section 9.15). This is only slightly more complicated.

handle = Entrez.esearch("pubmed", term=search, usehistory="y")

Not sure if it is going to help...
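
For reference, the history feature pays off when later E-utility calls reuse the stored result set. A minimal sketch of that pattern, following the Biopython tutorial; the search term is a placeholder, not from the original script:

    from Bio import Entrez

    Entrez.email = "your.name@example.com"  # use a real contact address

    # Run the search once and keep the result set on NCBI's history server.
    handle = Entrez.esearch(db="pubmed", term="rs123456 AND Polish", usehistory="y")
    record = Entrez.read(handle)
    handle.close()

    # Later calls reference the stored results via WebEnv/QueryKey
    # instead of repeating the search.
    handle = Entrez.efetch(db="pubmed", rettype="medline", retmode="text",
                           webenv=record["WebEnv"], query_key=record["QueryKey"])
    print(handle.read())
    handle.close()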

written 20 months ago by agata88

That's not the complete script (as you said), so: are you making all those requests in a short timeframe? Adding a time.sleep(1) might be sensible, perhaps even longer, although this obviously slows down your runtime.
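
Something like this around the function you posted, as a sketch; using row[20] as the rs ID column is taken from your traceback, and Biopython already throttles Entrez calls to at most three per second, so the sleep mainly adds headroom:

    import time

    for row in database:  # 'database' as built by addDatafile above
        pubmedData = getDataFromPubmed(row[20])  # column 20 holds the rs ID
        time.sleep(1)  # stay well under NCBI's three-requests-per-second limit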

written 20 months ago by WouterDeCoster

Right now I have another error:

Traceback (most recent call last):
  File "part3_PubMedSearch.py", line 51, in <module>
    pubmedData = getDataFromPubmed(row[20])
  File "part3_PubMedSearch.py", line 26, in getDataFromPubmed
    record = Entrez.read(handle)
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 376, in read
    record = handler.read(handle)
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/Parser.py", line 205, in read
    self.parser.ParseFile(handle)
  File "/usr/lib/python2.7/dist-packages/Bio/Entrez/Parser.py", line 343, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Search Backend failed: read request has timed out. peer: 130.14.22.28:7011

Thanks! I am trying time.sleep(1); the request rate might have been the problem.

written 20 months ago by agata88

If you hammered the server with your first attempts, you may now have been blacklisted.

written 20 months ago by Jean-Karim Heriche

Really? Not good....

written 20 months ago by agata88

From the guidelines:

In order not to overload the E-utility servers, NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays. Failure to comply with this policy may result in an IP address being blocked from accessing NCBI. If NCBI blocks an IP address, service will not be restored unless the developers of the software accessing the E-utilities register values of the tool and email parameters with NCBI.

Note that if you've been using your institution's network, the whole institution IP range may get blacklisted.
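
The email and tool values the guidelines refer to can be set once, module-wide, in Biopython; the tool name below is a hypothetical example:

    from Bio import Entrez

    Entrez.email = "your.real.address@example.com"  # so NCBI can contact you before blocking
    Entrez.tool = "variant_pubmed_annotator"  # hypothetical name; register it with NCBI for large jobs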

written 20 months ago by Jean-Karim Heriche

No, I am using my home network, fortunately ;) but it looks like I am blocked. I thought only services with closed (licensed) databases blocked abusive connections, not a publicly available database... but it makes sense that it is easy to overload. Thanks!

written 20 months ago by agata88

It's quite clear in the guidelines that you shouldn't bombard the server with thousands of requests...

written 20 months ago by WouterDeCoster

OK, so it looks like the only way is to write a script that downloads a fresh copy of the database to my local computer every day and runs everything locally... Thanks anyway :)
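
For what it's worth, NCBI does distribute PubMed as gzipped XML over FTP, so the local route is feasible. A minimal sketch, assuming the baseline directory layout at the time of writing (file names change each year, so treat the listing as the source of truth):

    import ftplib

    ftp = ftplib.FTP("ftp.ncbi.nlm.nih.gov")
    ftp.login()  # anonymous access
    ftp.cwd("/pubmed/baseline/")
    # Fetch only the gzipped XML chunks (names like pubmedNNnNNNN.xml.gz).
    for name in ftp.nlst():
        if name.endswith(".xml.gz"):
            with open(name, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
    ftp.quit()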

written 20 months ago by agata88