7.2 years ago
agata88
Hi all,
I needed to annotate variants using the PubMed database. To do that, I've written a Python program that uses the Entrez library. It sends about 10,000 queries to the database, one by one.
Unfortunately, it gives me an error:
> Traceback (most recent call last):
File "part3_PubMedSearch.py", line 51, in <module>
pubmedData = getDataFromPubmed(row[20])
File "part3_PubMedSearch.py", line 25, in getDataFromPubmed
handle = Entrez.esearch("pubmed", term=search)
File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 189, in esearch
return _open(cgi, variables)
File "/usr/lib/python2.7/dist-packages/Bio/Entrez/__init__.py", line 466, in _open
handle = _urlopen(cgi)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 469, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 656, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1240, in https_open
context=self._context)
File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:590)>
Can I do something about that? Is this a database connection issue, or maybe a problem with the database itself?
This could have any number of causes. One, discussed elsewhere in this thread, is that the server is starting to get overloaded. Other causes are possible too, such as a mismatch in SSL versions (in which case, try updating OpenSSL).
It is most likely a warning; my variants are annotated, but I am not sure what is causing it...
If you could show your code, we could have a look. I hope you aren't making all those requests in a very short timeframe?
See also Biopython Entrez Guidelines
This is part of the script:
I am checking NCBI's history usage according to the Entrez Guidelines:
Not sure if it is going to help...
That's not the complete script (as you said), so: are you making all those requests in a short timeframe? Perhaps adding a
time.sleep(1)
between requests might be sensible, perhaps more, although this obviously slows down your runtime.
Right now I have another error:
Thanks! I will try time.sleep(1); the request rate might be the problem.
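For illustration, here is a minimal sketch of the pause-and-retry idea suggested above. It assumes Python 3; the helper name throttled_call and its parameters (delay, retries, sleep) are hypothetical and not part of Biopython. The commented usage lines assume Biopython's Entrez.esearch, as in the traceback.

```python
import time


def throttled_call(func, *args, delay=1.0, retries=3, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), pausing before each attempt and
    retrying on network errors with a doubled pause each time.

    Hypothetical helper for illustration only.
    """
    wait = delay
    for attempt in range(1, retries + 1):
        sleep(wait)  # pause before every request so the server is not flooded
        try:
            return func(*args, **kwargs)
        except OSError:
            # urllib.error.URLError and ssl.SSLError are OSError subclasses
            # in Python 3, so connection and SSL failures land here.
            if attempt == retries:
                raise  # give up after the last attempt
            wait *= 2  # back off: double the pause before the next try


# Assumed usage with Biopython (not executed here):
# from Bio import Entrez
# Entrez.email = "you@example.org"  # placeholder address
# handle = throttled_call(Entrez.esearch, db="pubmed", term=search)
```

With delay=1.0, about 10,000 queries take roughly three hours at best, so this trades runtime for staying within the usage guidelines. It will not help if the IP address has already been blocked.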
If you hammered the server with your first attempts, you may now have been blacklisted.
Really? Not good....
From the guidelines:
Note that if you've been using your institution's network, the whole institution IP range may get blacklisted.
No, I am using my home network, fortunately ;) But it looks like I am blocked. I thought only services with closed (licensed) databases blocked connections, and only from hackers :P, not publicly available databases... but it makes sense, since it is easy to overload the database. Thanks!
The guidelines are quite clear that you shouldn't bombard the server with thousands of requests...
OK, so it looks like the only way is to write a script that downloads a fresh copy of the database to my local computer every day and runs everything locally... Thanks anyway :)