Max retries exceeded error when using the NCBI dbSNP API
4.8 years ago

Hi,

I am using a Python script to access the new NCBI dbSNP API. The script I used (variation-API-query.py) is adapted from here (https://github.com/ncbi/dbsnp/blob/master/tutorials/Variation%20Services/spdi_batch.py).

The script I used is as follows:

import requests
import json
import argparse
import re
import sys
from itertools import islice, chain
import time

parser = argparse.ArgumentParser(description='batch process SPDI requests')
parser.add_argument(
    '-i', dest='input_file', required=True,
    help='The name of the input file to parse (VCF, HGVS or rs list, etc.)')
parser.add_argument(
    '-t', dest='input_format', required=True,
    help='The input file format (VCF, HGVS, or RS)')

api_rootURL = 'https://api.ncbi.nlm.nih.gov/variation/v0/'

def batchRS(infile):
    for rs in infile:
        rs = re.sub('rs', '', rs.rstrip())
        if rs.isdigit():
            url = api_rootURL + 'refsnp/' + rs
            time.sleep(3)
            req = requests.get(url)
            print(req.text)

# batchVCF, batchHGVS and batchHGVS2RS are defined in the full script (omitted here)
batchfunctions = {
    'VCF': batchVCF,
    'RS': batchRS,
    'HGVS': batchHGVS,
    'HGVS_RS': batchHGVS2RS}
args = parser.parse_args()
infile = open(args.input_file, "r")
batchfunctions[args.input_format](infile)

The script is executed in the terminal using the following command: python variation-API-query.py -i test-SNP-list.txt -t RS > test-SNP-list.json

Error message:

Traceback (most recent call last):
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/yichao/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 284, in connect
    conn = self._new_conn()
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f20910a96d8>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.ncbi.nlm.nih.gov', port=443): Max retries exceeded with url: /variation/v0/refsnp/10892279 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f20910a96d8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "variation-API-query.py", line 174, in <module>
    batchfunctions[args.input_format](infile)
  File "variation-API-query.py", line 102, in batchRS
    req = requests.get(url)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.ncbi.nlm.nih.gov', port=443): Max retries exceeded with url: /variation/v0/refsnp/10892279 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f20910a96d8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Issue:

1) A MaxRetryError occurs frequently during retrieval. In the error output above, the specific SNP 10892279 can be accessed through Chrome, but the error occurs when I execute the script.

2) This error is seen with different SNP IDs (when I re-run the command). All these SNPs are accessible through the browser (Chrome). The 'try it out' testing block on https://api.ncbi.nlm.nih.gov/variation/v0/#/RefSNP/get_refsnp__rsid_ also works well.

3) Also, I have used this script without error on other SNP files (which contain the same number of SNPs in the same format). Only some of the SNP files trigger this error.

4) Adding a delay (time.sleep(3)) before each requests.get() call still gives the same error.

5) Using either 'https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/XXXX' or 'https://api.ncbi.nlm.nih.gov/variation/v0/beta/refsnp/XXXX' as the request URL gives the same error.
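
For reference, here is a minimal sketch of a retry wrapper I could put around the request, assuming the failures are transient (the innermost exception in the traceback is a name-resolution error, [Errno -2]). The helper name fetch_with_retries and its parameters are hypothetical and not part of the NCBI script. It relies on requests.exceptions.ConnectionError being a subclass of OSError, so catching OSError covers both it and socket.gaierror.

```python
import time


def fetch_with_retries(fetch, url, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on OSError, wait and retry with exponential backoff.

    fetch      -- any callable taking a URL (e.g. requests.get); hypothetical hook
    attempts   -- total number of tries before giving up
    base_delay -- seconds to wait after the first failure; doubles each time
    sleep      -- injectable so the waiting can be replaced when testing
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # Last attempt failed: let the caller see the original error.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

In batchRS this would replace the bare requests.get(url) call, e.g. req = fetch_with_retries(requests.get, url). It only hides intermittent failures; it does not explain why the hostname lookup fails in the first place.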

Can you help me pinpoint what might have gone wrong?

Thank you so much!

Yichao
