Hi,
I am using a Python script to access the new NCBI dbSNP API. The script I used (variation-API-query.py) is adapted from here (https://github.com/ncbi/dbsnp/blob/master/tutorials/Variation%20Services/spdi_batch.py).
The script is as follows:
import requests
import json
import argparse
import re
import sys
from itertools import islice, chain
import time

parser = argparse.ArgumentParser(description='batch process SPDI requests')
parser.add_argument(
    '-i', dest='input_file', required=True,
    help='The name of the input file to parse (VCF, HGVS or rs list, etc.)')
parser.add_argument(
    '-t', dest='input_format', required=True,
    help='The input file format (VCF, HGVS, or RS)')

api_rootURL = 'https://api.ncbi.nlm.nih.gov/variation/v0/'

def batchRS(infile):
    # Query the refsnp endpoint once per rs ID, one ID per input line.
    for rs in infile:
        rs = re.sub('rs', '', rs.rstrip())
        if rs.isdigit():
            url = api_rootURL + 'refsnp/' + rs
            time.sleep(3)
            req = requests.get(url)
            print(req.text)

# batchVCF, batchHGVS and batchHGVS2RS are defined in the full script (not shown here).
batchfunctions = {
    'VCF': batchVCF,
    'RS': batchRS,
    'HGVS': batchHGVS,
    'HGVS_RS': batchHGVS2RS}

args = parser.parse_args()
infile = open(args.input_file, "r")
batchfunctions[args.input_format](infile)
The script is executed in the terminal using the following command:
python variation-API-query.py -i test-SNP-list.txt -t RS > test-SNP-list.json
Error message:
Traceback (most recent call last):
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/yichao/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 284, in connect
    conn = self._new_conn()
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f20910a96d8>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.ncbi.nlm.nih.gov', port=443): Max retries exceeded with url: /variation/v0/refsnp/10892279 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f20910a96d8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "variation-API-query.py", line 174, in <module>
    batchfunctions[args.input_format](infile)
  File "variation-API-query.py", line 102, in batchRS
    req = requests.get(url)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/yichao/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.ncbi.nlm.nih.gov', port=443): Max retries exceeded with url: /variation/v0/refsnp/10892279 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f20910a96d8>: Failed to establish a new connection: [Errno -2] Name or service not known',))
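
The innermost exception in this traceback is socket.gaierror ("Name or service not known"), which suggests the failure happens during the hostname lookup, before any request reaches the API. A minimal sketch of how that lookup could be checked separately from requests is below. The helper name resolve_with_retry and its parameters are hypothetical (not part of the NCBI script); it only relies on the standard socket.getaddrinfo call.

```python
import socket
import time


def resolve_with_retry(host, port=443, attempts=3, delay=1.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve `host` up to `attempts` times.

    Returns (addresses, failures): the sorted list of resolved IP addresses
    (empty if every attempt failed) and the number of failed lookups.
    `resolver` and `sleep` are injectable only to make the helper testable.
    """
    failures = 0
    for attempt in range(attempts):
        try:
            infos = resolver(host, port)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            return sorted({info[4][0] for info in infos}), failures
        except socket.gaierror:
            failures += 1
            if attempt < attempts - 1:
                # Wait a little longer after each failed lookup.
                sleep(delay * (2 ** attempt))
    return [], failures


if __name__ == "__main__":
    # Hostname taken from the traceback above.
    addresses, failures = resolve_with_retry("api.ncbi.nlm.nih.gov")
    print("addresses:", addresses, "failed lookups:", failures)
```

If this also fails intermittently on the same machine, the problem is more likely in local DNS resolution than in the API or the SNP IDs themselves.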
Issue:
1) A MaxRetryError occurs frequently during retrieval. In the error output above, the specific SNP 10892279 can be accessed through Chrome, but the error occurs when I run the script.
2) The error appears with different SNP IDs each time I re-run the command. All of these SNPs are accessible through the browser (Chrome). The 'try it out' testing block on https://api.ncbi.nlm.nih.gov/variation/v0/#/RefSNP/get_refsnp__rsid_ also works well.
3) I have used this script without error on other SNP files (which contain the same number of SNPs in the same format). Only some of the SNP files trigger this error.
4) Adding a pause (time.sleep(3)) before requests.get() still gives the same error.
5) Using either 'https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/XXXX' or 'https://api.ncbi.nlm.nih.gov/variation/v0/beta/refsnp/XXXX' as the request URL gives the same error.
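
Regarding point 4, a fixed sleep before each request does not retry a request that has already failed. A rough sketch of retrying a failed call with an increasing delay is below. The helper call_with_backoff and its parameters are hypothetical and not part of the NCBI script; it catches OSError, which requests.exceptions.ConnectionError derives from, so it could wrap requests.get without importing requests itself.

```python
import time


def call_with_backoff(func, *args, attempts=4, base_delay=2.0,
                      retry_on=(OSError,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on the exceptions in `retry_on`.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Possible use inside batchRS (assumes `requests` is imported there):
#     req = call_with_backoff(requests.get, url)
```

This would not explain why the lookup fails in the first place, but it might let a long batch run survive occasional failures.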
Can you help me pinpoint what might have gone wrong?
Thank you so much!
Yichao