Querying VEP using REST API
3.1 years ago
Arko

I'd like to get JSON output from VEP using the REST API. I have the following code in Python 3; can anyone tell me where I'm going wrong? I'm getting an HTTP 400 error.

import json
import requests
import sys

with open('sample.tsv','r') as file:
    for line in file:
        if line.startswith("rs"):
            snps = line.split()[0]
            for i in snps :
                server = "https://rest.ensembl.org"
                ext = "/vep/human/id"
                headers = { "Content-Type" : "application/json", "Accept" : "application/json"}
                r = requests.post(server+ext, headers=headers, data='{ "ids" : [i] }')

                if not r.ok:
                    r.raise_for_status()
                    sys.exit()

                decoded = r.json()
                print(repr(decoded))

This is pretty basic; I just want to get multiple results at the same time instead of one by one. Any suggestions?

I wouldn't mind doing it using R either, if that's a good alternative. Thanks!

REST VEP Ensembl python R

Can you please print i here? What are you querying?


rs2329763 rs6716521 rs4686605 rs9290916 rs7622109 rs16877127 rs160885 rs16908004 rs622120 rs970275 ... and 1000 more rows

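import requests

# Read one rsID per line, dropping trailing newlines and blank lines
with open('test.txt','r') as f:
    test = [line.strip() for line in f if line.strip()]

server = "https://rest.ensembl.org"
for i in test:
    # One GET request per variant to the variant_recoder endpoint
    ext = "/variant_recoder/human/" + i
    r = requests.get(server+ext, headers={ "Content-Type" : "application/json"})
    r.raise_for_status()  # stop on HTTP errors instead of decoding an error page
    decoded = r.json()
    print(repr(decoded))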

output:

[{'id': ['rs56116432'], 'input': 'rs56116432', 'hgvsp': ['ENSP00000483018.1:p.Gly229Asp', 'ENSP00000483265.1:p.Gly229Asp', 'ENSP00000487108.2:p.Gly230Asp', 'ENSP00000494079.1:p.Gly230Asp', 'ENSP00000494984.1:p.Gly230Asp', 'ENSP00000496236.1:p.Gly230Asp'], 'hgvsc': ['ENST00000453660.4:n.718G>A', 'ENST00000538324.2:c.686G>A', 'ENST00000611156.4:c.686G>A', 'ENST00000647353.1:n.54-4890G>A', 'ENST00000626615.2:c.689G>A', 'ENST00000644422.1:c.689G>A', 'ENST00000644755.1:c.689G>A', 'ENST00000645810.1:c.689G>A'], 'hgvsg': ['NC_000009.12:g.133256042C>T', 'CHR_HG2030_PATCH:g.133256189C>T']}]
[{'id': ['rs56116431'], 'input': 'rs56116431', 'hgvsc': ['ENST00000274498.9:c.1539-23605del', 'ENST00000378004.8:c.1539-23605del', 'ENST00000418236.5:c.254-23605del', 'ENST00000443674.5:c.395-23605del', 'ENST00000642734.1:c.1431-23605del', 'ENST00000645722.1:c.1539-23605del', 'LRG_1127t1:c.1539-23605del'], 'hgvsg': ['NC_000005.10:g.143097383del', 'LRG_1127:g.332014del']}]

input:

$ cat test.txt 
rs56116432
rs56116431
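
The variant_recoder responses above are lists with one dict per input, holding lists of HGVS strings under hgvsg, hgvsc and hgvsp (keys with nothing to report, such as hgvsp for the intronic deletion, are simply absent). A minimal sketch that turns such a response into one tab-separated row per HGVS string. It assumes the response shape shown in this thread; newer Ensembl releases may nest the results differently, so check the endpoint documentation. The names hgvs_rows and write_tsv are hypothetical helpers.

```python
import csv
import sys

def hgvs_rows(decoded):
    """Yield (input, notation_type, hgvs_string) for every HGVS string in a response."""
    for record in decoded:
        for key in ("hgvsg", "hgvsc", "hgvsp"):
            # A missing key (e.g. no protein change) simply produces no rows
            for hgvs in record.get(key, []):
                yield record.get("input", ""), key, hgvs

def write_tsv(rows, out=sys.stdout):
    """Write the rows as tab-separated text with a header line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["input", "type", "hgvs"])
    writer.writerows(rows)

# Example with a trimmed copy of the second response shown above
example = [{"id": ["rs56116431"], "input": "rs56116431",
            "hgvsg": ["NC_000005.10:g.143097383del"],
            "hgvsc": ["ENST00000274498.9:c.1539-23605del"]}]
write_tsv(hgvs_rows(example))
```

In the loop above you would pass each decoded list to hgvs_rows instead of printing repr(decoded).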

One question: do you have a good way of parsing or formatting the JSON output? Right now I'm writing everything into a text file.


Sure. What kind of output are you expecting?


Preferably all fields in TSV or CSV format, something that's easy to read. Right now it's a mess when I write it straight into a text file.


I dump the JSON to a file and then parse the Ensembl output offline with jq (standalone). I generally extract the g., c. and p. (HGVS) notation and the calculated effect. But there are plenty of JSON libraries in every language that can parse it the way you want.
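
As one example of such a library, a minimal Python sketch (standard library only) that flattens the /vep/human/id response into one TSV row per transcript consequence. It uses keys visible in the output posted in this thread (input, most_severe_consequence, transcript_consequences, gene_symbol, transcript_id, consequence_terms, impact, sift_prediction); the column choice and the names vep_to_rows and write_tsv are illustrative assumptions. Fields missing from a consequence are left empty, and variants without transcript_consequences (intergenic ones, for example) are skipped by this sketch.

```python
import csv
import sys

COLUMNS = ["input", "most_severe_consequence", "gene_symbol",
           "transcript_id", "consequence_terms", "impact", "sift_prediction"]

def vep_to_rows(decoded):
    """Yield one dict per transcript consequence of each variant in a VEP response."""
    for variant in decoded:
        for tc in variant.get("transcript_consequences", []):
            yield {
                "input": variant.get("input", ""),
                "most_severe_consequence": variant.get("most_severe_consequence", ""),
                "gene_symbol": tc.get("gene_symbol", ""),
                "transcript_id": tc.get("transcript_id", ""),
                # consequence_terms is a list; join it into a single cell
                "consequence_terms": ",".join(tc.get("consequence_terms", [])),
                "impact": tc.get("impact", ""),
                "sift_prediction": tc.get("sift_prediction", ""),
            }

def write_tsv(rows, out=sys.stdout):
    """Write the flattened rows as tab-separated text with a header line."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

To write a file, open it with open("vep.tsv", "w", newline="") inside a with block and call write_tsv(vep_to_rows(decoded), out) on the handle.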


Please can you indent your script properly so that other people can run it.

3.1 years ago

You need to serialize your list of IDs to JSON with json.dumps. Then you can send the whole list to the endpoint in one request; you don't have to loop over it.

Assuming you've built a list called snps containing the rsIDs:

import json
import sys
import requests

# Serialize the whole list of rsIDs into a single JSON request body
data = json.dumps({ "ids" : snps })
server = "https://rest.ensembl.org"
ext = "/vep/human/id"
headers = { "Content-Type" : "application/json", "Accept" : "application/json"}
r = requests.post(server+ext, headers=headers, data=data)

if not r.ok:
    r.raise_for_status()
    sys.exit()

decoded = r.json()
print(repr(decoded))

This will query everything at once.
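
The snippet above assumes snps already holds every rsID. In the question's loop, snps = line.split()[0] is overwritten with a single string on each line, and for i in snps then iterates over that string's characters. A minimal sketch that collects the first column of every line starting with rs into one list and serializes it; it assumes sample.tsv is whitespace-separated with the rsID first, as the question's code implies, and read_rsids and vep_payload are hypothetical helper names.

```python
import json

def read_rsids(lines):
    """Collect the first whitespace-separated field of every line that starts with 'rs'."""
    return [line.split()[0] for line in lines if line.startswith("rs")]

def vep_payload(rsids):
    """Serialize a list of rsIDs into the JSON body sent to POST /vep/human/id."""
    return json.dumps({"ids": rsids})

# Usage with the question's file:
# with open("sample.tsv") as f:
#     snps = read_rsids(f)
# data = vep_payload(snps)
```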


Doesn't seem to work for me; I'm receiving an HTTP 400 error. I've tried using a list as well as a plain string with each rsID on a new line.


Try this. Code, followed by the input and output:

import json
import requests
import sys

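with open('test.txt','r') as f:
    # Strip trailing newlines and skip blank lines before sending the IDs
    test = [line.strip() for line in f if line.strip()]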

data = json.dumps({ "ids" : test })
server = "https://rest.ensembl.org"
ext = "/vep/human/id"
headers = { "Content-Type" : "application/json", "Accept" : "application/json"}
r = requests.post(server+ext, headers=headers, data=data)

if not r.ok:
    r.raise_for_status()
    sys.exit()

decoded = r.json()
print(repr(decoded))

input:

$ cat test.txt 
rs56116432
rs56116431

output:

[{'colocated_variants': [{'allele_string': 'C/T', 'frequencies': {'T': {'afr': '0', 'gnomad_fin': '0.01363', 'gnomad': '0.003639', 'gnomad_afr': '0.0006606', 'eas': '0', 'gnomad_amr': '0.002336', 'ea': '0.003809', 'amr': '0.0014', 'gnomad_sas': '0.001334', 'gnomad_nfe': '0.003593', 'eur': '0.0109', 'sas': '0.001', 'gnomad_asj': '0.002471', 'gnomad_oth': '0.00628', 'gnomad_eas': '0', 'aa': '0.0007102'}}, 'start': '133256042', 'end': '133256042', 'strand': '1', 'minor_allele': 'T', 'seq_region_name': '9', 'minor_allele_freq': '0.0026', 'id': 'rs56116432'}], 'id': 'rs56116432', 'end': 133256042, 'seq_region_name': '9', 'start': 133256042, 'assembly_name': 'GRCh38', 'input': 'rs56116432', 'most_severe_consequence': 'missense_variant', 'allele_string': 'C/T', 'strand': 1, 'transcript_consequences': [{'gene_id': 'ENSG00000175164', 'gene_symbol_source': 'HGNC', 'gene_symbol': 'ABO', 'cdna_end': 718, 'hgnc_id': 'HGNC:79', 'transcript_id': 'ENST00000453660', 'cdna_start': 718, 'impact': 'MODIFIER', 'biotype': 'processed_transcript', 'consequence_terms': ['non_coding_transcript_exon_variant'], 'variant_allele': 'T', 'strand': -1}, {'strand': -1, 'consequence_terms': ['missense_variant'], 'cdna_end': 711, 'gene_symbol_source': 'HGNC', 'gene_id': 'ENSG00000175164', 'hgnc_id': 'HGNC:79', 'amino_acids': 'G/D', 'transcript_id': 'ENST00000538324', 'cdna_start': 711, 'protein_start': 229, 'variant_allele': 'T', 'codons': 'gGc/gAc', 'impact': 'MODERATE', 'biotype': 'protein_coding', 'sift_prediction': 'deleterious', '
        ============  removed text due to 5000 character limit===================
  'transcript_id': 'ENST00000642734', 'consequence_terms': ['intron_variant'], 'impact': 'MODIFIER', 'biotype': 'protein_coding', 'variant_allele': '-', 'strand': 1}, {'hgnc_id': 'HGNC:17073', 'gene_symbol': 'ARHGAP26', 'gene_id': 'ENSG00000145819', 'gene_symbol_source': 'HGNC', 'transcript_id': 'ENST00000645722', 'variant_allele': '-', 'strand': 1, 'impact': 'MODIFIER', 'biotype': 'protein_coding', 'consequence_terms': ['intron_variant']}], 'strand': 1, 'allele_string': 'A/-', 'id': 'rs56116431', 'colocated_variants': [{'id': 'rs56116431', 'seq_region_name': '5', 'strand': '1', 'end': '143097383', 'start': '143097383', 'allele_string': 'A/-'}], 'end': 143097383, 'start': 143097383, 'seq_region_name': '5'}]

How long is your list? The endpoint has a limit of 200 variants per request, so if it's longer you may need to chunk it.


That makes a lot of sense; it has 5000 variants. So I should chunk it into separate queries? Running 25+ chunks sounds time-consuming. Is there a quicker or more efficient way of going about this?


I would use the VEP script and run it all locally, but given that each query should take less than a second, running 25 or so of them is still going to be pretty quick.
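
If you stay with the REST API, the chunking is only a few lines. A minimal sketch, assuming the 200-IDs-per-request limit mentioned above and the same requests-based POST used earlier in the thread; the helper names and the short pause between requests (to stay well within Ensembl's rate limits) are illustrative choices, not part of the Ensembl API. requests is imported inside the function so the batching helper can be tried without it installed.

```python
import json
import time

SERVER = "https://rest.ensembl.org"
EXT = "/vep/human/id"
MAX_IDS = 200  # per-request limit mentioned in this thread

def batches(items, size=MAX_IDS):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def vep_in_batches(snps, size=MAX_IDS, pause=0.5):
    """POST the IDs chunk by chunk and return all decoded results as one list."""
    import requests  # only needed when actually querying Ensembl
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    results = []
    for chunk in batches(snps, size):
        r = requests.post(SERVER + EXT, headers=headers,
                          data=json.dumps({"ids": chunk}))
        r.raise_for_status()  # stop on HTTP errors such as 400 or 429
        results.extend(r.json())
        time.sleep(pause)  # short pause between consecutive requests
    return results
```

With 5000 rsIDs, decoded = vep_in_batches(snps) makes 25 requests and returns a single list, which can then be parsed exactly like the one-shot response.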


What does this look like if I want to call the API from R, where json.dumps() isn't available?


You can build JSON in R with the jsonlite package. Assuming you have the rsIDs in a character vector called snps, as in the example above, you could use:

library(jsonlite)
data <- toJSON(list(ids=snps))

There is a full online course using Jupyter notebooks, available in R, Python and Perl, with all the libraries and code examples you need.


Excellent, thanks for the quick response!

3.1 years ago

data='{ "ids" : [i] }')

There is no such ids parameter in the documentation: https://rest.ensembl.org/documentation/info/vep_id_get
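
Whichever endpoint is used, the string in that line is sent exactly as typed: Python does not substitute the variable i inside a string literal, so the body contains the characters [i], which is not valid JSON and is a likely cause of the 400. A quick standard-library check (the rsID is just an example taken from this thread; is_valid_json is a hypothetical helper):

```python
import json

i = "rs56116432"
literal = '{ "ids" : [i] }'       # the question's string: i is just a character here
built = json.dumps({"ids": [i]})  # the variable's value, serialized as real JSON

def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(literal))  # False
print(built)                   # {"ids": ["rs56116432"]}
```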


They're using the POST endpoint, which takes the IDs as an "ids" array in the JSON request body.
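
To make the difference concrete: the GET endpoint documented at that link takes a single ID in the URL path, while the POST endpoint takes many IDs in a JSON body under "ids". A minimal sketch that only builds the two request shapes and sends nothing; the function names are hypothetical.

```python
import json
from urllib.parse import quote

SERVER = "https://rest.ensembl.org"

def vep_get_request(rsid):
    """GET form: one identifier goes in the URL path, with no request body."""
    return "GET", f"{SERVER}/vep/human/id/{quote(rsid)}", None

def vep_post_request(rsids):
    """POST form: many identifiers go in a JSON body under the "ids" key."""
    return "POST", f"{SERVER}/vep/human/id", json.dumps({"ids": list(rsids)})
```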
