2.8 years ago
anasjamshed
I have a list of 422 genes and I want to fetch ONLY the following columns:
Gene
Variant ID
Location
vf_allele
Alleles
Clin. Sig.
Conseq. Type
cadd
revel_sort
meta_lr_sort
mutation_assessor_sort
publications
through the Ensembl REST API.
I am trying this code:
import requests, sys
server = "http://rest.ensembl.org"
ext = "/variant_recoder/homo_sapiens"
headers={ "Content-Type" : "application/json", "Accept" : "application/json"}
r = requests.post(server+ext, headers=headers, data='{ "ids" : ["rs56116432", "rs1042779" ] }')
if not r.ok:
    r.raise_for_status()
    sys.exit()
decoded = r.json()
print(repr(decoded))
This code asks for rs IDs, but I want to input a list of 422 genes. Is this possible?
Could you please explain more? How can I input 400 genes into overlap/id?
The GET overlap endpoint only allows a single gene ID per query, so you will need to create a loop within your script to submit each gene ID separately.
Can you please help me write the loop?
This depends on the language you are using to query the REST API, but you will need to create a list of your gene IDs and then write a for loop that substitutes each gene ID into the URL.
e.g. in Python: https://www.w3schools.com/python/python_for_loops.asp
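As a rough illustration of the loop described above (not code from this thread), the sketch below builds one overlap URL per gene ID and, if called, fetches each in turn. The helper names `overlap_url` and `fetch_overlaps`, the `pause` value, and the second gene ID are assumptions made up for the example.

```python
import time

SERVER = "https://rest.ensembl.org"


def overlap_url(gene_id, feature):
    """Build the GET overlap/id URL for a single Ensembl stable ID."""
    return f"{SERVER}/overlap/id/{gene_id}?feature={feature}"


def fetch_overlaps(gene_ids, feature, pause=0.1):
    """Send one GET request per gene ID and collect the decoded JSON."""
    import requests  # third-party; imported here so overlap_url works without it

    results = {}
    for gene_id in gene_ids:
        r = requests.get(overlap_url(gene_id, feature),
                         headers={"Content-Type": "application/json"})
        r.raise_for_status()
        results[gene_id] = r.json()
        time.sleep(pause)  # assumed short pause between requests
    return results


# Hypothetical input list; replace with your own stable IDs.
gene_ids = ["ENSG00000069188", "ENSG00000157764"]
urls = [overlap_url(gene_id, "variation") for gene_id in gene_ids]
for url in urls:
    print(url)
```

Calling `fetch_overlaps(gene_ids, "variation")` would send one request per gene and return a dictionary keyed by gene ID.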
GET overlap takes only Ensembl IDs as input, but I want to input gene symbols.
No problem. Then you'll need to combine this with the POST lookup/symbol endpoint to retrieve the Ensembl stable IDs associated with each gene symbol: http://rest.ensembl.org/documentation/info/symbol_post
A POST endpoint is available in this case, so you can submit all gene symbols in a single query.
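For readers following along, here is a minimal sketch of what a single POST lookup/symbol request might look like. It is not the code posted in this thread. The helper names `symbol_payload`, `extract_ids` and `lookup_symbols` are hypothetical, and it assumes the response is a JSON object keyed by symbol, each entry carrying an `id` field.

```python
import json

SERVER = "https://rest.ensembl.org"
EXT = "/lookup/symbol/homo_sapiens"
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}


def symbol_payload(symbols):
    """Serialise a list of gene symbols into the POST request body."""
    return json.dumps({"symbols": list(symbols)})


def extract_ids(decoded):
    """Map each returned symbol to its Ensembl stable ID (assumed 'id' field)."""
    return {symbol: record["id"] for symbol, record in decoded.items()}


def lookup_symbols(symbols):
    """Submit all symbols in one POST request and return {symbol: stable ID}."""
    import requests  # third-party; imported here so the helpers work without it

    r = requests.post(SERVER + EXT, headers=HEADERS, data=symbol_payload(symbols))
    r.raise_for_status()
    return extract_ids(r.json())


# Body that would be sent for two example symbols.
body = symbol_payload(["BRCA2", "BRAF"])
print(body)
```

`lookup_symbols(["BRCA2", "BRAF"])` would then return a dictionary of stable IDs that can be fed into the overlap loop.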
Ben, I am trying this code now:
But it is giving me errors:
If you are just using the POST lookup endpoint, you don't need to include the loop. Something like this will print the full output for each gene symbol in your list:
Module 6 of the Ensembl REST API online course will teach you how to use the POST endpoints: https://www.ebi.ac.uk/training/online/courses/ensembl-rest-api/
Thanks. Now I am trying this code:
But it is giving me just information about mapped genes:
I want SNPs related to my genes using REST API
That's correct. You can use the POST Lookup/symbol endpoint to retrieve the Ensembl stable gene IDs. You will then need to use the list of stable IDs in the GET Overlap/id endpoint (using the loop) to retrieve the variants overlapping your genes of interest.
Thanks.
I have found ids of all genes through:
Now I want to use the list of these ids to fetch variants
see:
But it gives me an error:
How can I solve it?
To retrieve variants overlapping your genes of interest, you will need to use the feature=variation optional parameter. So, your URL should look like the following: https://rest.ensembl.org/overlap/id/ENSG00000069188?feature=variation
In your script, the extension should look like this:
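A small sketch of how such an extension could be assembled for each gene ID, assuming the query string is built with the standard library's `urlencode`. The function name `overlap_ext` is hypothetical and this is not the snippet originally posted here.

```python
from urllib.parse import urlencode


def overlap_ext(gene_id, feature="variation"):
    """Return the overlap/id extension with the feature query parameter."""
    return f"/overlap/id/{gene_id}?" + urlencode({"feature": feature})


ext = overlap_ext("ENSG00000069188")
print(ext)  # /overlap/id/ENSG00000069188?feature=variation
```

The result is appended to the server address, as in `server + ext` in the original script.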
Thanks, Ben. I have finished my script like:
and it's giving me 8009 SNPs:
But when I move variations.to_csv("Variations.csv") outside the loop, the number of SNPs drops to 7569. What could be the reason?
Can you try this within the loop:
or
Within the loop, I guess you are overwriting the CSV every time the loop runs, so the final CSV might just be the one written last. Remember that each iteration (each gene) can return a different set of columns, in a different order. Be careful when merging them.
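One way to avoid both problems is to collect every gene's records first and write the file once, using the union of all column names. The sketch below uses only the standard `csv` module. The function name `write_variants` and the example records are made up for illustration and are not real API output.

```python
import csv
import io


def write_variants(per_gene_records, handle):
    """Write all genes' records to one CSV; return the number of rows written."""
    rows = []
    for gene_id, records in per_gene_records.items():
        for record in records:
            rows.append({"gene_id": gene_id, **record})

    # Union of column names, in first-seen order, so no gene's columns are lost.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Made-up records with differing columns and column order per gene.
example = {
    "GENE_A": [{"id": "rs1", "start": 100}, {"id": "rs2", "start": 200}],
    "GENE_B": [{"start": 300, "id": "rs3", "clinical_significance": "benign"}],
}
buffer = io.StringIO()
n_rows = write_variants(example, buffer)
print(n_rows)
print(buffer.getvalue())
```

With pandas, the analogous approach would be to append each gene's DataFrame to a list inside the loop and call `pd.concat` on that list once after the loop, before a single `to_csv`.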