Trying to refine REST API query
1
0
19 months ago
aroso491 • 0

Hello,

I am trying to retrieve the rsIDs for a list of chromosomal positions. The list is long (~60k variants), so I cannot submit it to BioMart. Following advice found on the forum, I am using the Ensembl REST API and simply looping over my list, sending one query per position.

Does anyone know of a more efficient way to do this? It is taking a very long time, and I am aiming to have this query and another (much bigger) one done by tomorrow, so I am worried about how long it will take. This is my first time using this API, so any advice or a pointer in the right direction would be welcome.

Here is my piece of code:

library(httr)
library(jsonlite)

server <- "https://grch37.rest.ensembl.org"

# Loop over each chromosomal location and collect the overlapping variants (rsID information)
ID_T2D <- c()
for (i in coords) {
  ext <- paste("/overlap/region/homo_sapiens/", i, "?feature=variation", sep = "")
  r <- GET(paste(server, ext, sep = ""), content_type("application/json"))
  Sys.sleep(2)  # pause between requests
  # Stop here if the query returned an error
  stop_for_status(r)
  temp <- fromJSON(toJSON(content(r)))
  ID_T2D <- rbind(ID_T2D, temp)
}


Here coords is a list of chromosomal locations in the format X:XXXX:XXXX (chromosome, start and end positions).

Any help would be GREATLY appreciated! Thanks

R ensembl API • 691 views
0

Are the chromosomal coordinates single bases, where you're trying to find the SNP at each locus, or are they longer regions, where you're trying to find all the SNPs in each region?

0

They are single base and I want to retrieve the rsID of the SNP at that locus.

3
19 months ago
Emily 23k

Try using the VEP POST endpoint instead. This lets you submit 200 regions in each query, so you can run far fewer queries. It does require you to input alleles, but you can just make them up; it won't affect the rsIDs you get out.

The example listed under section 6 in this Jupyter notebook is almost the script you need. You just need to switch it to the other VEP endpoint (region rather than HGVS).
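
For anyone who prefers to stay in R, the batching idea above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested pipeline: the helper names (make_vep_lines, split_batches, vep_region_batch) are made up for illustration, the input is assumed to be vectors of chromosomes and single-base positions, the G/A alleles are placeholders, and the GRCh37 server is used because that is the one in the question.

```r
# Build VCF-style input lines "CHROM POS ID REF ALT QUAL FILTER INFO".
# The ID is "." (unknown) and REF/ALT are placeholder alleles (assumption).
make_vep_lines <- function(chr, pos, ref = "G", alt = "A") {
  paste(chr, pos, ".", ref, alt, ".", ".", ".")
}

# Split a vector into consecutive batches of at most `size` elements.
split_batches <- function(x, size = 200) {
  unname(split(x, ceiling(seq_along(x) / size)))
}

# POST one batch to the VEP region endpoint and return the parsed JSON
# (a list with one element per variant). Requires httr and jsonlite.
vep_region_batch <- function(lines, server = "https://grch37.rest.ensembl.org") {
  body <- as.character(jsonlite::toJSON(list(variants = lines)))
  r <- httr::POST(
    paste0(server, "/vep/homo_sapiens/region"),
    httr::content_type("application/json"),
    httr::accept("application/json"),
    body = body
  )
  httr::stop_for_status(r)
  httr::content(r)
}

# Usage (not run here, since it needs network access):
# lines   <- make_vep_lines(chr, pos)
# results <- lapply(split_batches(lines), vep_region_batch)
```

With ~60k positions this would mean about 300 requests instead of 60k.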

0

Hi, I have looked at the documentation for the /vep/:species/region API and, to understand the format of the input, I have run the example described on that page.

I noticed that each entry in the query includes the chromosomal position, rsID and alleles, so I removed the rsID, since that is the piece of information I am trying to retrieve for each of my SNPs. However, if I remove the rsID I get an empty list in return. If I replace it with a dot or anything else, I still get information back, but not the rsID, which is the only thing I am interested in.

Could you help me understand what I am missing in order to get rsIDs out of this API? It feels like I should be able to get that information, but I cannot work out the right format. Thank you so much.

0

Can you send me your script and a couple of example inputs please?

0

Hi, what I tried to do was simply this:

server <- "https://rest.ensembl.org"
ext <- "/vep/homo_sapiens/region"
r <- POST(paste(server, ext, sep = ""), content_type("application/json"), accept("application/json"), body = '{ "variants" : ["21  26960070  G A . . .", "21  26965148  G A . . ." ] }')

stop_for_status(r)



This returned "list()" when I examined the content of the response. After that, I tried the input as:

body = '{ "variants" : ["21  26960070 . G A . . .", "21  26965148 . G A . . ." ] }'


And then I got some information but not rsIDs...

0
server <- "https://grch37.rest.ensembl.org"


The code you've got is accessing GRCh38, which possibly does not have variants at the loci you've specified. Change the server as above to access GRCh37.
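
Once the response comes back non-empty, the rsIDs still have to be pulled out of the nested list. As far as I know, the VEP JSON output reports known variants at the same position under a colocated_variants field, each with an id. The sketch below assumes that structure, plus an input field echoing the submitted line, so please check both against a real response. The function name extract_rsids is made up for illustration.

```r
# `vep_result` is the parsed response, e.g. content(r) from the POST call:
# a list with one element per input variant.
# Assumed fields: `input`, `colocated_variants`, and `id` within each of those.
extract_rsids <- function(vep_result) {
  rows <- lapply(vep_result, function(v) {
    # Collect the IDs of all co-located known variants (may be none)
    ids <- vapply(v$colocated_variants, function(cv) cv$id, character(1))
    # Keep only dbSNP-style identifiers
    ids <- ids[grepl("^rs", ids)]
    # Record NA when no rsID was found, so every input stays in the table
    if (length(ids) == 0) ids <- NA_character_
    data.frame(input = v$input, rsid = ids, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage (assuming `r` is the response from the POST request):
# rsids <- extract_rsids(content(r))
```

This returns one row per input and rsID pair, with NA where nothing was found at that position.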