Question: Loop to retrieve rsID out of chrID list
Asked 4 days ago by aroso491 (United Kingdom):

Hello, I have a GWAS summary statistics file in which some rows are identified by a chrID rather than an rsID. I want to use the biomaRt package in R to find the rsIDs for my 232 chrIDs, which are in this format:

chr1_11105545 1 11105545 T C 0.965395 0.72017 -7.40832e-03 0.00460502 0.113547560

The second and third columns are the chromosome (CHR) and position (POS). I have tested that a single query works when I use "values = list(chr_number, pos_number start, pos_number end)", but since there are 232 rows, I am trying to write a loop to process the whole list.

I wrote the loop shown further down, but when I run it I get this error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300001 milliseconds with 15732557 bytes received

I think this is because I'm somehow creating an infinite loop or something like it. I've been at it for three days now and I'm a bit confused, so I'd appreciate it if someone could help me find the error or point me in the right direction!

#Extract ch_# rows in R dataframe (yields 232 hits)
SSNP_M_Ch <- subset(SSNP_M, grepl("chr\\d_", SNP))

snp = useMart("ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_snp")

results<-c()
for (i in 1:dim(SSNP_M_Ch)[1]){
  temp <- getBM(attributes = c('refsnp_id','allele','chrom_start','chrom_strand'),
                filters = c('chr_name','start','end'), 
                values = list(SSNP_M_Ch$CHR,SSNP_M_Ch$POS,SSNP_M_Ch$POS), mart = snp)  
  temp[] <- i
  results <- rbind(results,temp)
}

Thanks!

Tags: snp, biomart, R, genome

At the very least, you can check whether there is any progress by adding 'print(head(temp))' after the getBM query. However, in a loop like this I would expect some subsetting, such as SSNP_M_Ch$CHR[i], so that getBM receives only the current value. As written, the code passes the whole CHR and POS columns of SSNP_M_Ch to getBM on every iteration.

Also, as far as I can tell, 'temp[] <- i' overwrites every cell of the data frame returned by getBM with the current index value. Is that really what it is meant to do?
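Both points can be checked offline on a toy data frame. The sketch below is only an illustration: `toy` stands in for SSNP_M_Ch, and the `temp` data frame with its "rsA"/"rsB" IDs is a made-up substitute for a getBM result, not real output.

```r
# Toy stand-in for SSNP_M_Ch; all values are made up for illustration.
toy <- data.frame(
  SNP = c("chr1_11105545", "chr2_2000", "chr3_3000"),
  CHR = c(1, 2, 3),
  POS = c(11105545, 2000, 3000),
  stringsAsFactors = FALSE
)

i <- 2

# Whole column: all chromosomes would be sent on every iteration.
whole_column <- toy$CHR
# Single element: only the i-th chromosome would be sent.
one_value <- toy$CHR[i]

# Hypothetical getBM-style result for row i (not real biomaRt output).
temp <- data.frame(
  refsnp_id = c("rsA", "rsB"),
  chrom_start = c(2000, 2000),
  stringsAsFactors = FALSE
)

# `temp[] <- i` keeps the shape but overwrites every cell with i.
overwritten <- temp
overwritten[] <- i

# Adding a new column keeps the result and records which row produced it.
labelled <- temp
labelled$query_index <- i
```

Printing `overwritten` and `labelled` side by side shows the difference: the first has lost the IDs, the second keeps them and gains an index column.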

How about this:

library(biomaRt)    # getBM()
library(tidyverse)  # bind_rows()

# One list slot per row of SSNP_M_Ch
results <- vector("list", nrow(SSNP_M_Ch))

for (i in seq_along(results)) {
  # Query a single chromosome/position pair per iteration
  current_query <- list(SSNP_M_Ch$CHR[i],
                        SSNP_M_Ch$POS[i],
                        SSNP_M_Ch$POS[i])
  print(current_query)

  results[[i]] <- getBM(attributes = c('refsnp_id', 'allele', 'chrom_start', 'chrom_strand'),
                        filters = c('chr_name', 'start', 'end'),
                        values = current_query,
                        mart = snp)
}
# Stack the per-row results into one data frame
results <- bind_rows(results)

By printing each query, you'll be able to see whether particular IDs cause trouble in getBM.
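If a single query does fail, the loop as written stops at that point. One way to keep going and collect the failing rows is to wrap each call in tryCatch. This is a minimal sketch, not a tested biomaRt recipe: `safe_query` and `mock_query` are hypothetical names, and `mock_query` replaces the real getBM call (with a simulated failure) so the example runs without network access.

```r
# `query_fun` is a hypothetical stand-in for the getBM() call.
safe_query <- function(chr, pos, query_fun) {
  tryCatch(
    query_fun(chr, pos),
    error = function(e) {
      message("Query failed for chr ", chr, " pos ", pos, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Mock query: fails for one made-up position, otherwise returns one row.
mock_query <- function(chr, pos) {
  if (pos == 2000) stop("simulated timeout")
  data.frame(refsnp_id = paste0("rs_mock_", pos), chrom_start = pos,
             stringsAsFactors = FALSE)
}

chr <- c(1, 2, 3)
pos <- c(11105545, 2000, 3000)

results <- vector("list", length(chr))
for (i in seq_along(chr)) {
  res <- safe_query(chr[i], pos[i], mock_query)
  # Assigning NULL with [[<- would delete the slot, so only store real results.
  if (!is.null(res)) results[[i]] <- res
}

# Indices of rows whose query failed, and the successful results stacked together.
failed <- which(vapply(results, is.null, logical(1)))
combined <- do.call(rbind, Filter(Negate(is.null), results))
```

With the real mart, `mock_query` would be replaced by a small function that calls getBM for one chromosome and position; the rows listed in `failed` can then be retried or inspected by hand.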

written 3 days ago by Alex Nesmelov

BTW, I do not know whether the list that you passed to getBM as the 'values' argument is correct. Repeating the POS values twice looks counterintuitive to me, but I have not worked with biomaRt and can't advise on this matter.
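For what it is worth, giving the same position as both start and end describes a one-base window, which fits the asker's report that the single-query form worked. The same idea can be written as "chr:start:end" strings. The sketch below builds such strings offline; the `query_regions` helper is hypothetical and rests on the assumption that the mart offers a `chromosomal_region` filter accepting them, which is worth confirming with listFilters(snp) before relying on it.

```r
# Toy stand-in for the CHR and POS columns of SSNP_M_Ch (made-up values).
chr <- c(1, 2, 3)
pos <- c(11105545, 2000, 3000)

# format(..., scientific = FALSE) avoids strings such as "1e+05" for round positions.
pos_chr <- format(pos, scientific = FALSE, trim = TRUE)

# "chr:start:end" with start == end for a single-base position.
regions <- paste(chr, pos_chr, pos_chr, sep = ":")

# Assumption: the mart has a `chromosomal_region` filter that takes such strings.
# Defined but not called here, because it needs network access and a `snp` mart.
query_regions <- function(regions, mart) {
  biomaRt::getBM(
    attributes = c("refsnp_id", "allele", "chrom_start", "chrom_strand"),
    filters = "chromosomal_region",
    values = regions,
    mart = mart
  )
}
```

If the filter behaves as assumed, all 232 regions could be sent in one call such as query_regions(regions, snp), or in a few smaller batches if the server times out.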

written 3 days ago by Alex Nesmelov

Thanks! I got it sorted in the end. I didn't realize I was passing the whole column instead of going row by row. I ended up using SSNP_M_Ch[i,2] and SSNP_M_Ch[i,3], and added the index value as another column in temp, and that worked smoothly.

written 1 day ago by aroso491