For loop parallelisation
4.4 years ago
mel22 ▴ 100

Hello, I need your help to make this loop work. It runs a permutation + meta-analysis on two sets of data, but I get an error message saying I need a foreach object. The idea is to get the results of N permutations (n.iter in my code) of the meta-analysis of the two data sets. The association model is performed separately on 1000 SNPs, so for every meta-analysis I need the results of 1000 SNPs in the two studies; that's why I am using foreach twice. Thanks for your help:

foreach(n.iter) %dopar%   
foreach(i=22:7269) %dopar% {    

Y_1<-as.vector(sample(V1,100,replace = F))  
fit_1<-glm(Y_1~data[,i],data=data,family=binomial())    

Y_2<-as.vector(sample(V2,100,replace = F))  
fit_2<-glm(Y_2~data2[,i],data=data2,family=binomial())      

sfit_y1<-summary(fit_1)  
sfit_y2<-summary(fit_2)  

p_yt<-sfit_y1$coefficients[3,4]  
p_ct<-sfit_y2$coefficients[3,4]  

p<-data.frame(p1=p_yt,n1=100,p2=p_ct,n2=100)  
meta2<-metap(p,2,verbose="N")  

meta<-(-log10(meta2[,2]))  
return(meta)   
}

And this is the original for loop that I am trying to rewrite for parallel computing, as it takes too long:

meta<-matrix(NA,nrow=n.iter,ncol=nvar-21)
for (s in 1:n.iter) {
  for (i in 22:nvar) {
    j<-i-21
    Y_y1<-as.vector(sample(data$PHENOTYPE,100,replace = F))
    fit_y1<-glm(Y_y1~data[,i],data=data,family=binomial())
    Y_y2<-as.vector(sample(data2$PHENOTYPE,100,replace = F))
    fit_y2<-glm(Y_y2~data2[,i],data=data2,family=binomial())
    sfit_y1<-summary(fit_y1)
    sfit_y2<-summary(fit_y2)
    p<-data.frame(p1=sfit_y1$coefficients[3,4],n1=100,
                  p2=sfit_y2$coefficients[3,4],n2=100)
    x<-metap(p,2,verbose="N")
    meta[s,j]<-x[,2]
  }
}
write.csv2(meta,"meta.txt",row.names = F)

Thank you for your help!


Why are you using foreach twice?


Thank you Kevin, I edited my post to better explain the objective of my code.

4.4 years ago

By default, foreach returns the result of each iteration as one element of a list, which can then be 'bound' together like this:

res <- foreach(...) %dopar% {
  ...
}

do.call(rbind, res)
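A minimal runnable sketch of that pattern, assuming the foreach package is installed. It uses %do% so it runs without registering a parallel backend; once a backend is registered (e.g. with doParallel), %dopar% returns the same list structure. The toy data.frame returned per iteration is made up for illustration.

```r
library(foreach)

# Each iteration returns a one-row data.frame; foreach collects them
# into a list with one element per iteration (the default behaviour).
res <- foreach(i = 1:3) %do% {
  data.frame(iter = i, value = i^2)
}

str(res, max.level = 1)   # a list of 3 data.frames

# Bind the list elements row-wise into a single data.frame
out <- do.call(rbind, res)
out
```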

Depending on what each iteration returns (e.g. data.frames, data.tables or lists), one can also use:

data.table::rbindlist(res)
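For comparison, a small sketch with data.table::rbindlist(), assuming data.table is installed. The list is built with lapply() here only to mimic the default foreach output (one data.frame per iteration); the columns iter and p are made-up placeholders.

```r
library(data.table)

# A list shaped like foreach's default output: one data.frame per iteration
res <- lapply(1:3, function(i) data.frame(iter = i, p = 10^-i))

# rbindlist() stacks the list elements into one data.table
dt <- rbindlist(res)
dt
```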

You can also control how the output is bound together within the foreach() function call itself:

foreach(l = seq_len(blocks),
    .combine = rbind,
    .multicombine = TRUE,
    .inorder = FALSE,
    .packages = c('data.table', 'doParallel')) %dopar% {...}
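A sketch of how .combine can shape the output into the permutation × SNP layout from the question, assuming foreach and doParallel are installed and 2 local cores are available. It uses foreach's nesting operator %:% (the documented way to nest two foreach loops, rather than chaining two %dopar% calls) with rbind for the outer loop and c for the inner loop, so the result is a matrix with one row per iteration and one column per SNP. The body s * 10 + i is only a placeholder for the real per-SNP statistic.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

n.iter <- 3
snps <- 1:5

# Outer loop: one row per iteration (rbind); inner loop: one value per SNP (c).
# %:% merges both loops into a single stream of parallel tasks.
res <- foreach(s = seq_len(n.iter), .combine = rbind,
               .multicombine = TRUE) %:%
  foreach(i = snps, .combine = c) %dopar% {
    s * 10 + i   # placeholder for the per-SNP statistic
  }

parallel::stopCluster(cl)
res
```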

It can be tricky to get familiar with how it works, but it is very powerful once you do.

I have implemented this 'nested' parallel processing (mclapply / parLapply inside foreach loops) in the Bioconductor package RegParallel, which has not proved hugely popular but serves a very specific function: rapidly running thousands or even millions of independent regression models in chunks / batches.
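The chunked, nested idea can be sketched roughly as below. This is not RegParallel's code, only an illustration under assumptions: foreach, doParallel and parallel are available, the data are simulated (a binary PHENOTYPE and 20 SNPs coded 0/1/2 as numeric), the SNPs are split into hypothetical blocks of 5, and each foreach task fits one logistic regression per SNP in its block with mclapply(). Because the SNPs are numeric here, the SNP p-value sits in row 2 of the coefficient table.

```r
library(foreach)
library(doParallel)

set.seed(1)
n <- 100
snps <- paste0("snp", 1:20)
# Hypothetical simulated data: binary PHENOTYPE plus 20 SNPs coded 0/1/2
dat <- data.frame(PHENOTYPE = rbinom(n, 1, 0.5),
                  matrix(rbinom(n * 20, 2, 0.3), nrow = n,
                         dimnames = list(NULL, snps)))

# Split the SNPs into blocks (chunks) of 5
blocks <- split(snps, ceiling(seq_along(snps) / 5))

# mclapply() forks, which is not supported on Windows: use 1 core there
inner_cores <- if (.Platform$OS.type == "windows") 1L else 2L

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Outer level: one foreach task per block of SNPs
res <- foreach(block = blocks, .combine = rbind) %dopar% {
  chunk <- dat[, c("PHENOTYPE", block)]
  # Inner level: one logistic regression per SNP in the block
  rows <- parallel::mclapply(block, function(snp) {
    fit <- glm(chunk$PHENOTYPE ~ chunk[[snp]], family = binomial())
    data.frame(snp = snp, p = summary(fit)$coefficients[2, 4])
  }, mc.cores = inner_cores)
  do.call(rbind, rows)
}

parallel::stopCluster(cl)
head(res)
```

Block size is the main tuning knob: larger blocks mean fewer, heavier foreach tasks and less scheduling overhead per model.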

Here are code chunks from it that may help you:


Many thanks Kevin, it's very useful for me!

4.4 years ago
zx8754 11k

I think the outer loop should be a normal for loop:

for(iter in seq(n.iter)){
  foreach(i=22:7269) %dopar% {
    ...
  }
}
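A fuller sketch of this outer for / inner foreach layout for the permutation meta-analysis, under stated assumptions: foreach and doParallel are installed; the two studies are simulated (named study1 and study2 rather than data and data2, to avoid clashing with utils::data); SNPs are coded 0/1/2 as numeric, so the SNP p-value is in row 2 of the coefficient table rather than row 3 as in the question; and, as a stand-in for metap() (whose output format is not shown in the thread), a hypothetical helper fisher_p() computes Fisher's combined p-value with pchisq(). Unlike the question, each phenotype is permuted once per iteration on the master process and reused for every SNP, which keeps random number generation out of the workers.

```r
library(foreach)
library(doParallel)

set.seed(42)
n <- 100; n.snp <- 10; n.iter <- 3

# Hypothetical simulated study: binary PHENOTYPE plus SNPs coded 0/1/2
make_study <- function() {
  data.frame(PHENOTYPE = rbinom(n, 1, 0.5),
             matrix(rbinom(n * n.snp, 2, 0.3), nrow = n,
                    dimnames = list(NULL, paste0("snp", 1:n.snp))))
}
study1 <- make_study()
study2 <- make_study()
snp_cols <- 2:(n.snp + 1)   # SNP columns (22:7269 in the question)

# Hypothetical stand-in for metap(): Fisher's combined p-value
fisher_p <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

meta <- matrix(NA_real_, nrow = n.iter, ncol = length(snp_cols))
for (s in seq_len(n.iter)) {
  # Permute each phenotype once per iteration, on the master process
  y1 <- sample(study1$PHENOTYPE)
  y2 <- sample(study2$PHENOTYPE)
  # Parallel over SNPs; .combine = c returns one -log10(p) per SNP, in order
  meta[s, ] <- foreach(i = snp_cols, .combine = c) %dopar% {
    p1 <- summary(glm(y1 ~ study1[[i]], family = binomial()))$coefficients[2, 4]
    p2 <- summary(glm(y2 ~ study2[[i]], family = binomial()))$coefficients[2, 4]
    -log10(fisher_p(c(p1, p2)))
  }
}

parallel::stopCluster(cl)
meta
```

Each pass of the outer loop fills one row of meta, which could then be written out with write.csv2() as in the question.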

Thank you, I will try this.


Indeed, this would be easier. You can also call mclapply() (or parLapply() on Windows) inside a foreach loop. If you register, for example, 3 cores for foreach and also use 3 cores in each mclapply() call, this will result in 3 × 3 = 9 concurrent processes.
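A quick way to see the nesting, assuming foreach, doParallel and parallel are available (outer_cores and inner_cores are made-up names): each of 3 foreach workers forks 3 mclapply() children, each child reports its process ID, and the IDs are collected. It also runs on machines with fewer than 9 cores, just oversubscribed. Since mclapply() relies on forking, inner_cores falls back to 1 on Windows.

```r
library(foreach)
library(doParallel)

outer_cores <- 3
# mclapply() forks, which is not supported on Windows: use 1 core there
inner_cores <- if (.Platform$OS.type == "windows") 1L else 3L

cl <- parallel::makeCluster(outer_cores)
registerDoParallel(cl)

# Each foreach task collects the process IDs of its mclapply() children
pids <- foreach(b = seq_len(outer_cores), .combine = c) %dopar% {
  unlist(parallel::mclapply(seq_len(inner_cores), function(k) Sys.getpid(),
                            mc.cores = inner_cores))
}

parallel::stopCluster(cl)

outer_cores * inner_cores   # processes doing the work at peak
length(unique(pids))        # distinct process IDs actually observed
```

In practice, keep outer_cores * inner_cores at or below parallel::detectCores() to avoid oversubscribing the machine.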


Thank you Kevin, so I can have the output as a data.frame or a list?
