Question

For loop parallelisation

4.4 years ago

mel22 ▴ 100

Hello, I need help getting this loop to work. It runs a permutation test combined with a meta-analysis on two data sets, but I get an error message saying that a foreach object is needed. The idea is to collect the results of N permutations (n.iter in my code) of the meta-analysis of the two data sets. The association model is fitted separately for each of 1000 SNPs, so every meta-analysis needs the results for 1000 SNPs in both studies; that is why I am using foreach twice. Thanks for your help:

foreach(n.iter) %dopar%   
foreach(i=22:7269) %dopar% {    

Y_1<-as.vector(sample(V1,100,replace = F))  
fit_1<-glm(Y_1~data[,i],data=data,family=binomial())    

Y_2<-as.vector(sample(V2,100,replace = F))  
fit_2<-glm(Y_2~data2[,i],data=data2,family=binomial())      

sfit_y1<-summary(fit_y1)  
sfit_y2<-summary(fit_y2)  

p_yt<-sfit_y1$coefficients[3,4]  
p_ct<-sfit_y2$coefficients[3,4]  

p<-as.data.frame(p1=p_yt,n1=100,p2=p_ct,n2=100)  
meta2<-metap(p,2,verbose="N")  

meta<-(-log10(meta2[,2]))  
return(meta)   
}

And this is the original for loop that I am trying to rewrite for parallel computing, because it takes too long to run:

for (s in 1:n.iter) {
  for(i in 22:nvar)
  {p<-data.frame(p1=numeric(n.var),n1=rep(100,n.var),p2=numeric(n.var),n2=rep(100,n.var))
  Y_y1<-as.vector(sample(data$PHENOTYPE,100,replace = F))
  fit_y1<-glm(Y_y1~data[,i],data=data,family=binomial())  
  Y_y2<-as.vector(sample(data2$PHENOTYPE,100,replace = F))
  fit_y2<-glm(Y_y2~data2[,i],data=data2,family=binomial()) 
  j<-i-21
  sfit_y1<-summary(fit_y1)
  sfit_y2<-summary(fit_y2)
  p$p1[3]<-sfit_y1$coefficients[3,4]
  p$p2[3]<-sfit_y2$coefficients[3,4]
  x<-metap(p[j,],2,verbose="N")
  }
  meta[s,j]<-x[,2]
}
write.csv2(meta,"meta.txt",row.names = F)

Thank you for your help !

ADD COMMENT • link 4.4 years ago by mel22 ▴ 100

Why are you using foreach twice ?

ADD REPLY • link 4.4 years ago by Kevin Blighe 87k

Thank you Kevin, I have edited my post to better explain the objective of my code.

ADD REPLY • link 4.4 years ago by mel22 ▴ 100

score 4 · Answer 1 · 2019-12-03

By default, foreach returns the results of all iterations as a list (one element per iteration), which can then be 'bound' together like this:

res <- foreach(...) %dopar% {
.. }

do.call(rbind, res)

Depending on what you're doing inside the foreach loop (for example, if each iteration returns a data.frame or data.table), one can also use:

data.table::rbindlist(res)

You can also control how the output is bound together within the foreach() function call itself:

foreach(l = seq_len(blocks),
    .combine = rbind,
    .multicombine = TRUE,
    .inorder = FALSE,
    .packages = c('data.table', 'doParallel')) %dopar% {...}
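To make the two binding styles concrete, here is a minimal sketch that runs the same toy loop twice: once binding the returned list afterwards with do.call(rbind, ...), and once letting foreach() bind the rows via .combine. The 2-worker cluster, the toy loop body and the names res_list, res_bound and res_combined are illustrative assumptions, not part of the original post.

```r
library(foreach)
library(doParallel)

# Register a small local cluster; 2 workers is an arbitrary choice.
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

# Default behaviour: a list with one element per iteration.
res_list <- foreach(i = 1:4) %dopar% {
  data.frame(i = i, square = i^2)
}
# Bind the list elements into a single data.frame afterwards.
res_bound <- do.call(rbind, res_list)

# Same rows, but bound inside foreach() itself via .combine.
res_combined <- foreach(i = 1:4, .combine = rbind) %dopar% {
  data.frame(i = i, square = i^2)
}

parallel::stopCluster(cl)
```

Both res_bound and res_combined should hold the same four rows; which style is more convenient mostly depends on whether you want to inspect the raw list first.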

It can be tricky to become familiar with how it works, but it is very powerful once you do.

I have implemented this 'nested' parallel processing (mclapply / parLapply inside foreach loops) in the Bioconductor package RegParallel, which has not proved hugely popular but which serves a very specific function: to rapidly run 1000s or even millions of independent regression models in chunks / batches.

Here are code chunks from it that may help you:

https://github.com/kevinblighe/RegParallel/blob/master/R/RegParallel.R#L15-L91 (the main function where I set up the system for parallel processing)
https://github.com/kevinblighe/RegParallel/blob/master/R/RegParallel.R#L195-L207 (main function, where it calls one of the other functions to run linear models)
https://github.com/kevinblighe/RegParallel/blob/master/R/lmParallel.R#L18-L24 (the linear model function where I call foreach)
https://github.com/kevinblighe/RegParallel/blob/master/R/lmParallel.R#L38-L41 (the linear model function where I call the apply function)

score 3 · Answer 2 · 2019-12-03

4.4 years ago

zx8754 11k

I think the first loop should be a normal for loop:

for (iter in seq(n.iter)) {
  foreach(i = 22:7269) %dopar% {
    ...
  }
}
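As a rough illustration of this layout, the sketch below uses a plain for loop over permutations and a parallel foreach over SNPs. Everything in it is an assumption made for the sake of a runnable example: the data are simulated (study1 and study2 stand in for data and data2), perm_p is a hypothetical helper, the toy sizes are arbitrary, and Fisher's method computed with pchisq() is swapped in for the metap() call in the question, whose interface is not shown in the thread. It also assumes a numeric genotype column, so the SNP term sits in row 2 of the coefficient table.

```r
library(foreach)
library(doParallel)

set.seed(1)
n <- 100; n_snp <- 5; n_iter <- 3   # toy sizes (assumptions)

# Simulated stand-ins for `data` and `data2`: a binary phenotype plus SNP columns.
make_study <- function() {
  geno <- matrix(rbinom(n * n_snp, 2, 0.3), nrow = n)
  data.frame(PHENOTYPE = rbinom(n, 1, 0.5), geno)
}
study1 <- make_study()
study2 <- make_study()
snp_cols <- 2:(n_snp + 1)

# Hypothetical helper: p-value of the SNP term after permuting the phenotype.
perm_p <- function(dat, col) {
  y <- sample(dat$PHENOTYPE)                      # permutation without replacement
  fit <- glm(y ~ dat[[col]], family = binomial())
  summary(fit)$coefficients[2, 4]                 # row 2 = SNP term, col 4 = p-value
}

cl <- parallel::makeCluster(2)                    # 2 workers is an arbitrary choice
doParallel::registerDoParallel(cl)
parallel::clusterSetRNGStream(cl, 1)              # seed the worker RNG streams

meta <- matrix(NA_real_, nrow = n_iter, ncol = n_snp)
for (s in seq_len(n_iter)) {                      # ordinary outer loop
  meta[s, ] <- foreach(i = snp_cols, .combine = c) %dopar% {
    p1 <- perm_p(study1, i)
    p2 <- perm_p(study2, i)
    # Fisher's method for two p-values (stand-in for metap(), an assumption)
    stat <- -2 * (log(p1) + log(p2))
    pchisq(stat, df = 4, lower.tail = FALSE)
  }
}
parallel::stopCluster(cl)
```

Here meta ends up with one row per permutation and one column per SNP; applying -log10(meta) afterwards would give the scale used in the question.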

ADD COMMENT • link 4.4 years ago by zx8754 11k

Thank you I will try this

ADD REPLY • link 4.4 years ago by mel22 ▴ 100

Indeed, this would be easier. You can also run mclapply() (or parLapply() on Windows, where forking is unavailable) within a foreach statement. If you then choose, for example, 3 cores at each level, this will result in 3 × 3 = 9 concurrent processes.
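A minimal sketch of that nested pattern, assuming 3 foreach workers that each call mclapply() with 3 cores, could look like the following. The block layout, the squaring task and the names outer_cores, inner_cores and blocks are purely illustrative; for simplicity the sketch falls back to a single inner core on Windows rather than switching to parLapply().

```r
library(foreach)
library(doParallel)

outer_cores <- 3
# mclapply() relies on forking, so use one inner core on Windows (assumption:
# a simple fallback instead of the parLapply() route mentioned above).
inner_cores <- if (.Platform$OS.type == "windows") 1L else 3L

cl <- parallel::makeCluster(outer_cores)
doParallel::registerDoParallel(cl)

# Split 12 toy tasks into 3 blocks of 4; each block goes to one foreach worker.
blocks <- split(1:12, rep(1:3, each = 4))

res <- foreach(b = blocks, .combine = c, .packages = "parallel") %dopar% {
  # Inside each worker, spread the block over the inner cores.
  unlist(mclapply(b, function(x) x^2, mc.cores = inner_cores))
}

parallel::stopCluster(cl)
```

With these settings, up to outer_cores * inner_cores processes may be active at once, so it is worth checking that the machine actually has that many cores available.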