Question: How to parallelize a single function in R?
na.cna30 wrote:

My question has two parts:

1) I want to run my function in parallel regardless of the code inside it. Is that possible? For example:

data("iris")
x.train <- iris[1:100,1:4]
y.train <- iris[1:100,5]
x.test <- iris[101:150,1:4]
y.test <- iris[101:150,5]

myfun <- function(x.train, y.train, x.test, y.test) {
  library("e1071")
  # C-classification SVM; predictions are written to the global workspace via <<-
  model1 <- svm(x.train, y.train, type = "C-classification")
  predc <<- predict(model1, x.test)
  # nu-classification SVM
  model2 <- svm(x.train, y.train, type = "nu-classification")
  prednu <<- predict(model2, x.test)
}

I want to parallelize this part:

myfun(x.train,y.train,x.test,y.test)  

2) I also want to run the above function multiple times:

for i=1:10
  myfun(x.train,y.train,x.test,y.test)  

Can you tell me how I can do these two parts in parallel in R?

PS: My original data is an immense set of genome reads and I run over 10 classifiers, so I really need to do this in parallel.

written 4.5 years ago by na.cna30 • modified 4.5 years ago by andrew.j.skelton73

This question is better suited to StackOverflow since it is about R programming. Also, the code you have there is nowhere near enough to provide you with an answer. For instance, the for loop syntax is not even R, and the function returns nothing, as far as I can tell. An internet search returns plenty of tutorials that should help you get started with parallelizing R functions: 1, 2, 3, 4. Good luck.

written 4.5 years ago by A. Domingues

Thanks for the information. The function passes its results to the workspace with the <<- operator.

written 4.5 years ago by na.cna30

You should also post the first few lines of your original data. No matter how large the data is, you can always run head on it and paste a few lines. What is the role of i other than running the same code 10 times over?

modified 4.5 years ago • written 4.5 years ago by komal.rathi

Cool, I did not know about "<<-". Learned something today. 

written 4.5 years ago by A. Domingues

That's because using the 'global assignment' operator is bad practice and unsafe. Parallel (or even looped) calls to this function will overwrite each other's results. The function needs to be rewritten with a normal return value.
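A minimal sketch of what such a rewrite could look like, assuming the e1071 package is installed. The random train/test split is a hypothetical one chosen for illustration (it is not the split from the question), and the unused y.test argument is kept only to match the original signature.

```r
library(e1071)  # assumed to be installed

data(iris)
set.seed(1)                          # hypothetical split, for illustration only
idx     <- sample(nrow(iris), 100)
x.train <- iris[idx, 1:4]
y.train <- iris[idx, 5]
x.test  <- iris[-idx, 1:4]
y.test  <- iris[-idx, 5]

# Same two classifiers as in the question, but the predictions are returned
# as a named list instead of being written to the global workspace.
myfun <- function(x.train, y.train, x.test, y.test) {
  model1 <- svm(x.train, y.train, type = "C-classification")
  model2 <- svm(x.train, y.train, type = "nu-classification")
  list(predc  = predict(model1, x.test),
       prednu = predict(model2, x.test))
}

# Ten independent runs; each result lives in its own list element,
# so no run can overwrite another.
results <- lapply(1:10, function(i) myfun(x.train, y.train, x.test, y.test))
```

Because the calls no longer share any state, the lapply call is the piece that could later be swapped for a parallel counterpart such as parLapply or mclapply from the parallel package.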

written 4.5 years ago by karl.stamm

Yes, I read the linked entry from Advanced R, and it looks like something one should not use unless *really* needed. Well, the first time I saw it was in an Advanced R book, and that tells it all :) 

written 4.5 years ago by A. Domingues

haha, perhaps learned another way of R allowing one to shoot themselves in the foot.

written 4.5 years ago by Istvan Albert

Dr. Mabuse (Bergen, Norway) wrote:

Do you know the R package parallel? https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

A parallel version of apply and friends is a good example for the class of problems that can be easily parallelized ("embarrassingly parallel"). The problem can be broken down into fully independent steps, like aligning N fastq sequences or applying a function to rows of a matrix. All functions compatible with apply can be used like this. Other problems are more difficult, if they need to synchronize at one point (e.g. k-means).
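As a rough illustration of the "function applied to rows of a matrix" case, here is a small sketch using parApply from the parallel package. The toy matrix and the choice of 2 worker processes are arbitrary assumptions made for the example.

```r
library(parallel)   # ships with R

m <- matrix(1:20, nrow = 5)     # toy data: 5 rows, 4 columns

cl <- makeCluster(2)            # 2 local worker processes (arbitrary choice)
# Parallel counterpart of apply(m, 1, sum): each row is processed independently.
row.sums.par <- parApply(cl, m, 1, sum)
stopCluster(cl)                 # always release the workers when done

# Sequential version for comparison.
row.sums.seq <- apply(m, 1, sum)
all.equal(as.numeric(row.sums.par), as.numeric(row.sums.seq))
```

For a task this small the cost of starting the workers outweighs any gain; the pattern only pays off when each row takes a noticeable amount of time to process.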

written 4.5 years ago by Dr. Mabuse

I've had good experiences with the parLapply function of the parallel package. For a single desktop with 8 cores, it's easy to take apart a MC simulation, feed the dataset toward 8 worker nodes, and let them all have at it. I don't have to specify workload distribution, because parLapply gives each iteration to another node for me. 
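A sketch of that pattern, using a toy Monte Carlo estimate of pi as a stand-in for a real simulation. The estimate.pi function, the chunk sizes, and the use of 2 workers are all assumptions made for illustration; on an 8-core desktop the cluster would be sized accordingly.

```r
library(parallel)

# One chunk of the simulation: estimate pi from n random points
# in the unit square (hypothetical stand-in for a real MC step).
estimate.pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

cl <- makeCluster(2)                   # detectCores() can help size this on a real machine
clusterSetRNGStream(cl, iseed = 42)    # independent, reproducible random streams per worker
chunks <- rep(1e5, 8)                  # 8 chunks of 100,000 draws each
estimates <- parLapply(cl, chunks, estimate.pi)
stopCluster(cl)

# Combine the per-chunk estimates into one overall estimate.
pi.hat <- mean(unlist(estimates))
```

parLapply splits the list of chunks across the workers by itself, so no manual workload distribution is needed.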

written 4.5 years ago by karl.stamm

So I need to break my function down into 2 parts, each including one SVM operation, right?

written 4.5 years ago by na.cna30

No, I don't think so. The two svm steps need to be synchronized, because you can use the svm to predict only after it has finished training, or did you mean the two different svm's trained in one run? That you could do.

Also, you need to convert your input data into a nested list, because the functions in package parallel work on lists only. There are parallel apply functions in package snow, but I think that package needs a cluster of some sort.
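One way the "two different SVMs trained in one run" idea might be sketched with parLapply, assuming e1071 is installed on the machine. The fit.predict helper and the random train/test split are hypothetical names and choices made for this example; the list being iterated over is simply the two SVM types, and the data is passed along as extra arguments.

```r
library(parallel)

data(iris)
set.seed(1)                          # hypothetical split, for illustration only
idx     <- sample(nrow(iris), 100)
x.train <- iris[idx, 1:4]
y.train <- iris[idx, 5]
x.test  <- iris[-idx, 1:4]

# Train one SVM of the given type, then predict on the test set.
# Training and prediction stay together because prediction must wait for training.
# The e1071:: prefix makes each worker load the package itself.
fit.predict <- function(type, x.train, y.train, x.test) {
  model <- e1071::svm(x.train, y.train, type = type)
  predict(model, x.test)
}

types <- c("C-classification", "nu-classification")

cl <- makeCluster(2)                 # one worker per classifier
preds <- parLapply(cl, types, fit.predict,
                   x.train = x.train, y.train = y.train, x.test = x.test)
stopCluster(cl)

names(preds) <- c("predc", "prednu") # label results like the variables in the question
```

With over 10 classifiers, the same structure would apply with a longer list of model specifications.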

written 4.5 years ago by Dr. Mabuse

andrew.j.skelton73 (London) wrote:

I asked this question on StackOverflow a while back and got some useful answers.

http://stackoverflow.com/questions/17054026/parallel-and-multicore-processing-in-r

http://stackoverflow.com/questions/17196261/understanding-the-differences-between-mclapply-and-parlapply-in-r

written 4.5 years ago by andrew.j.skelton73