Question: How to call an R variable in a loop with Slurm?
pablo wrote (17 months ago):

Hello,

I have an R script (RHO_COR.R) and I would like to create a loop in order to split the jobs across several nodes.

Here is the part of the script where I would like to create the loop:

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
 G1 <- split[[combs[i, 1]]]    # first group of columns for this pair
 G2 <- split[[combs[i, 2]]]    # second group of columns for this pair
 dat.i <- cbind(data[, G1], data[, G2])
 rho.i <- cor_rho(dat.i)       # correlation submatrix for this pair
}
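For %dopar% to actually run in parallel, a backend has to be registered first; the script doesn't show that part, so here is a minimal sketch assuming the doParallel package and the 32 CPUs requested in the Slurm script below:

library(doParallel)

# Without a registered backend, foreach falls back to sequential
# execution (with a warning). 32 matches --cpus-per-task below.
registerDoParallel(cores = 32)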

The different results in res (which correspond to submatrices of correlations between OTUs) are stored in several files. combs is a two-column matrix that looks like this (but it can change, depending on my input file):

> combs
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5
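(For reference, a pair matrix like this can be built with combn; whether RHO_COR.R does it this way is an assumption:)

# All unordered pairs of 5 groups, one pair per row
n_groups <- 5
combs <- t(combn(n_groups, 2))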

I would like to send each row of combs (i.e. each value of seq_len(nrow(combs))) to a separate node.

This is my Slurm script:

#!/bin/bash
#SBATCH -o job-%A_task.out
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --mem=126G  
#SBATCH --cpus-per-task=32

#Set up whatever package we need to run with

module load gcc/8.1.0 openblas/0.3.3 R

# SET UP DIRECTORIES

OUTPUT="$HOME"/$(date +"%Y%m%d")_parallel_nodes_test
mkdir -p "$OUTPUT"

export FILENAME=~/RHO_COR.R

# Run the program

Rscript "$FILENAME" > "$OUTPUT"/RHO_COR.out

I do not want to use job arrays. I wonder if creating an argument that takes the values of seq_len(nrow(combs)) could be a solution:

for i in my_argument
 do Rscript $FILENAME -i > "$OUTPUT"
done

Thanks

(I asked on Stack Overflow but I haven't gotten an answer back yet.)

You'll need an srun in there and $i rather than -i.

— Devon Ryan, 17 months ago
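A sketch of what that comment points at, assuming $subset holds the row indices (the --subset flag name and the per-row output naming are illustrative, not from the original script):

for i in $subset; do
    srun Rscript "$FILENAME" --subset "$i" > "$OUTPUT"/rho_"$i".out
done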

Thanks for your reply. But can I combine srun and Rscript in the same loop? Another point: I don't know how to pass my R variable as an argument in this loop.

— pablo, 17 months ago

Edit: I saved my variable to a file that I read from bash.

And I use:

res <- foreach(i = opt$subset) %dopar% {   # opt$subset comes from the command line
 G1 <- split[[combs[i,1]]]
 G2 <- split[[combs[i,2]]]
 dat.i <- cbind(data[,G1], data[,G2])
 rho.i <- cor_rho(dat.i)
}
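For completeness, opt$subset can be parsed from the command line like this (a sketch assuming the optparse package, which the posted code doesn't show):

library(optparse)

# Define a --subset option carrying the row of combs to process
option_list <- list(
  make_option("--subset", type = "integer",
              help = "Row index of combs to process")
)
opt <- parse_args(OptionParser(option_list = option_list))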

Slurm part:

var=$(wc -l < ~/my_file.tsv)   # number of rows in combs
subset=$(seq "$var")

I am still struggling to find a way to execute the jobs on several nodes. The loop runs on only one node and I can't find a way around it with srun...

— pablo, 17 months ago

If you're going to use %dopar%, then run it in parallel directly in R and don't bother submitting multiple jobs; you'll have to figure out how to do that on your local cluster, of course. Otherwise, just use an array job, or create a loop in your sbatch script calling srun for each value of i.

— Devon Ryan, 17 months ago
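A sketch of the first option (driving all allocated nodes from a single R session); the PSOCK cluster and the scontrol call are assumptions about one way to do this under Slurm, not something from the thread:

library(doParallel)

# Expand the Slurm node list into one host name per line
hosts <- system("scontrol show hostnames $SLURM_JOB_NODELIST", intern = TRUE)

# Start one R worker per allocated node and register the backend,
# so the existing foreach(...) %dopar% loop spreads across nodes
cl <- parallel::makePSOCKcluster(hosts)
registerDoParallel(cl)

# ... foreach(i = seq_len(nrow(combs))) %dopar% { ... } ...

parallel::stopCluster(cl)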

That's what I would like to do: call srun for each value of i.

I tried:

for i in $subset
do
srun Rscript my_script.R --subset $i 
done

But it is still executed on only one node...

— pablo, 17 months ago
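For reference, inside a single allocation each srun step blocks until it finishes, so the loop above runs its steps one after another on the same resources. One common way to spread them over several nodes is to request the nodes up front and background each step (the node count and flags below are illustrative):

#SBATCH --nodes=5
#SBATCH --ntasks-per-node=1

for i in $subset; do
    # -N1 -n1 gives each step one task on one node; --exclusive keeps
    # steps from piling onto the same resources; & backgrounds the step
    srun -N1 -n1 --exclusive Rscript my_script.R --subset "$i" &
done
wait   # do not exit the batch script until all steps finish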