How to call an R variable in a loop with Slurm?
2.4 years ago
pablo

Hello,

I have an R script (RHO_COR.R) and I would like to create a loop in order to split the jobs across several nodes.

Here is the part of the script where I would like to create the loop:

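library(foreach)  # %dopar% also needs a registered parallel backend
res <- foreach(i = seq_len(nrow(combs))) %dopar% {
  G1 <- split[[combs[i, 1]]]           # first group of pair i
  G2 <- split[[combs[i, 2]]]           # second group of pair i
  dat.i <- cbind(data[, G1], data[, G2])
  rho.i <- cor_rho(dat.i)              # value returned for pair i
}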


The results in res (each one a correlation submatrix between OTUs) are stored in several files. combs is a matrix that looks like this (its contents change depending on my input file):

> combs
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5
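On the shell side it helps to know how many rows combs has and which pair each index refers to. Below is a minimal bash sketch (the function name list_pairs is hypothetical, not part of the original scripts) that enumerates every pair a < b of n groups in the same row order as the combs matrix printed above; n groups give n(n-1)/2 rows, so the 5 groups here give 10.

```bash
#!/bin/bash
# Hypothetical helper: print "index<TAB>a<TAB>b" for every pair a < b of n groups,
# in the same row order as the combs matrix (row i <-> pair i).
list_pairs() {
  local n=$1 i=0 a=1 b
  while [ "$a" -lt "$n" ]; do
    b=$((a + 1))
    while [ "$b" -le "$n" ]; do
      i=$((i + 1))
      printf '%d\t%d\t%d\n' "$i" "$a" "$b"
      b=$((b + 1))
    done
    a=$((a + 1))
  done
}
```

For example, `list_pairs 5 | sed -n '7p'` prints the line for index 7, i.e. the pair 2 and 5, matching row [7,] of combs.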


I would like to send each row of combs (i.e. each i in seq_len(nrow(combs))) to its own node.

This is my Slurm script:

#!/bin/bash
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --mem=126G

#Set up whatever package we need to run with

# SET UP DIRECTORIES

OUTPUT="$HOME"/$(date +"%Y%m%d")_parallel_nodes_test
mkdir -p "$OUTPUT"
export FILENAME=~/RHO_COR.R

#Run the program
Rscript $FILENAME > "$OUTPUT"

I do not want to use arrays. I wonder whether creating an argument equal to seq_len(nrow(combs)) could be a solution:

for i in my_argument
do
  Rscript $FILENAME -i > "$OUTPUT"
done

Thanks (I asked on Stack Overflow but haven't received an answer yet.)

You'll need an srun in there and $i rather than -i.
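A minimal bash sketch of the question's loop with the reply's argument fix applied: the index is passed as "$i" rather than the literal -i, and each run writes to its own file inside $OUTPUT, because $OUTPUT is a directory and redirecting to it directly would fail. It assumes the R script reads the index from a --subset option (as the question's later edit does via opt$subset); run_serial and the res_<i>.txt file names are hypothetical. This version still runs the indices one after another on a single node.

```bash
#!/bin/bash
# Hypothetical helper: run the R script once per index, one after another.
#   $1 = path to the R script, $2 = output directory, $3 = number of indices
run_serial() {
  local script=$1 outdir=$2 n=$3 i
  mkdir -p "$outdir"
  for i in $(seq "$n"); do
    # "$i" expands to the current index; "-i" would pass the literal string "-i".
    # Assumption: the R script parses a --subset option (opt$subset).
    Rscript "$script" --subset "$i" > "$outdir/res_${i}.txt"
  done
}
```

With the 10 rows of combs shown above, it would be called as `run_serial "$FILENAME" "$OUTPUT" 10`.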

Thanks for your reply. But can I combine srun and Rscript in the same loop? Another point: I don't know how to "call" my R variable as an argument in this loop.

Edit: I saved my variable to a file that I read in bash.

And I use:

res <- foreach(i = opt$subset) %dopar% {
  G1 <- split[[combs[i, 1]]]
  G2 <- split[[combs[i, 2]]]
  dat.i <- cbind(data[, G1], data[, G2])
  rho.i <- cor_rho(dat.i)
}

Slurm part:

var=$(cat ~/my_file.tsv | wc -l)
subset=$(seq $var)


I still can't find a way to execute the jobs on several nodes. The loop runs on only one node, and I can't find a fix with srun...

If you're going to use %dopar%, then run it in parallel directly in R and don't bother submitting multiple jobs. You'll have to figure out how to do that on your local cluster, of course. Otherwise, just use an array job or create a loop in your sbatch script that calls srun for each value of i.
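For comparison, the array-job route mentioned in the reply could look like the sketch below (the question rules arrays out, so this is only a reference point). Each array task receives its index in the SLURM_ARRAY_TASK_ID environment variable, and Slurm schedules the tasks independently, so they can run on different nodes. The --array=1-10 range assumes the 10 rows of combs shown above; run_task and the output file names are hypothetical, and the R script is again assumed to accept --subset.

```bash
#!/bin/bash
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --array=1-10        # assumption: one array task per row of combs

# Hypothetical helper: run a single index and write its result to its own file.
run_task() {
  local script=$1 outdir=$2 i=$3
  mkdir -p "$outdir"
  Rscript "$script" --subset "$i" > "$outdir/res_${i}.txt"
}

# Only do real work when Slurm has set the array index.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
  run_task ~/RHO_COR.R "$HOME/$(date +%Y%m%d)_parallel_nodes_test" "$SLURM_ARRAY_TASK_ID"
fi
```

If the number of rows changes with the input file, the range can be given at submission time instead, e.g. `sbatch --array=1-"$(wc -l < ~/my_file.tsv)" job.sh` (job.sh being a hypothetical name for this script), since command-line options override #SBATCH directives.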

That's what I would like to do: call srun for each value of i.

I tried:

for i in $subset
do
  srun Rscript my_script.R --subset $i
done


But it still runs on only one node.
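Two things keep this loop on one node. First, the batch script only requests the defaults (a single task), so the whole allocation is one node and every srun step has to run inside it. Second, srun without & blocks until its step finishes, so the steps run one after another anyway. A minimal sketch addressing both is below; --nodes=2 and --ntasks=10 are assumptions to adapt to the number of rows in combs and to your cluster, launch_all is a hypothetical helper, and the R script is assumed to accept --subset.

```bash
#!/bin/bash
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --nodes=2            # assumption: spread the work over two nodes
#SBATCH --ntasks=10          # assumption: one task per row of combs
#SBATCH --cpus-per-task=1
# Memory is not set here; add --mem or --mem-per-cpu to suit your data.

# Hypothetical helper: start one job step per index in the background, then wait.
#   $1 = path to the R script, $2 = output directory, $3 = number of indices
launch_all() {
  local script=$1 outdir=$2 n=$3 i
  mkdir -p "$outdir"
  for i in $(seq "$n"); do
    # One task on one node per step; "&" lets the loop continue
    # instead of waiting for this step to finish.
    # --exclusive asks for CPUs dedicated to the step (its exact effect
    # differs between Slurm versions; check `man srun` on your cluster).
    srun --nodes=1 --ntasks=1 --exclusive \
      Rscript "$script" --subset "$i" > "$outdir/res_${i}.txt" &
  done
  wait  # block until every background step has finished
}

# Only launch when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  launch_all ~/RHO_COR.R "$HOME/$(date +%Y%m%d)_parallel_nodes_test" \
    "$(wc -l < ~/my_file.tsv)"
fi
```

Depending on the Slurm version and site defaults, a step that does not request memory may claim all of the job's memory on its node, which serializes the steps again; if they still run one after another, give each srun an explicit, smaller --mem.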