Question: Distribute proportionality calculations on many nodes (SLURM)
18 months ago, pablo140 wrote:

Hello,

Here is what I did (with R):

1 - I have a matrix of OTU abundances (more than 817,000 columns). I need to compute the proportionality between these OTUs. For the moment, I can split the matrix into submatrices, compute the proportionality for each pair of submatrices, and then assemble the final matrix.

```
library(foreach)
library(doParallel)     # backend for %dopar%
library(compositions)   # for clr() (adjust if your clr comes from another package)

registerDoParallel()    # register a parallel backend so %dopar% runs in parallel

data <- matrix(runif(10000), ncol = 1000)  # random matrix
data <- clr(data)
ncol <- ncol(data)

rest   <- ncol %% 100   # leftover columns (0 here); 100 columns per submatrix
blocks <- ncol %/% 100  # 10 submatrices

ngroup <- rep(1:blocks, each = 100)
#if (rest > 0) ngroup <- c(ngroup, rep(blocks + 1, rest))
split <- split(1:ncol, ngroup)

# I get all the combinations between my submatrices
combs <- expand.grid(1:length(split), 1:length(split))
combs <- t(apply(combs, 1, sort))
combs <- unique(combs)
combs <- combs[combs[, 1] != combs[, 2], ]

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
  G1 <- split[[combs[i, 1]]]
  G2 <- split[[combs[i, 2]]]
  dat.i <- cbind(data[, G1], data[, G2])
  cor_rho(dat.i)  # cor_rho is my function to compute the proportionality
}

# And then, I get the final matrix:
resMAT <- matrix(0, ncol(data), ncol(data))

for (i in 1:nrow(combs)) {
  batch1 <- split[[combs[i, 1]]]
  batch2 <- split[[combs[i, 2]]]
  patch.i <- c(batch1, batch2)
  resMAT[patch.i, patch.i] <- res[[i]]
}
```

2 - I work with SLURM on a cluster with several nodes. I know that with one node (256 GB and 32 CPUs) I can compute the proportionality between 60,000 columns in one day. So I need to use 817,000 / 60,000 ≈ 14 submatrices, which gives me (14 * 13) / 2 = 91 combinations (= 91 nodes).
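That pair count is just n*(n-1)/2 for n submatrices; a quick shell sanity check:

```shell
# Number of unordered pairs of n = 14 submatrices
n=14
echo $(( n * (n - 1) / 2 ))   # prints 91
```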

3 - I don't know how to write a SLURM script that distributes each combination's calculation to a node.

Best, Vincent

modified 18 months ago by Jean-Karim Heriche • written 18 months ago by pablo140

Actually, he gave me some pieces of advice, but I'm still struggling with the SLURM syntax...

Couldn't you just submit the job 14 times (with different input files being your 14 different submatrices)? This would require 14 different SLURM submission scripts, which would probably take less time to write than a Python or Perl script to automatically generate the 14 scripts.

I would like to stay with my R code and incorporate it into a SLURM script, if that's possible.

`resMAT[patch.i, patch.i] <- res[[i]]` - you're only going to update entries on the diagonal of resMAT

But does that mean my final matrix is wrong?

ARG sorry, my mistake

18 months ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

This is not really a bioinformatics question but a programming one, so it might be better addressed on Stack Overflow.

I am probably missing something but you may not need to split your matrix into blocks, you could use something like this in R:

```
library(foreach)
library(doSNOW)

cluster <- makeCluster(number_of_nodes, type = "SOCK")
registerDoSNOW(cluster)
# This is a typical way to parallelize computation of a distance/similarity matrix
result_matrix <- foreach(j = c(1:n), .combine = 'cbind') %:%
  foreach(i = c(1:n), .combine = 'c') %dopar% {
    cor_rho(data[, i], data[, j])
  }
stopCluster(cluster)
```

The bash script submitted to SLURM would look like this:

```
#!/bin/bash

#SBATCH --mail-type=FAIL,END
#SBATCH -N number_of_nodes
#SBATCH -n number_of_tasks
#SBATCH --mem=memory_required

Rscript my_script.R
```

To avoid repeating calculations, the foreach loops should probably be:

```
foreach(j = 1:(n - 1), .combine = 'cbind') %:%
  foreach(i = (j + 1):n, .combine = 'c') %dopar% { ...
```

Thanks for your answer. I'll try your solution. When you put `1:n`, does `n` mean `ncol(data)`?

However, if I execute my R script through SLURM, will SLURM "know" how to distribute each combination to a different node if I specify the following?

```
#SBATCH --mail-type=FAIL,END
#SBATCH -N 32
#SBATCH -n 91
#SBATCH --mem=250G
```

Is the default value for `-c` (`--cpus-per-task`) set to 1 CPU per task (`-n`) on your cluster? If that is true, then you are asking for `-n 91` cores/tasks from a node with 32 cores.

I generally use `-n` with `-N` to specify the number of tasks/cores per node. If you only have 32 cores per node, then you may need to specify `-n 32` along with `-N 91` (nodes) if you want them all to run at the same time. I'm not sure whether you can have your R job submit SLURM sub-jobs; using job arrays may be an option then.
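With a job array (`#SBATCH --array=1-91`), each task gets an id in `$SLURM_ARRAY_TASK_ID` and only needs to map that id to its pair (i, j) of submatrices with i < j. A sketch of that mapping in bash (the surrounding `#SBATCH` directives and the R call are up to you):

```shell
#!/bin/bash
# Map a task id k in 1..n*(n-1)/2 to the k-th pair (i, j) with i < j.
pair_for_task() {
  local k=$1 n=$2 idx=0 i j
  for ((i = 1; i < n; i++)); do
    for ((j = i + 1; j <= n; j++)); do
      idx=$((idx + 1))
      if [ "$idx" -eq "$k" ]; then
        echo "$i $j"
        return 0
      fi
    done
  done
  return 1
}

pair_for_task 1 14    # prints: 1 2
pair_for_task 91 14   # prints: 13 14
```

In the array script itself you would then do something like `read i j <<< "$(pair_for_task "$SLURM_ARRAY_TASK_ID" 14)"` and pass `$i` and `$j` to `Rscript`.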

Yes, it is set to 1 CPU by default.

Actually, I would like to distribute each combination to one node, like this:

```
submatrix 1 vs submatrix 2 -> node 1
submatrix 2 vs submatrix 3 -> node 2
...
submatrix 13 vs submatrix 14 -> node 91
```

I don't know if that's possible?

It should be possible as long as you submit individual jobs. See changes I made to my comment above.

Actually, I don't mind running them at the same time. Might it be necessary to wait for some jobs to finish before others begin?

> Might it be necessary to wait for some jobs to finish before others begin?

If that is the case, then you would need to look into `--dependency=type:jobid` option for `sbatch` to make sure those tasks don't start until the job (`jobid` above) they are dependent on finishes successfully.
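A minimal sketch of such a dependency chain (the script names are placeholders; `--parsable` makes `sbatch` print only the job id):

```shell
# Submit the compute step, then make the merge step wait until it
# has finished successfully (afterok).
jid=$(sbatch --parsable compute_blocks.sh)
sbatch --dependency=afterok:${jid} merge_results.sh
```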

Thanks. But is it normal that when I execute, for example, `sbatch --partition normal --mem-per-cpu=4G --cpus-per-task=16 --nodes=10 my_R_script.sh` (if I have only 10 combinations -> 10 nodes), it only creates one job and not 10 as I expected?

Yes, it is normal, because as far as SLURM is concerned the only job it has been asked to run is `my_R_script.sh`. Unless that script creates sub-jobs from within, via multiple `sbatch --blah do_this` commands, SLURM can't run them as separate jobs.

Note: some clusters may be configured not to allow an existing job to spawn sub-jobs. In that case your only option would be the one below.

The other way would be to start the 10 computations independently (if that is possible) by doing:

```
sbatch --blah my_R_script1.sh
sbatch --blah my_R_script2.sh
...
sbatch --blah my_R_script10.sh
```

Actually, I just use `my_R_script.sh` to execute my R script, the one I posted at the top of the topic. So it doesn't create any sub-jobs...

If I want to create the 91 sub-jobs I need, do I leave my R script as it is and create a SLURM script that spawns these sub-jobs?
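A sketch of one way to do it, assuming you refactor your R code into a hypothetical `rho_pair.R` that computes a single block pair from two index arguments, wrapped by a batch script `run_pair.sh` (both names are placeholders):

```shell
#!/bin/bash
# Driver run on the login node: submits one job per submatrix pair.
# run_pair.sh is a hypothetical batch script that calls
# 'Rscript rho_pair.R "$1" "$2"' and carries its own #SBATCH options.
n=14
for ((i = 1; i < n; i++)); do
  for ((j = i + 1; j <= n; j++)); do
    sbatch --partition normal --mem=250G --cpus-per-task=32 run_pair.sh "$i" "$j"
  done
done
```

Arguments placed after the script name on the `sbatch` command line are passed to the batch script, so each job knows which pair of submatrices to load.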