Question: Distribute proportionality calculations on many nodes (SLURM)
Asked 18 months ago by pablo, who wrote:

Hello,

Here is what I did (in R):

1 - I have a matrix of OTU abundances (more than 817,000 columns). I need to compute the proportionality between these OTUs. For the moment, I can split the matrix into submatrices, compute the proportionality for each pair of submatrices, and then assemble the final matrix.

library(foreach)   # %dopar% below also needs a registered parallel backend (e.g. doParallel)

data <- matrix(runif(10000), ncol = 1000)   # random test matrix
data <- clr(data)                           # centred log-ratio transform (e.g. compositions::clr)
ncol <- ncol(data)

rest   <- ncol %% 100    # leftover columns (here 0); 100 columns per submatrix
blocks <- ncol %/% 100   # number of submatrices (here 10)

ngroup <- rep(1:blocks, each = 100)
# if (rest > 0) ngroup <- c(ngroup, rep(blocks + 1, rest))   # handle the remainder
split  <- split(1:ncol, ngroup)

#I get all the combinations between my submatrices

combs <- expand.grid(1:length(split), 1:length(split)) 
combs <- t(apply(combs, 1, sort))
combs <- unique(combs) 
combs <- combs[combs[,1] != combs[,2],] 

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
  G1 <- split[[combs[i, 1]]]
  G2 <- split[[combs[i, 2]]]
  dat.i <- cbind(data[, G1], data[, G2])
  cor_rho(dat.i)   # cor_rho is my function to compute the proportionality
}

And then I get the final matrix:

resMAT <- matrix(0, ncol(data), ncol(data))

for(i in 1:nrow(combs)){ 
  batch1 <- split[[combs[i,1]]]
  batch2 <- split[[combs[i,2]]]
  patch.i <- c(batch1, batch2) 
  resMAT[patch.i, patch.i] <- res[[i]] 
}

2 - I work with SLURM on a cluster with several nodes. I know that with one node (256 GB and 32 CPUs), I can compute the proportionality between 60,000 columns in one day. So I need 817,000 / 60,000 ≈ 14 submatrices, which gives (14 * 13) / 2 = 91 combinations (= 91 nodes).

3 - I don't know how to write a SLURM script that distributes each combination's calculation to a separate node.

Any advice?

Best, Vincent

written 18 months ago by pablo • modified 18 months ago by Jean-Karim Heriche

Is there a reason you haven't asked your cluster admin for the appropriate syntax for your cluster?

written 18 months ago by Devon Ryan

Actually, he gave me some advice, but I'm still struggling with the SLURM syntax.

written 18 months ago by pablo (modified)

Couldn't you just submit the job 14 times (with different input files being your 14 different submatrices)? This would require 14 different SLURM submission scripts, which would probably take less time to write than a Python or Perl script to automatically generate the 14 scripts.

written 18 months ago by jean.elbers
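
For illustration, submitting one job per pre-split input file could also be scripted with a short bash loop instead of writing each submission script by hand. This is only a sketch: submit_one.sh and the submatrix_*.rds file names are made-up placeholders.

for f in submatrix_*.rds; do
    # submit_one.sh (hypothetical) would forward "$1" to the R script,
    # e.g. via: Rscript my_script.R "$1"
    sbatch submit_one.sh "$f"
done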

I would like to keep my R code and incorporate it into a SLURM script, if that is possible?

written 18 months ago by pablo

resMAT[patch.i, patch.i] <- res[[i]] - you're only going to update entries on the diagonal of resMAT

written 18 months ago by russhh

But does that mean my final matrix is wrong?

written 18 months ago by pablo

Argh, sorry, my mistake.

written 18 months ago by russhh
Answer, 18 months ago, by Jean-Karim Heriche (EMBL Heidelberg, Germany):

This is not really a bioinformatics question but a programming one, so it might be better addressed on Stack Overflow.

I am probably missing something, but you may not need to split your matrix into blocks; you could use something like this in R:

library(foreach)
library(doSNOW)

cluster <- makeCluster(number_of_nodes, type = "SOCK")
registerDoSNOW(cluster)
# This is a typical way to parallelize computation of a distance/similarity matrix.
# Here n is the number of columns, i.e. n <- ncol(data)
result_matrix <- foreach(j = c(1:n), .combine = 'cbind') %:%
                   foreach(i = c(1:n), .combine = 'c') %dopar% {
                     cor_rho(data[, i], data[, j])
                   }
stopCluster(cluster)

The bash script submitted to SLURM would look like this:

#!/bin/bash

#SBATCH --mail-type=FAIL,END
#SBATCH -N number_of_nodes
#SBATCH -n number_of_tasks
#SBATCH --mem=memory_required

Rscript my_script

written 18 months ago by Jean-Karim Heriche (modified)
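
One caveat, as I understand it (not stated in the answer above): makeCluster(number_of_nodes, type = "SOCK") starts all of its workers on the machine running the script, so by itself it only uses the cores of one node. To spread SOCK workers over the nodes SLURM allocated, the R script would need the list of allocated host names, which the job script can expose, for example:

# expand SLURM's compressed node list into one host name per line
# and hand it to R (which could pass the names to makeCluster instead of a count)
export NODE_LIST=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
Rscript my_script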

To avoid repeating calculations, maybe the foreach loops should be:

foreach(j = 1:(n - 1), .combine = 'cbind') %:%
  foreach(i = (j + 1):n, .combine = 'c') %dopar% { ...

written 18 months ago by zx8754

Thanks for your answer. I'll try your solution. When you write 1:n, does n mean ncol(data)?

However, if I execute my R script from a SLURM script, will SLURM "know" how to distribute each combination to different nodes? (if I specify:

#SBATCH --mail-type=FAIL,END
#SBATCH -N 32
#SBATCH -n 91
#SBATCH --mem=250G )

written 18 months ago by pablo

Is the default value for -c (--cpus-per-task) set to 1 CPU per task on your cluster? If that is true, then with -n 91 you are asking for 91 cores/tasks from a node that has only 32 cores.

I generally use -n with -N to specify the number of tasks/cores per node. If you only have 32 cores per node, then you may need to specify -n 32 along with -N 91 (nodes) if you want them all to run at the same time. I am not sure you can divide your R jobs to submit SLURM sub-jobs; using job arrays may be an option then.

written 18 months ago by genomax (modified)
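
As a rough sketch of the job-array idea mentioned above, assuming the R code is reworked so that a (hypothetical) compute_pair.R takes the combination index as an argument and saves its block of the result, the submission script could look like this:

#!/bin/bash
#SBATCH --job-name=proportionality
#SBATCH --array=1-91          # one array task per submatrix combination
#SBATCH --nodes=1             # each combination fits on a single node
#SBATCH --cpus-per-task=32
#SBATCH --mem=250G

# SLURM_ARRAY_TASK_ID picks one row of the 91-row combs table;
# compute_pair.R (hypothetical) would read it via commandArgs(trailingOnly = TRUE)
# and write its block to, e.g., result_${SLURM_ARRAY_TASK_ID}.rds
Rscript compute_pair.R "${SLURM_ARRAY_TASK_ID}"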

Yes, it defaults to 1 CPU.

Actually, I would like to distribute each combination to one node, like this:

submatrix 1 vs submatrix 2   -> node 1
submatrix 2 vs submatrix 3   -> node 2
...
submatrix 13 vs submatrix 14 -> node 91

I don't know if that is possible?

written 18 months ago by pablo

It should be possible as long as you submit individual jobs. See the changes I made to my comment above.

written 18 months ago by genomax

Actually, I don't mind if they don't all run at the same time. Could it be necessary to wait for some jobs to end before others begin?

written 18 months ago by pablo

Could it be necessary to wait for some jobs to end before others begin?

If that is the case, then you would need to look into the --dependency=type:jobid option for sbatch to make sure those tasks don't start until the job (jobid above) they depend on finishes successfully.

written 18 months ago by genomax (modified)
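
For example (a sketch; the job ID and script name are placeholders), an assembly step could be held back until a compute job finishes successfully:

# 12345 is the job ID returned by the earlier sbatch call
sbatch --dependency=afterok:12345 assemble_results.sh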

Thanks. But is it normal that when I execute, for example (if I have only 10 combinations -> 10 nodes), sbatch --partition normal --mem-per-cpu=4G --cpus-per-task=16 --nodes=10 my_R_script.sh, it only creates one job and not the 10 I expected?

written 18 months ago by pablo

Yes, it is normal, because as far as SLURM is concerned the only job it has been asked to run is my_R_script.sh. Unless that script creates sub-jobs from within itself with multiple sbatch --blah do_this commands, SLURM can't run them as separate jobs.

Note: some clusters may be configured not to allow an existing job to spawn sub-jobs. In that case your only option would be the one below.

The other way would be to start the 10 computations independently (if that is possible) by doing:

sbatch --blah my_R_script1.sh
sbatch --blah my_R_script2.sh
..
sbatch --blah my_R_script10.sh

written 18 months ago by genomax (modified)
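
A variation on this, as a sketch, is to reuse a single job script and pass the combination number as an argument, so you do not need 10 (or 91) near-identical script copies (run_one_pair.sh is a hypothetical name):

for i in $(seq 1 10); do
    # inside run_one_pair.sh, "$1" selects which combination to compute
    sbatch run_one_pair.sh "$i"
done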

Actually, I just use my_R_script.sh to execute my R script (the one I posted at the top of the topic), so it doesn't create any sub-jobs.

If I want to create the 91 sub-jobs I need, do I leave my R script as it is and write a SLURM script that creates these sub-jobs?

written 18 months ago by pablo