Running Snakemake on the computing cluster
16 months ago
wangdp123 ▴ 280

Hi there,

I have a question about how to run Snakemake properly on an HPC cluster. As I understand it, there are at least three ways of doing this:

1) Wrap the snakemake call in a shell script, test.sh, with the following content:

#!/bin/bash
#$ -cwd
#$ -V
#$ -l h_rt=48:00:00
#$ -l nodes=10,ppn=1

snakemake -p --cores 10 --snakefile test.snakemake


I then run "qsub test.sh" to submit this job to the compute nodes of the HPC.

2) Use the --cluster argument suggested by the Snakemake manual (https://snakemake.readthedocs.io/en/v5.1.4/executable.html) as below:

snakemake --cluster qsub -j 32 --snakefile test.snakemake


3) Use the --profile argument suggested by the Snakemake manual (https://snakemake.readthedocs.io/en/v5.1.4/executable.html) as below:

snakemake --profile myprofile --snakefile test.snakemake
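
For reference, the linked manual describes a profile as a folder containing a config.yaml whose keys are the long names of Snakemake's command-line options. Below is a minimal sketch, assuming myprofile is meant to reproduce the command in 2). The actual contents of myprofile are not given here, so the two values are an assumption, and the path is the default per-user location on Linux:

```shell
#!/bin/bash
# Sketch only: the contents of "myprofile" are assumed, not taken from the post.
# Default per-user profile location on Linux, as described in the Snakemake manual.
PROFILE_DIR="$HOME/.config/snakemake/myprofile"
mkdir -p "$PROFILE_DIR"

# Each key is the long form of a command-line option:
# "cluster: qsub" stands for --cluster qsub, "jobs: 32" for -j 32.
cat > "$PROFILE_DIR/config.yaml" <<'EOF'
cluster: qsub
jobs: 32
EOF

# With this file in place, the command in 3) should behave like the one in 2):
#   snakemake --profile myprofile --snakefile test.snakemake
cat "$PROFILE_DIR/config.yaml"
```

Under that assumption, the profile only saves typing the same options each time; it would not change how the jobs are submitted.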


I was wondering whether the three methods have the same effect. If not, what are the differences? I was told that the first approach does not make use of multiple HPC nodes, but I am not convinced, because in practice it seems to have used all 10 nodes.

Many thanks,

Tom

I use your approach #1, because I need to load a custom conda environment for snakemake to run. From the documentation, it is not clear to me whether a job is launched for each rule or for the entire pipeline when using options 2 and 3. What is your experience with that?