Best practices to submit a multiple node job in cluster
cmanu • 5.4 years ago

Hi guys

I'm quite new to bioinformatics. Right now I'm trying to do my first run on my university's cluster, which uses SLURM to manage the queue. After submitting a job for 4 nodes I noticed that the run was using only 1% CPU and that I was not getting any output files in my working directory. After some googling I realized that I had not defined any scratch directory, so I've adapted my submission; it now looks something like this:

#!/bin/bash
#SBATCH -n 48
#SBATCH --mem=0
#SBATCH -o %j.o
#SBATCH -e %j.e
# Run for 7 days
#SBATCH -t 07-00:00:00
#SBATCH --exclusive
#SBATCH --job-name=2py

echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running on $SLURM_NPROCS processors."
echo "Current working directory is `pwd`"

SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p $SCRATCHDIR

module load CP2K/6.1-foss-2019a
mpirun srun cp2k.popt -i cp2k.inp -o cp2k.out

cp -r $SCRATCHDIR .
rm -rf $SCRATCHDIR

echo "Program finished with exit code $? at: `date`"

Do you guys have any advice on how to improve it? Also, this will now copy all the files to my working directory, right? If you have any resources that could help me learn about this, that would be very helpful.

Thanks

GenoMax • 5.4 years ago

It is difficult to provide a useful answer to the question in its present form, but I will take a stab.

After submitting a job for 4 nodes I noticed that the run was using only 1% CPU and that I was not getting any output files in my working directory.

We don't know what program you are running, and while you seem to be using mpirun, it may or may not be appropriate for that program. Not all programs benefit from being given many cores (especially if they are not capable of threaded/parallel execution). They may also have serial steps where only one core is doing the necessary work.
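If your cluster has Slurm accounting enabled, you can check how much CPU time a job actually consumed once it has been running for a while. The job ID below is just a placeholder, and seff is a contributed helper that may not be installed at every site:

sacct -j 12345 --format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS   # accounting view of CPU use
seff 12345                                                        # per-job efficiency summary, if available

If TotalCPU is far below Elapsed times AllocCPUS, the job is not making use of the cores it reserved.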

#SBATCH -n 48

Since you have also asked for exclusive access, do your cluster nodes have 48 cores on each?
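If you are not sure how the nodes are laid out, sinfo can report cores and memory per node, and you can then size the request to match the hardware. The 12-cores-per-node figure below is purely illustrative:

sinfo -o "%n %c %m %P"         # hostname, CPUs, memory (MB), partition for each node

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=12   # illustrative; match this to the real core count per node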

After some googling, I noticed that I did not define any scratchdir

You should not strictly have to define this. If the program you are running expects a scratch dir, that is one thing; otherwise programs should automatically use /tmp for that purpose.
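If you do decide to use a dedicated scratch area, a common pattern is to key it to the job ID, run the program from there, and copy results back before cleaning up. The /scratch path below is an assumption; use whatever location your site provides:

SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID      # unique per job
mkdir -p "$SCRATCHDIR"
cd "$SCRATCHDIR"                             # run the program here so its output lands in scratch
# ... run your program ...
cp -r "$SCRATCHDIR"/. "$SLURM_SUBMIT_DIR"/   # copy results back to where sbatch was called
rm -rf "$SCRATCHDIR"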

#SBATCH --mem=0

I hope that was an error since you appear to be assigning no memory to your job at all.
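Whatever your scheduler actually does with --mem=0, an explicit request leaves no room for surprises. The value below is only an illustration and should be sized to what the program really needs:

#SBATCH --mem-per-cpu=4G   # illustrative figure; adjust to the program's actual footprint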

Since job scheduler setups are site-specific, it would be best to first check whether any local documentation is available. Talk with your cluster admins/help desk to see what you can find.

cmanu:

Thanks for the answer. So I've removed the --mem=0 and the --exclusive tags. The program that I'm using is CP2K, which runs (or can run) using MPI. I thought that it would be helpful to define a scratchdir because, when running on several nodes, I'm not able to see the output files being generated.

GenoMax:

Are you certain the following method of parallel job submission is correct?

mpirun srun cp2k.popt -i cp2k.inp -o cp2k.out

The OpenMPI site seems to indicate a slightly different way.

Try

mpirun cp2k.popt -i cp2k.inp -o cp2k.out

then submit the script file by doing

sbatch your_script_file
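Putting that together with the module line from your script, the relevant part of the batch file would look roughly like this (a sketch; the module name is copied from your post and may differ on other systems):

module load CP2K/6.1-foss-2019a
mpirun cp2k.popt -i cp2k.inp -o cp2k.out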

Some like to use srun in their sbatch scripts; I prefer not to. But GenoMax is right anyway, it should look like this:

srun mpirun ....