bwa parallel on a SLURM cluster
4.7 years ago
little_more ▴ 70

I'm working with NGS data on a SLURM cluster. I trimmed the raw reads and am now thinking about the best way to align them to the reference genome. I have paired reads for a few samples. I wrote a script that runs bwa in parallel:

#SBATCH --cpus-per-task=1
#SBATCH --ntasks=10
#SBATCH --nodes=1

# align with bwa & convert to bam
bwatosam() {
  id=$1                    # sample ID
  index=$2                 # bwa index
  output=$3/"$id".bam      # $3 = output directory
  fq1=$4/"$id".R1.fq.gz    # $4 = directory with trimmed reads
  fq2=$4/"$id".R2.fq.gz

  # double quotes around the read group so "$id" actually expands
  bwa mem -t 16 -R "@RG\tID:${id}\tSM:${id}\tPL:ILLUMINA\tLB:${id}_exome" -v 3 -M "$index" "$fq1" "$fq2" |
    samtools view -bo "$output"
}
export -f bwatosam

# run bwatosam in parallel
ls trimmed/*.R1.fq.gz |
  xargs -n 1 basename |
  awk -F ".R1" '{print $1}' | sort -u |
  parallel -j "$SLURM_NTASKS" "bwatosam {} index.fa alns trimmed"

But I'm not sure I'm using the right #SBATCH parameters for the job, because if I run it without -j:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=5

# run bwatosam in parallel
ls trimmed/*.R1.fq.gz |
  xargs -n 1 basename |
  awk -F ".R1" '{print $1}' | sort -u |
  parallel "bwatosam {} index.fa alns trimmed"

It runs 10 times faster. What number of nodes/CPUs/threads should I use?

bwa • 2.4k views

Have you tried submitting jobs directly to SLURM, without the additional complexity of parallel? On a cluster, parallel adds complexity for no good reason as far as I can see.
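For example, a SLURM job array gives you one task per sample with no parallel involved. A minimal sketch, assuming the directory layout from the question; samples.txt is a hypothetical helper file with one sample ID per line:

# build the sample list once
ls trimmed/*.R1.fq.gz | xargs -n 1 basename | awk -F ".R1" '{print $1}' | sort -u > samples.txt

# submit one array task per sample, each with its own CPU allocation
sbatch --array=1-$(wc -l < samples.txt) --cpus-per-task=16 --wrap '
  id=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)
  bwa mem -t 16 -R "@RG\tID:${id}\tSM:${id}\tPL:ILLUMINA\tLB:${id}_exome" \
    index.fa trimmed/"${id}".R1.fq.gz trimmed/"${id}".R2.fq.gz |
    samtools view -bo alns/"${id}".bam'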

4.7 years ago
ATpoint 81k

Depends on the node. I typically run alignments with essentially this kind of script (sorry Pierre Lindenbaum, no Snakemake yet) on a 72-core node with 192 GB RAM, and then use:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=72
#SBATCH --partition=normal

In this case I would use 4 parallel processes with 16 bwa threads each. It depends on how much memory your node has; can you give some details? When using parallel, I recommend booking the entire node to ensure you are not interfering with processes from other users.
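With the bwatosam function from the question (which already hard-codes bwa mem -t 16), that layout would look roughly like this; a sketch, not a tested script:

# 4 concurrent jobs x 16 bwa threads = 64 of the 72 cores,
# leaving some headroom for the samtools processes
ls trimmed/*.R1.fq.gz |
  xargs -n 1 basename |
  awk -F ".R1" '{print $1}' | sort -u |
  parallel -j 4 "bwatosam {} index.fa alns trimmed"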

=> Note that I always book the entire node when running things in parallel, so I essentially do not have to care about RAM consumption etc. as long as the node can handle it. If you share the node with others, it might be a good idea to ask your admin beforehand whether running parallel on your cluster nodes is allowed.


ATpoint, thanks for answering my question without endlessly referring to snakemake! :)

cluster specifications:

376 nodes: each with 2 processors (8 cores each) & 64 GB RAM

144 nodes: each with 2 processors (12 cores each) & 64 GB RAM

Do you specify how much memory you need with #SBATCH --mem=...? And do you use -j 4 to run 4 parallel processes? Because, as I understand it, parallel by default runs the maximum number of parallel processes (which depends on the number of CPUs on the node).


Yes, we have to set #SBATCH --mem=... on our cluster, as the batch system kills processes that use more than the specified amount. I have it at 80 GB by default. For your 64 GB nodes I would probably run 2 or 3 jobs in parallel, probably 2, since bwa sometimes uses a lot of memory when aligning batches of reads that are somewhat difficult (repetitive), from what I understand. That way you probably avoid running out of memory.
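A minimal sketch for the 16-core/64 GB nodes, assuming bwatosam is adjusted to bwa mem -t 8 so that two concurrent jobs match the 16 cores; the --mem value is an assumption, check your site's per-node limit:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16   # book the whole 16-core node
#SBATCH --mem=60G              # assumed limit, just under the 64 GB of physical RAM

ls trimmed/*.R1.fq.gz |
  xargs -n 1 basename |
  awk -F ".R1" '{print $1}' | sort -u |
  parallel -j 2 "bwatosam {} index.fa alns trimmed"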


so you suggest using:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
parallel -j 2...

Right? Sorry for the many questions; I just started working with this cluster, and I was surprised that parallel with -j ran longer than plain parallel.


No worries. --ntasks-per-node=8 would need to be 34, as this is the total number of threads you are going to use: 2x16 for bwa plus 2x1 for the samtools view. Without -j, parallel defaults to as many jobs as there are CPU cores and will grab every available resource on the node, maybe not a good idea when sharing nodes with others.
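In header form, the accounting works out like this (a sketch of the arithmetic only, not a tested header):

# 2 parallel jobs, each running:
#   bwa mem -t 16  -> 16 threads
#   samtools view  ->  1 thread
# total: 2 x (16 + 1) = 34 tasks
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=34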

I always book the entire node, so my advice here might not be adequate when sharing nodes with others; keep that in mind.


I see. So every task will run on a separate CPU, and the node needs to have 34 CPUs?


This is how I understand things.


I see. Thanks very much for your help!

