Dear all,
I need help running simultaneous jobs in parallel on SLURM; I'm very new to array jobs and to SLURM in general. I have around 100 tar.gz files. I would like to untar them, align the extracted fastq files with hisat2, convert the output to BAM, sort the BAM files, and finally write the results out as sorted.bam files.
tar.gz -> fastq (after extraction) -> bam -> sorted.bam
I wrote the script below to run this on a SLURM cluster.
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --time=05:59:59
#SBATCH --tmp=500G
#SBATCH --array=1-100%20

module load HISAT2/2.0.4-goolf-1.7.20
module load SAMtools/1.3.1-goolf-1.7.20

dir2="/home/destination"
mkdir -p "$dir2"
cd "$dir2"

for i in /home/eg/*.tar.gz
do
    # extract one archive into node-local scratch
    tar xvzf "$i" -C "$TMPDIR"
    base2=$(basename "$i" ".tar.gz")
    for sample in "$TMPDIR"/*_1.fastq
    do
        base=$(basename "$sample" "_1.fastq")
        # align the paired fastq files, convert to BAM, then sort
        hisat2 -p 8 --dta --rna-strandness RF \
            -x /home/grch38_snp_tran/genome_snp_tran \
            -1 "$TMPDIR/${base}_1.fastq" -2 "$TMPDIR/${base}_2.fastq" \
            | samtools view -Sb - > "$TMPDIR/${base2}.bam"
        samtools sort -T "$TMPDIR/${base2}.sorted" \
            -o "${dir2}/${base2}.sorted.bam" "$TMPDIR/${base2}.bam"
    done
done
With this, the jobs started running, but every array task repeats the same work: even after one task has finished a file, another task processes it again. Do I need to use "$SLURM_ARRAY_TASK_ID"? How would I do that in the above code?
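From what I've read, each array task is supposed to pick out a single input file via $SLURM_ARRAY_TASK_ID instead of looping over all of them. Is something like the following (untested; the file listing and indexing are my guesses) the right direction?

FILES=(/home/eg/*.tar.gz)
i="${FILES[$((SLURM_ARRAY_TASK_ID - 1))]}"   # task IDs start at 1, bash arrays at 0
tar xvzf "$i" -C "$TMPDIR"
base2=$(basename "$i" ".tar.gz")
# ...then run hisat2/samtools on this one archive only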
Also, how do I get a separate .out file for each array task index?
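I saw examples that use the %A (job ID) and %a (array index) placeholders with the --output directive; would adding something like this to the header give one log file per task (the filename pattern here is just an example)?

#SBATCH --output=hisat2_%A_%a.out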
Any help is appreciated.