Dear all,
I need help running simultaneous jobs in parallel on SLURM; I'm very new to array jobs and to SLURM in general. I have around 100 tar.gz files. I would like to untar them, align the extracted fastq files with hisat2, convert the output to BAM, sort the BAM files, and finally write the results out as sorted.bam files.
tar.gz -> fastq (after extraction) -> bam -> sorted.bam
I wrote the script below to run this on a SLURM cluster.
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --time=05:59:59
#SBATCH --tmp=500G
#SBATCH --array=1-100%20

module load HISAT2/2.0.4-goolf-1.7.20
module load SAMtools/1.3.1-goolf-1.7.20

dir2="/home/destination"
mkdir -p "$dir2"
cd "$dir2"

for i in /home/eg/*.tar.gz
do
    # extract one archive into node-local scratch
    tar xvzf "$i" -C "$TMPDIR"
    base2=$(basename "$i" ".tar.gz")
    for sample in "$TMPDIR"/*_1.fastq
    do
        base=$(basename "$sample" "_1.fastq")
        # align the paired fastq files, convert to BAM, then sort
        hisat2 -p 8 --dta --rna-strandness RF \
            -x /home/grch38_snp_tran/genome_snp_tran \
            -1 "$TMPDIR/${base}_1.fastq" -2 "$TMPDIR/${base}_2.fastq" \
            | samtools view -Sb - > "$TMPDIR/${base2}.bam"
        samtools sort -T "$TMPDIR/${base2}.sorted" \
            -o "${dir2}/${base2}.sorted.bam" "$TMPDIR/${base2}.bam"
    done
done
With this, the jobs started running, but every array task repeats the same work: even after one task has finished a file, another task processes it again. Do I need to use "$SLURM_ARRAY_TASK_ID"? How would I do that in the above code?
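From what I've read, each array task is supposed to pick out a single input file via $SLURM_ARRAY_TASK_ID instead of looping over all of them. Is something like the following (untested; the file listing and indexing are my guesses) the right direction?

FILES=(/home/eg/*.tar.gz)
i="${FILES[$((SLURM_ARRAY_TASK_ID - 1))]}"   # task IDs start at 1, bash arrays at 0
tar xvzf "$i" -C "$TMPDIR"
base2=$(basename "$i" ".tar.gz")
# ...then run hisat2/samtools on this one archive only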
Also, how do I get a separate .out file for each array task index?
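I saw examples that use the %A (job ID) and %a (array index) placeholders with the --output directive; would adding something like this to the header give one log file per task (the filename pattern here is just an example)?

#SBATCH --output=hisat2_%A_%a.out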
Any help is appreciated.