Hello all,
I need to align 74 human genomes (30x coverage each). For each genome I have 8 files, because they were sequenced on 4 different lanes and the data are paired-end.
For these 296 alignments I think a job array is the best way to parallelize, but I'm not sure how to set it up. In the sbatch options I should add the following (to run 10 jobs simultaneously):
#SBATCH --array=1-296%10
But then what? If I add a sample sheet with 3 columns (sample name, read 1 file, read 2 file), I could use these commands:
r1=`sed -n "$SLURM_ARRAY_TASK_ID"p $samplesheet | awk '{print $2}'`
r2=`sed -n "$SLURM_ARRAY_TASK_ID"p $samplesheet | awk '{print $3}'`
bwa mem -t 10 ref.fa $r1 $r2 | samtools view -bh -@ 4 | samtools sort -@ 4 > .bam
For clarity I have left out some of the options I use, such as the read group or the mapping quality filter.
In this example I use 18 CPUs for 1 alignment. Do I need to request 18 in the sbatch command, and will it then allocate 18 to each job (so 180 at most)? The same goes for memory: in the sbatch options, do I request the amount of memory for one job or for the 10 simultaneous jobs?
Thanks in advance, and sorry if I'm not clear enough. I am still learning; before starting my thesis I had never done any bioinformatics.
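The sample-sheet lookup described in the question can be tried outside of Slurm with a small dry run. This is only a sketch: the file names and the two-line sheet are made up for illustration, and the task ID falls back to 2 when SLURM_ARRAY_TASK_ID is not set (Slurm sets that variable inside each array task).

```shell
#!/bin/sh
# Hypothetical three-column sample sheet: sample name, read 1 file, read 2 file.
samplesheet=$(mktemp)
printf '%s\t%s\t%s\n' \
  sampleA_L001 sampleA_L001_R1.fastq.gz sampleA_L001_R2.fastq.gz \
  sampleA_L002 sampleA_L002_R1.fastq.gz sampleA_L002_R2.fastq.gz > "$samplesheet"

# Inside an array task Slurm provides SLURM_ARRAY_TASK_ID; default to 2 for this dry run.
task_id=${SLURM_ARRAY_TASK_ID:-2}

# Pick line number $task_id and split it into its three fields with one awk call.
set -- $(awk -v n="$task_id" 'NR == n {print $1, $2, $3}' "$samplesheet")
sample=$1
r1=$2
r2=$3

echo "task $task_id: sample=$sample r1=$r1 r2=$r2"
# The alignment step would then read "$r1" and "$r2" and could name its output
# after "$sample" (an assumption; the output name is not given in the question).
rm -f "$samplesheet"
```

Reading all three columns in one pass avoids running sed and awk twice per task, and having the sample name available makes it easy to give each task its own output file.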
I would cat the lane replicates, as this is just a sequencing replicate; there is no need to keep them separate. The samtools view is not necessary, as sort will read SAM directly from bwa. I would use bwa mem (...) | samtools sort -o out.bam, that will take care of everything. Increase CPU and/or memory via -m for the sorting to gain speed if you have the resources.
Thanks. So when I use an sbatch array, I could put #SBATCH --cpus-per-task=18 in the sbatch options to get 18 CPUs for each job (so, since I allow at most 10 simultaneous jobs, this requests at most 180 CPUs)? And in that case, could I add something like --mem-per-cpu=1000 to get 18 GB per job?
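The arithmetic behind that follow-up can be checked with a short sketch. It rests on how Slurm job arrays work: the resource requests in the script header apply to each array task separately, and the %10 suffix only caps how many tasks run at once. The numbers are the ones from the thread; the variable names are made up, and --mem-per-cpu is taken in its default unit of megabytes.

```shell
#!/bin/sh
#SBATCH --array=1-296%10       # 296 tasks, at most 10 running at the same time
#SBATCH --cpus-per-task=18     # requested for each array task, not for the whole array
#SBATCH --mem-per-cpu=1000     # megabytes per allocated CPU

# Values mirrored from the header above (illustrative variable names).
cpus_per_task=18
mem_per_cpu_mb=1000
max_running=10

# Per-task memory, then the peak footprint when all 10 slots are busy.
mem_per_task_mb=$(( cpus_per_task * mem_per_cpu_mb ))
max_cpus=$(( cpus_per_task * max_running ))
max_mem_mb=$(( mem_per_task_mb * max_running ))

echo "memory per task: ${mem_per_task_mb} MB"
echo "peak CPUs with ${max_running} tasks running: ${max_cpus}"
echo "peak memory with ${max_running} tasks running: ${max_mem_mb} MB"
```

So the request is written once, for a single alignment, and the scheduler multiplies it by however many tasks it lets run concurrently. If the lane files are concatenated first as suggested, the array would shrink to one task per genome.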
Use a workflow manager like Nextflow or Snakemake.
Yes, that's what I want to do, but for the moment it's too complicated for my level of knowledge. Of course, I plan to learn how to use Snakemake.