How to optimize the alignment with bwa mem
3.1 years ago
quentin54520

Hello all,

I need to align 74 human genomes (30x coverage). For each genome I have 8 files, because the samples were sequenced on 4 different lanes and the data are paired-end.

To run these 296 alignments (4 lane-level read pairs for each of the 74 genomes), I think a job array is the best way to parallelize, but I'm not sure how to set it up. In the sbatch options I would add the following to run 10 jobs simultaneously:

#SBATCH --array=1-296%10
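Since every genome has exactly 4 lane-level read pairs, the 296 task IDs can also be decoded with shell arithmetic, either instead of or as a cross-check on a sample sheet. This is a minimal sketch, assuming a hypothetical ordering in which task IDs 1-4 are the 4 lanes of sample 1, 5-8 those of sample 2, and so on:

```shell
#!/bin/bash
# Hypothetical ordering: tasks 1-4 = sample 1 lanes 1-4, tasks 5-8 = sample 2, ...
lanes_per_sample=4
task_id=${SLURM_ARRAY_TASK_ID:-6}   # the default is only for testing outside Slurm

# Integer division gives the sample, the remainder gives the lane (both 1-based)
sample_index=$(( (task_id - 1) / lanes_per_sample + 1 ))
lane=$(( (task_id - 1) % lanes_per_sample + 1 ))

echo "task $task_id -> sample $sample_index, lane $lane"
```

With the same formula, task 296 decodes to sample 74, lane 4, so the whole range maps onto 74 × 4 read pairs.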

What comes next? If I create a sample sheet with 3 columns (sample name, read 1 file, read 2 file), I could use these commands:

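# Take this task's line from the sample sheet (columns: sample, read 1, read 2)
sample=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$samplesheet" | awk '{print $1}')
r1=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$samplesheet" | awk '{print $2}')
r2=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$samplesheet" | awk '{print $3}')

# Adding the task ID to the output name keeps lane-level BAMs from overwriting each other
bwa mem -t 10 ref.fa "$r1" "$r2" | samtools view -bh -@ 4 - | samtools sort -@ 4 -o "${sample}_${SLURM_ARRAY_TASK_ID}.bam" -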
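The separate sed/awk calls can be collapsed into a single `read` that takes all three columns from the matching line. This is a sketch, assuming a whitespace-separated sample sheet with no header line and no spaces in file names; the demo rows and file names are hypothetical:

```shell
#!/bin/bash
# Demo only: write a two-line sample sheet; in a real job, point $samplesheet at your own file
samplesheet=$(mktemp)
printf '%s\n' \
  'S1_L001 S1_L001_R1.fastq.gz S1_L001_R2.fastq.gz' \
  'S1_L002 S1_L002_R1.fastq.gz S1_L002_R2.fastq.gz' > "$samplesheet"

task_id=${SLURM_ARRAY_TASK_ID:-2}   # the default is only for testing outside Slurm

# awk prints only line number $task_id; read splits it into the three columns
read -r sample r1 r2 <<EOF
$(awk -v i="$task_id" 'NR == i' "$samplesheet")
EOF

rm -f "$samplesheet"
echo "$sample: $r1 + $r2"
```

Reading the line once avoids scanning the sample sheet three times and guarantees that sample, r1 and r2 all come from the same row.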

For clarity, I left out some of the options I use, such as the read group and the mapping-quality filter.

In this example I use 18 CPUs for one alignment. In the sbatch options, do I request 18, and will each job then get 18 (so 180 at most)? The same question applies to memory: do I request the amount of memory for one job or for all 10 simultaneous jobs?
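In a Slurm job array, each array task is scheduled as a separate job, so per-task options such as --cpus-per-task describe one task, and the %10 throttle caps how many such tasks run at once. A small sketch of the resulting arithmetic, using the thread counts from the pipeline above (note that samtools `-@` counts threads in addition to the main thread, so the real peak can be slightly higher):

```shell
#!/bin/bash
# Threads requested by each tool in the question's pipeline
bwa_threads=10
view_threads=4
sort_threads=4
max_concurrent=10          # the %10 in --array=1-296%10

# What one array task should request, and what 10 concurrent tasks add up to
cpus_per_task=$(( bwa_threads + view_threads + sort_threads ))
peak_cpus=$(( cpus_per_task * max_concurrent ))

echo "request --cpus-per-task=$cpus_per_task"
echo "at most $peak_cpus CPUs in use across the array"
```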

Thanks in advance, and sorry if I'm not clear enough. I am still learning: before starting my thesis I had never done any bioinformatics.

Tags: genome, bwa, alignment
Comment:

I would cat the lane files together, since they are just sequencing replicates; there is no need to keep them separate. The samtools view step is not necessary, because samtools sort reads SAM directly from bwa. I would use bwa mem (...) | samtools sort -o out.bam, which takes care of everything. If you have the resources, increase the sorting threads (-@) and/or the memory per thread (-m) to gain speed.
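Merging the lanes first turns 296 alignments into 74. Gzip-compressed FASTQ files can be joined with plain cat, because a sequence of gzip members is itself a valid gzip file. This sketch assumes a hypothetical naming scheme `SAMPLE_L00N_R1.fastq.gz`; adjust the glob to your own file names. R1 and R2 must be concatenated in the same lane order so the pairs stay in sync:

```shell
#!/bin/bash
# Demo only: create tiny per-lane FASTQ files for one hypothetical sample "S1"
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for lane in 1 2 3 4; do
  for mate in 1 2; do
    printf '@r%s/%s\nACGT\n+\nIIII\n' "$lane" "$mate" | gzip > "S1_L00${lane}_R${mate}.fastq.gz"
  done
done

# Merge lanes per mate; the glob expands in sorted order (L001..L004),
# so R1 and R2 get the same read order and pairs remain in sync.
sample=S1
cat "${sample}"_L00?_R1.fastq.gz > "${sample}_R1.fastq.gz"
cat "${sample}"_L00?_R2.fastq.gz > "${sample}_R2.fastq.gz"

# The merged file decompresses to the reads of all four lanes
gzip -dc "${sample}_R1.fastq.gz" | grep -c '^@r'
```

With one R1/R2 pair per sample, the array would shrink to --array=1-74%10.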

Reply:

Thanks. So when I use an sbatch array, I could put #SBATCH --cpus-per-task=18 in the sbatch options to get 18 CPUs for each job (and since I allow at most 10 simultaneous jobs, this requests at most 180 CPUs)? And in that case I could add something like --mem-per-cpu=1000 to get 18 GB per job?
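A sketch of the header described in this reply, assuming --mem-per-cpu is given in megabytes (Slurm's default unit). Each array task receives its own 18 CPUs and 18 × 1000 MB. Reading SLURM_CPUS_PER_TASK and SLURM_MEM_PER_CPU inside the script keeps the thread counts tied to the request. The split below assumes the samtools view step is dropped, as the first comment suggests, and that 4 threads are kept for sorting; both are assumptions, not requirements:

```shell
#!/bin/bash
#SBATCH --array=1-296%10
#SBATCH --cpus-per-task=18
#SBATCH --mem-per-cpu=1000

# Slurm sets these in each task; the fallbacks are only for running outside Slurm
cpus=${SLURM_CPUS_PER_TASK:-18}
mem_per_cpu_mb=${SLURM_MEM_PER_CPU:-1000}

# Assumed split: 4 threads for samtools sort, the rest for bwa mem
sort_threads=4
bwa_threads=$(( cpus - sort_threads ))
task_mem_mb=$(( cpus * mem_per_cpu_mb ))

echo "per task: $cpus CPUs, $task_mem_mb MB; bwa -t $bwa_threads, samtools sort -@ $sort_threads"
```

In a real script, "$bwa_threads" would go to bwa mem -t and "$sort_threads" to samtools sort -@, so changing the header is enough to rescale the job.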

Reply:

Quoting the question: "But I'm not sure how to do that. In the sbatch options I would add (to run 10 jobs simultaneously)"

Use a workflow manager like Nextflow or Snakemake.

Reply:

Yes, that's what I want to do, but for the moment it's beyond my knowledge. Of course, I plan to learn how to use Snakemake.
