I’m working with 150bp paired-end human whole-genome sequencing (WGS) reads, with each sample sequenced across 8 lanes. I get two FASTQ files for each lane, ~4GB apiece.
I’m currently aligning the FASTQ files in parallel using bwa mem, one process per lane. Even after increasing the -t (threads) parameter, the alignment step remains slow, typically taking around two days per sample.
I'm considering further parallelizing by splitting each lane-based FASTQ file using a tool like seqkit split2 -p 8, and then aligning the resulting chunks in parallel. This seems like it should provide a near-linear speedup, but I’m a bit cautious since splitting introduces an extra step where things could potentially go wrong.
Is splitting like this a common strategy for accelerating WGS alignments? Are there any caveats or best practices I should be aware of when using this approach?
Thanks
What is your hardware and command line? Two days sounds excessive, unless you are limited by RAM and cores, in which case it is questionable what you would gain by adding more processes, as you're probably bottlenecked already.
This job is running on a node with 60 cores, but I only requested 4GB of RAM. It seems to be using only 2 CPUs at near 100% when I open htop. The FASTQ files are gzipped. Did I not ask for enough RAM? I can ask for much more.
EDIT: Is the sort command the bottleneck? I found this command on Stack Exchange:
bwa mem -t 8 genome.fa reads.fastq | samtools sort -@8 -o output.bam -
If so, and since I ultimately want a sorted CRAM file, would the overall command be this?:
4GB is nowhere near enough. samtools sort alone uses 768MB of RAM per thread by default, so 8*768MB already exceeds the 4GB you requested. bwa with that many threads will probably also use a large two-digit number of gigabytes of RAM. Strange that the job did not get OOM-killed. I would run with -t 32 or so; more is probably not going to help much because of the I/O bottleneck. Consider bwa-mem2, which is faster and gives identical results; see its GitHub page. Request 100GB of RAM or so.
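To make the arithmetic in that reply concrete, here is a small sketch of the memory check. The numbers are the ones from the thread (8 sort threads, samtools sort's default 768MB per thread, 4GB requested); the variable names are just illustrative.

```shell
# Rough memory budget for the sort step alone, using figures from the thread.
sort_threads=8        # value passed to samtools sort -@
mb_per_thread=768     # samtools sort default per-thread buffer (-m 768M)
requested_mb=4096     # the 4GB requested for the job

sort_mb=$((sort_threads * mb_per_thread))
echo "samtools sort buffers: ${sort_mb} MB"

if [ "$sort_mb" -gt "$requested_mb" ]; then
  echo "over the request by $((sort_mb - requested_mb)) MB, before bwa has loaded its index"
fi
```

The per-thread buffer can be changed with samtools sort -m, so the total scales with both -@ and -m.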
The issue was my Snakemake pipeline not passing the threads correctly :( Your response helped me, though!
Sorry for another question, but is bwa-mem2 the industry-standard replacement for bwa mem? If it produces identical results I would think so, but I'm wondering whether there is any reason not to use it.
I can't speak to "industry standard", but since it was developed in cooperation with the bwa developer and claims to produce identical results, I see no reason not to use it. It uses more RAM IIRC, but don't quote me on that.
You may switch from bwa to bwa-mem2. There should be no difference in the output, but since bwa-mem2 is faster than bwa you will get a speed-up for free. Try to use stable, latest versions of the tools. And since samtools is doing compression, increase the number of threads for it.
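Putting the suggestions in this thread together, a per-lane command could look roughly like the sketch below. The file names, the thread counts and the 1G per-thread sort buffer are assumptions for illustration, not values anyone in the thread tested. The script only prints the command so it can be reviewed before running.

```shell
# Hypothetical file names; adjust to your own layout.
ref=genome.fa
r1=lane1_R1.fastq.gz
r2=lane1_R2.fastq.gz
out=lane1.sorted.cram

align_threads=32   # -t for bwa-mem2, as suggested above
sort_threads=8     # -@ for samtools sort (assumed value)
sort_mem=1G        # per-thread sort buffer (assumed value)

# bwa-mem2 needs its own index, built once with: bwa-mem2 index "$ref"
# samtools sort writes CRAM directly when given -O cram and the reference.
cmd="bwa-mem2 mem -t $align_threads $ref $r1 $r2 | samtools sort -@ $sort_threads -m $sort_mem -O cram --reference $ref -o $out -"

# Dry run: print the command; execute it with: eval "$cmd"
echo "$cmd"
```

With these assumed settings the sort buffers alone come to about 8GB, which fits comfortably in the ~100GB request mentioned earlier.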