Question: Best approach for parallelizing bwa mem over multiple CPUs
14 months ago, olikidrod wrote:

I would like to parallelize bwa mem on multiple cores across multiple CPUs, on our high-performance computing cluster. This was previously discussed seven years ago in this thread: Parallelizing Bwa On Multiple Cpus. With that in mind, I'm now considering that the best way to do this is either:

1) Use Parallel BWA (pBWA).

2) Split the large input fastq files into multiple smaller fastq files, map each using its own instance of bwa mem on its own core, then merge them all together.

However, pBWA has not been updated since October 2012. Further, I'd prefer option 2, as our cluster restricts the size of input files. That said, as I'm using paired-end data (with two input fastq files), I'm not sure how best to split those up and ensure that reads in all the resulting files are still paired.

Does anyone have any insight on this, and -- given the time that's elapsed -- might there now be a better approach?


14 months ago, genomax (United States) wrote:
  1. Use the latest plain bwa. bwa can use multiple cores, so make sure you are using the -t INT option (number of threads).
  2. You would want to keep the threads of an individual job on a single physical server so as not to cause cross-talk across node interconnects.
  3. You always map paired-end files together in a single job, so that is not an issue. You could split the files into smaller chunks, as demonstrated here, if you wanted to brute-force parallelize your jobs: A: Can BWA restart a calculation after a break? Make sure your files stay in sync across R1/R2 reads, since aligners don't check for that.
  4. Don't uncompress your fastq files; use pipes instead to save on disk space ( C: How to combine two .sam files? )
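
As a rough illustration of points 1 and 4, here is a minimal Python sketch that only builds the command lines for one job: bwa mem with -t, reading gzipped fastq files directly and piping into samtools sort so that no uncompressed SAM file is written to disk. The file names, the thread count, and the helper names bwa_mem_pipeline and merge_command are hypothetical, and using samtools for sorting and merging is an assumption rather than something prescribed in this thread.

```python
import shlex


def bwa_mem_pipeline(ref, r1, r2, out_bam, threads=8):
    """Return a shell pipeline string: bwa mem on one read pair, piped to samtools sort.

    bwa mem reads gzipped fastq directly, so the inputs stay compressed.
    All paths here are placeholders chosen for illustration.
    """
    align = ["bwa", "mem", "-t", str(threads), ref, r1, r2]
    sort = ["samtools", "sort", "-o", out_bam, "-"]  # "-" means read from stdin
    return " | ".join(shlex.join(cmd) for cmd in (align, sort))


def merge_command(out_bam, part_bams):
    """Return a samtools merge command that combines per-chunk BAM files."""
    return shlex.join(["samtools", "merge", out_bam, *part_bams])


if __name__ == "__main__":
    # Print the commands instead of running them; submit each to your scheduler.
    print(bwa_mem_pipeline("ref.fa", "chunk0_R1.fastq.gz", "chunk0_R2.fastq.gz", "chunk0.bam"))
    print(merge_command("merged.bam", ["chunk0.bam", "chunk1.bam"]))
```

Each printed pipeline would typically be submitted as one job per node, with the thread count matched to the cores reserved on that node.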

Thanks for this. My understanding is that -t only refers to number of threads, whereas I need to parallelize over multiple physical servers and CPUs. Thus, I decided to go with brute-forcing as suggested in '3.'

For anyone else who stumbles across this, I chose to use bbmap to verify that paired FASTQ files (downloaded from published datasets) are properly paired. I then used Trimmomatic to remove low quality reads.
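
For readers who want to see what such a pairing check amounts to, here is a minimal Python sketch. It is not what bbmap does internally; it simply walks both files in step and compares read names, assuming plain four-line FASTQ records and names that differ at most by a trailing /1 or /2 or by a comment after whitespace. The function names are hypothetical.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_id(header):
    """Reduce a header line to a bare read name (no '@', comment, or /1 /2 suffix)."""
    fields = header.split()
    name = fields[0][1:] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_path, r2_path):
    """Return the 0-based index of the first unpaired record, or None if all pair up."""
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        for i, (a, b) in enumerate(zip_longest(f1, f2)):
            if a is None or b is None:
                return i // 4  # one file ended before the other
            if i % 4 == 0 and read_id(a) != read_id(b):
                return i // 4  # header lines name different reads
    return None
```

A result of None means every record in R1 has a same-named partner at the same position in R2, which is the property the aligner silently relies on.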

I then split the resulting paired files (in which both reads survived the quality check), also as described in 3., before mapping.
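
The splitting step can be sketched in Python as follows. This is only an illustration, not the method from the linked answer: it reads the same number of four-line records from R1 and R2 for every chunk, so each chunk pair stays in sync. The output naming scheme, the default chunk size, and the function names are assumptions, and each chunk is held in memory while it is written.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def split_paired(r1_path, r2_path, prefix, reads_per_chunk=1_000_000):
    """Split paired FASTQ files into gzipped chunk pairs and return their paths."""
    chunks = []
    lines_per_chunk = 4 * reads_per_chunk  # four lines per FASTQ record
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        n = 0
        while True:
            block1 = list(islice(f1, lines_per_chunk))
            block2 = list(islice(f2, lines_per_chunk))
            if not block1 and not block2:
                break
            if len(block1) != len(block2):
                raise ValueError("R1 and R2 contain different numbers of lines")
            out1 = f"{prefix}.{n}_R1.fastq.gz"
            out2 = f"{prefix}.{n}_R2.fastq.gz"
            # Write both mates of the chunk together so they cannot drift apart.
            with gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
                o1.writelines(block1)
                o2.writelines(block2)
            chunks.append((out1, out2))
            n += 1
    return chunks
```

Each returned pair can then be mapped as its own bwa mem job, and the resulting alignments merged afterwards.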

Reply written 14 months ago by olikidrod

Just a suggestion: if you were using bbmap, you could have done everything in that suite, that is, splitting (not really needed, but if you wish), scanning/trimming, and aligning. Both programs are multi-threaded and can use all the cores you can afford to throw at them.

Reply written 14 months ago (modified 14 months ago) by genomax