I would like to parallelize bwa mem on multiple cores across multiple CPUs, on our high-performance computing cluster. This was previously discussed seven years ago in this thread: Parallelizing Bwa On Multiple Cpus. With that in mind, I'm now considering that the best way to do this is either:
1) Use Parallel BWA (pBWA).
2) Split the large input fastq files into multiple smaller fastq files, map each using its own instance of bwa mem on its own core, then merge them all together.
However, pBWA has not ben updated since October 2012. Further, I'd prefer option 2, as our cluster restricts the size of input file sizes. That said, as I'm using paired-end data (with two input fastq files), I'm not sure how best to split those up and ensure that reads in all the resulting files are still paired.
Does anyone have any insight on this, and -- given the time that's elapsed -- might there now be a better approach?