Parallelizing BWA on Multiple CPUs
12.3 years ago

Has anyone successfully parallelized BWA alignment across multiple CPUs? Do the reads in a FASTQ file depend on one another, or can one split the FASTQ file into pieces, align the pieces on multiple CPUs, and then reassemble the resulting SAM files? I suspect the answer is no, but I don't know, and I have not found anything about it in the BWA documentation. Does anyone have experience with parallel BWA? Thanks.

Tags: bwa, parallel, next-gen sequencing
12.3 years ago

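Yes, you can split the reads into multiple FASTQ files, align each file separately, and then merge the results. The reads are aligned independently of each other. For paired-end data, split both mate files at the same record boundaries so that the pairs stay in sync.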
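A minimal sketch of the splitting step in Python. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); the function name split_fastq and the chunk_NNNN.fastq file naming are hypothetical and not part of BWA. For paired-end data, calling it with the same reads_per_chunk on both mate files keeps the chunks paired.

```python
import itertools
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_chunk):
    """Split a FASTQ file into chunks of at most reads_per_chunk records.

    Hypothetical helper. Assumes plain, uncompressed FASTQ with exactly
    four lines per record. Returns the chunk paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines_per_chunk = 4 * reads_per_chunk
    chunk_paths = []
    with open(fastq_path) as fh:
        for index in itertools.count():
            # Take the next block of whole records (4 lines per read).
            block = list(itertools.islice(fh, lines_per_chunk))
            if not block:
                break
            if len(block) % 4 != 0:
                raise ValueError(f"{fastq_path}: partial record at end of file")
            chunk_path = out_dir / f"chunk_{index:04d}.fastq"
            chunk_path.write_text("".join(block))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Splitting on whole records is the only constraint that matters here: since each read is aligned on its own, any record boundary is a valid cut point.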

Thanks, Sean. We have just confirmed this. The SAM output from the split reads can be concatenated without a problem after alignment, as long as only one copy of the SAM header is kept.

Just to clarify for posterity's sake: the FASTQ files can be split into chunks, aligned independently on separate machines, and the results merged; this gives essentially the same alignments as aligning one big FASTQ file.
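A minimal sketch of the merge step in Python, assuming every chunk was aligned against the same reference so that the @SQ header lines are identical. The function name merge_sam is hypothetical. It keeps the header only from the first file (so the @PG lines of the other runs are dropped) and writes records in chunk order, not coordinate order.

```python
def merge_sam(sam_paths, out_path):
    """Concatenate SAM files produced from chunks of the same input.

    Hypothetical helper. Header lines (starting with '@') are copied from
    the first file only; alignment records from every file are appended
    in the order the files are given. Output is not coordinate-sorted.
    """
    with open(out_path, "w") as out:
        for i, path in enumerate(sam_paths):
            with open(path) as fh:
                for line in fh:
                    # Skip the repeated headers of the second and later files.
                    if i == 0 or not line.startswith("@"):
                        out.write(line)
```

In practice samtools is the usual tool for this (for example samtools merge on sorted BAM files); the sketch only shows why a naive concatenation needs the repeated headers removed.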

12.3 years ago

Check out the -t [n-cpu] option. It lets BWA use multiple processors (threads) on one machine. Is that what you are going for?

Thanks, Zev. I believe the -t option refers to multithreading on a single machine, not parallelizing across multiple machines. We're investigating how to make efficient use of multiple nodes on a cluster, and identifying which stages of NGS alignment and variant calling can truly be parallelized.
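The two levels can be combined: one independent job per FASTQ chunk, spread across nodes, with -t using the cores inside each node. Below is a minimal Python sketch that writes the shell commands for one chunk, assuming the older two-step single-end workflow (bwa aln with -t, then bwa samse) and a reference already indexed with bwa index. The function name chunk_job_script and the .sai/.sam naming are hypothetical; how the script is submitted (PBS or otherwise) is left out.

```python
import shlex
from pathlib import Path


def chunk_job_script(ref_prefix, fastq_chunk, threads):
    """Return a shell script that aligns one FASTQ chunk (single-end).

    Hypothetical helper. Assumes bwa is on PATH and ref_prefix was
    indexed with `bwa index`. -t sets the thread count for bwa aln only;
    the .sai and .sam files are written next to the chunk.
    """
    fq = Path(fastq_chunk)
    ref = shlex.quote(str(ref_prefix))
    fq_q = shlex.quote(str(fq))
    sai = shlex.quote(str(fq.with_suffix(".sai")))
    sam = shlex.quote(str(fq.with_suffix(".sam")))
    return "\n".join([
        "#!/bin/sh",
        "set -e",  # stop if the aln step fails
        f"bwa aln -t {int(threads)} {ref} {fq_q} > {sai}",
        f"bwa samse {ref} {sai} {fq_q} > {sam}",
        "",
    ])
```

For example, chunk_job_script("ref.fa", "chunks/chunk_0000.fastq", 12) produces a script whose aln line runs with 12 threads, one such script per chunk and node.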

I think he wants to split among multiple physical processors or machines.

@Chris_Miller: What do you think the time trade-off is for splitting the FASTQ files and then running BWA?

Oh, okay. I have also done this. I used Template Toolkit to write automated PBS submission scripts for whole-genome data spread across many FASTQ files, and pushed whole-genome data through our cluster that way, using 20 nodes with 12 CPUs per node. I ran into one hitch: I didn't have priority on the cluster, so I had to write a script to check that the alignments had finished. May you get a 'publication in a premier scientific journal'.
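One way to check that the alignments finished is to compare each chunk's read count with the number of primary records in its SAM output, since BWA writes one primary line per read, including unmapped reads. A minimal Python sketch, assuming single-end, four-line FASTQ chunks and a SAM file named like the chunk with a .sam suffix; the function names are hypothetical. Secondary (0x100) and supplementary (0x800) records are excluded so the check also works when such lines are present.

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Number of reads in a four-line-per-record FASTQ file."""
    with open(fastq_path) as fh:
        return sum(1 for _ in fh) // 4


def count_primary_sam_records(sam_path):
    """Count primary alignment records, skipping headers and malformed lines."""
    n = 0
    with open(sam_path) as fh:
        for line in fh:
            if line.startswith("@"):
                continue
            fields = line.split("\t", 2)
            # A half-written last line is not counted, so the chunk looks unfinished.
            if len(fields) < 2 or not fields[1].isdigit():
                continue
            flag = int(fields[1])
            if (flag & 0x900) == 0:  # not secondary, not supplementary
                n += 1
    return n


def unfinished_chunks(fastq_chunks):
    """Hypothetical check: chunks whose .sam output is missing or incomplete."""
    todo = []
    for fq in map(Path, fastq_chunks):
        sam = fq.with_suffix(".sam")
        if not sam.exists() or count_primary_sam_records(sam) != count_fastq_reads(fq):
            todo.append(fq)
    return todo
```

Chunks returned by unfinished_chunks can simply be resubmitted; because chunks are independent, nothing else has to be rerun.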

11.9 years ago

Try pBWA, which was made specifically for this.

Cheers
