Question: Parallelizing BWA on Multiple CPUs
2
7.1 years ago by
Rochester, NY USA
Alex Paciorkowski3.3k wrote:

Has anyone successfully parallelized BWA alignment across multiple CPUs? Do the reads in a FASTQ file depend on one another, or can one break the FASTQ file up, align the pieces using multiple CPUs, and then reassemble the resulting SAM files? My suspicion is the answer is no, but I don't know and have not found anything in the BWA documentation. Does anyone have any experience with parallel BWA? Thanks.

parallel next-gen bwa sequencing • 5.2k views
7
7.1 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

Yes, you can split the reads into multiple fastq files, align, and then merge results. The reads are aligned independently of each other.


Thanks, Sean. We have indeed just confirmed this. The split reads can be concatenated without a problem after the alignment.

Reply written 7.1 years ago by Alex Paciorkowski (3.3k)

Just to clarify for posterity's sake: the FASTQ files can be split into chunks, aligned independently on separate machines, and the results merged; this is equivalent to aligning one big FASTQ file.

Reply written 7.1 years ago by Sean Davis (25k)
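
The split-and-merge workflow described in this answer could be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the function names `split_fastq` and `merge_sam` are hypothetical, the FASTQ input is assumed to use the common four-lines-per-record layout, and every chunk is assumed to have been aligned against the same reference so that the SAM headers agree.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk records.

    Assumption: every record occupies exactly 4 lines (no wrapped sequences).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("input ends in the middle of a FASTQ record")
        yield chunk


def merge_sam(sam_chunks: Iterable[Iterable[str]]) -> List[str]:
    """Concatenate per-chunk SAM outputs, keeping the header of the first chunk only.

    SAM header lines start with '@'; alignment lines never do.
    """
    merged: List[str] = []
    for index, sam in enumerate(sam_chunks):
        for line in sam:
            if index > 0 and line.startswith("@"):
                continue  # drop repeated headers from later chunks
            merged.append(line)
    return merged
```

Each chunk produced by `split_fastq` would be written to its own file and aligned separately; for paired-end data, both mate files would need to be split at the same record boundaries so that pairs stay in sync.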
2
7.1 years ago by
United States
Zev.Kronenberg11k wrote:

Check out the -t [n-cpu] option. It allows you to use multiple processors... Is that what you are going for?
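
As a rough illustration of the `-t` option, the sketch below builds (but does not run) a command line for one alignment job. It assumes the `bwa aln` subcommand; `bwa mem` accepts `-t` as well. The helper name `bwa_aln_command` and the file names in the usage note are hypothetical.

```python
import os
from typing import List


def bwa_aln_command(reference: str, fastq: str, threads: int = 0) -> List[str]:
    """Build a `bwa aln` command line that uses the -t thread option.

    If threads is not positive, fall back to the CPU count reported by the OS.
    """
    if threads < 1:
        threads = os.cpu_count() or 1
    return ["bwa", "aln", "-t", str(threads), reference, fastq]
```

For example, `bwa_aln_command("ref.fa", "reads.fq", 4)` yields the argument list for `bwa aln -t 4 ref.fa reads.fq`, which could then be passed to `subprocess.run`. The threads all run on a single machine, so this is complementary to splitting the FASTQ file across nodes.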

1

Thanks, Zev. I believe the -t option refers to multithreading, not parallelizing across multiple processors. We're just investigating how to make efficient use of multiple nodes on a cluster, and identifying which stages of NGS alignment/variant calling can be truly parallelized.

Reply written 7.1 years ago by Alex Paciorkowski (3.3k)

I think he wants to split among multiple physical processors or machines.

Reply written 7.1 years ago by Chris Miller (20k)

@Chris_Miller: What do you think the time trade-off is for splitting the FASTQ files and then running BWA?

Reply written 7.1 years ago by Zev.Kronenberg (11k)

Oh, okay. I have also done this. I used Template Toolkit to write automated PBS submission scripts for whole-genome data (across many FASTQ files). On our cluster I hauled through whole-genome data using 20 nodes with 12 cores per node. I ran into one hitch: I didn't have priority on the cluster, so I had to write a script to check that the alignments had finished. May you have 'publication in premier scientific journal'.

Reply written 7.1 years ago by Zev.Kronenberg (11k)
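
The approach in the reply above (one PBS job per FASTQ chunk, plus a script that checks whether the alignments finished) might look something like the sketch below. It swaps Template Toolkit for plain Python string formatting, and everything specific in it is an assumption for illustration: the helper names `pbs_script` and `unfinished`, the `nodes=1:ppn=12` resource request, and the convention of writing to a temporary file and renaming it on success.

```python
from pathlib import Path
from typing import Iterable, List


def pbs_script(chunk_fastq: str, reference: str, out_sai: str, threads: int = 12) -> str:
    """Render a PBS job script that aligns one FASTQ chunk with `bwa aln`."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N bwa_{Path(chunk_fastq).stem}",
        f"#PBS -l nodes=1:ppn={threads}",
        'cd "$PBS_O_WORKDIR"',
        # Write to a temporary name and rename only on success, so that a
        # non-empty final output file implies the alignment exited cleanly.
        f"bwa aln -t {threads} {reference} {chunk_fastq} > {out_sai}.tmp"
        f" && mv {out_sai}.tmp {out_sai}",
        "",
    ])


def unfinished(outputs: Iterable[str]) -> List[str]:
    """Return the expected output files that are missing or empty."""
    pending = []
    for name in outputs:
        path = Path(name)
        if not path.is_file() or path.stat().st_size == 0:
            pending.append(name)
    return pending
```

Each rendered script would be saved to disk and submitted with `qsub`; calling `unfinished` on the list of expected outputs then shows which chunks still need to run or be resubmitted before merging.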
0
6.7 years ago by
Sukhdeep Singh9.6k
Netherlands
Sukhdeep Singh9.6k wrote:

Try pBWA, which was made specifically for that.

Cheers
