Question: How to speed up bwa mem?
Posted 2.4 years ago by bluemonster0808:

I have two paired-end FASTQ files, 200 GB each. Running "bwa mem" takes nearly 6 hours on a fairly powerful machine (24 physical cores, E5-2670 v3, Hyper-Threading, 64 GB memory).

There seems to be a problem with the "-t" parameter: the run time with "-t 48" is nearly the same as with "-t 12".

Is it possible to split the FASTQ files into multiple parts, run "bwa mem" on each part separately and concurrently, and then combine the SAM outputs?

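A minimal Python sketch of the splitting step, assuming 4-line FASTQ records (no wrapped sequence lines) and mates in the same order in both files. It deals read pairs round-robin into n chunks, so read i of R1 and read i of R2 always land in the same chunk. The chunk file names and the choice of round-robin over contiguous blocks are illustrative assumptions, not a prescribed method.

```python
import gzip
import itertools
import os
from typing import Iterator, List, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open plain or gzip-compressed FASTQ for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_records(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records (header, sequence, '+', qualities)."""
    while True:
        record = tuple(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_paired(r1_path: str, r2_path: str, out_dir: str,
                 n_chunks: int) -> List[Tuple[str, str]]:
    """Deal read pairs round-robin into n_chunks gzipped R1/R2 file pairs."""
    os.makedirs(out_dir, exist_ok=True)
    # Hypothetical naming scheme: chunk000_R1.fq.gz, chunk000_R2.fq.gz, ...
    paths = [(os.path.join(out_dir, f"chunk{i:03d}_R1.fq.gz"),
              os.path.join(out_dir, f"chunk{i:03d}_R2.fq.gz"))
             for i in range(n_chunks)]
    # compresslevel=1: fast, light compression for intermediate files
    outs = [(gzip.open(p1, "wt", compresslevel=1), gzip.open(p2, "wt", compresslevel=1))
            for p1, p2 in paths]
    try:
        with open_text(r1_path) as f1, open_text(r2_path) as f2:
            pairs = itertools.zip_longest(read_records(f1), read_records(f2))
            for i, (rec1, rec2) in enumerate(pairs):
                if rec1 is None or rec2 is None:
                    raise ValueError("R1 and R2 contain different numbers of reads")
                h1, h2 = outs[i % n_chunks]
                h1.writelines(rec1)  # mate 1 and mate 2 go to the same chunk index
                h2.writelines(rec2)
    finally:
        for h1, h2 in outs:
            h1.close()
            h2.close()
    return paths
```

Python's single-threaded gzip is likely much slower than a parallel tool such as pigz on 200 GB inputs; the sketch is only meant to show how to keep mates in sync across chunks.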

6 hours is pretty good in my experience for files of that size (assuming the size refers to compressed FASTQ, does it?). At some point, increasing -t stops helping because I/O limitations kick in. You can of course split your FASTQ into pieces and later combine the results with, e.g., samtools cat piped into samtools sort, but in the end that will probably take longer than just waiting the 6 hours.

Reply written 2.4 years ago by ATpoint
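One way to lay out the split-then-merge route described above, as a Python sketch that only builds the shell command strings and runs nothing. Because samtools cat works on BAM (or CRAM) rather than SAM, each chunk's SAM is first re-encoded with `samtools view -b`. The per-job thread split, the intermediate BAM names and the sort thread count are hypothetical choices for illustration, not a tested pipeline.

```python
import shlex
from typing import List, Sequence, Tuple


def align_chunk_cmd(ref: str, r1: str, r2: str, out_bam: str, threads: int) -> str:
    """bwa mem writes SAM to stdout; samtools view -b re-encodes it as BAM."""
    bwa = shlex.join(["bwa", "mem", "-t", str(threads), ref, r1, r2])
    to_bam = shlex.join(["samtools", "view", "-b", "-o", out_bam, "-"])
    return f"{bwa} | {to_bam}"


def merge_cmd(chunk_bams: Sequence[str], out_bam: str, sort_threads: int) -> str:
    """samtools cat concatenates the unsorted chunk BAMs; sort reads them from stdin."""
    cat = shlex.join(["samtools", "cat", *chunk_bams])
    sort = shlex.join(["samtools", "sort", "-@", str(sort_threads), "-o", out_bam, "-"])
    return f"{cat} | {sort}"


def plan(chunks: Sequence[Tuple[str, str]], ref: str, total_threads: int,
         out_bam: str, sort_threads: int = 4) -> Tuple[List[str], str]:
    """Split total_threads evenly across concurrent bwa mem jobs (hypothetical policy)."""
    per_job = max(1, total_threads // len(chunks))
    bams = [f"chunk{i:03d}.bam" for i in range(len(chunks))]  # hypothetical names
    align = [align_chunk_cmd(ref, r1, r2, bam, per_job)
             for (r1, r2), bam in zip(chunks, bams)]
    return align, merge_cmd(bams, out_bam, sort_threads)
```

Running the per-chunk jobs concurrently only pays off if the bottleneck is CPU rather than disk; if I/O is the limit, several bwa jobs competing for the same disk will not finish sooner than one job with more threads.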

Each of the two 200 GB files is plain FASTQ, not compressed.

Thank you for your answer. I'll try splitting the files and measure the time cost. ^_^

Reply written 2.4 years ago by bluemonster0808

Try something other than bwa? minimap2 was released recently and is supposed to be incredibly fast.

Reply written 2.4 years ago by Joe

I am also curious about bwa mem mapping speed. Rather than the file size, I would specify the number of reads. I subsampled a pair of 150 bp paired-end FASTQ files to 1 million reads and mapped them to the zebrafish genome (about half the size of the human genome). I used a computing cluster with 16 cores (8 GB RAM per core, though barely 8 GB was actually used). Mapping speed is also affected by read quality; I had trimmed the reads so that all bases have Phred quality > 28.

bwa mem mapping alone: 1 million reads in 73 s. Mapping + samblaster + samtools fixmate + samtools sort (for a variant-calling workflow): 1 million reads in 325 s.

For pure mapping, that is around 13,700 reads per second, so 200 million reads would take about 4 hours. For the full workflow, it is roughly 3,000 reads per second, so 200 million reads would take about 18 hours. (These are rough estimates; in practice, run time does not scale simply linearly.)

I wonder whether there are any resources comparing mapping speed (reads mapped per second or per minute) across different system specs.

Reply written 2.2 years ago by rmf
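The extrapolation above is plain throughput arithmetic. A short Python check of the numbers, assuming run time scales linearly with read count (which, as noted, it does not exactly):

```python
def reads_per_second(n_reads: int, seconds: float) -> float:
    """Throughput measured on a subsample."""
    return n_reads / seconds


def projected_hours(total_reads: int, rate: float) -> float:
    """Naive linear projection of wall time, in hours."""
    return total_reads / rate / 3600


if __name__ == "__main__":
    for label, secs in [("mapping only", 73), ("full workflow", 325)]:
        rate = reads_per_second(1_000_000, secs)
        hours = projected_hours(200_000_000, rate)
        print(f"{label}: {rate:,.0f} reads/s, {hours:.2f} h for 200M reads")
    # mapping only: 13,699 reads/s, 4.06 h for 200M reads
    # full workflow: 3,077 reads/s, 18.06 h for 200M reads
```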
Answer posted 2.4 years ago by Brian Bushnell (Walnut Creek, USA):

If you can't increase speed by using more than 12 threads on a 24-core node, you are probably I/O-limited, in which case no alternative aligner could run faster (unless you are write-limited due to unnecessary fields in the SAM output, which you could potentially disable). You can avoid such I/O limitations by keeping your files compressed at all times (for example, gzipped via pigz) and reading and writing compressed files at every stage of your pipeline. If you run "top" while mapping is running, you will see your CPU utilization; it should be around 4800% while mapping with 48 threads. Hyperthreading does not particularly increase mapping speed, though; it's more for floating-point operations, so there is likely no point in exceeding 24 threads anyway.

If you have multiple disks or filesystems, you may be able to increase speed by reading from one disk and writing to another.

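The "top" check can also be done after the fact: total CPU time (user + system) divided by wall-clock time approximates the average %CPU / 100, i.e. how many cores were actually busy. A minimal Python sketch using the Unix-only `resource` module; the helper name is hypothetical. If the ratio stays far below the `-t` value, the job is waiting on something other than the CPU, plausibly I/O.

```python
import resource
import subprocess
import time
from typing import Sequence, Tuple


def run_and_measure(cmd: Sequence[str]) -> Tuple[float, float, float]:
    """Run cmd to completion; return (wall seconds, CPU seconds, average busy cores).

    RUSAGE_CHILDREN covers terminated, waited-for children (and descendants that
    were waited for by their parents), so a shell pipeline run through
    ["sh", "-c", "..."] is counted as well.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    busy_cores = cpu / wall if wall > 0 else 0.0
    return wall, cpu, busy_cores
```

For example, `run_and_measure(["sh", "-c", "bwa mem -t 24 ref.fa r1.fq.gz r2.fq.gz | pigz > out.sam.gz"])`, where the file names are placeholders, would report roughly 24 busy cores for a CPU-bound run and far fewer for an I/O-bound one.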