Question: BWA mem alignment taking a while in a virtual machine
4.7 years ago by
Robert Sicko580
United States
Robert Sicko580 wrote:

I've been processing targeted resequencing data from a Haloplex panel run on a MiSeq using Agilent's Surecall software, and I have generated variant calls with it. However, I want to generate calls using my own pipeline in GATK to compare against Surecall.

I'm having trouble with the run time of bwa mem when it is run inside VirtualBox.

VirtualBox (note the huge difference between real time and CPU time):

[main] Real time: 21280.065 sec; CPU: 2547.937 sec

Surecall

[main] Real time: 572.116 sec; CPU: 375.354 sec
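
The gap between real and CPU time can be made concrete by dividing CPU seconds by wall-clock seconds. The sketch below parses the two [main] lines above with awk; cpu_ratio is a hypothetical helper name, not part of bwa. With several threads requested, a ratio well below 1 suggests the process spent most of its time waiting (for example on disk or swap) rather than computing.

```shell
# cpu_ratio: hypothetical helper, not part of bwa.
# Takes one bwa "[main] Real time: ... sec; CPU: ... sec" line and prints CPU/real.
cpu_ratio() {
    echo "$1" | awk '{
        gsub(/;/, "")                      # drop the ";" after the first "sec"
        for (i = 1; i < NF; i++) {
            if ($i == "time:") real = $(i + 1)
            if ($i == "CPU:")  cpu  = $(i + 1)
        }
        printf "%.2f\n", cpu / real
    }'
}

cpu_ratio "[main] Real time: 21280.065 sec; CPU: 2547.937 sec"   # VirtualBox run
cpu_ratio "[main] Real time: 572.116 sec; CPU: 375.354 sec"      # Surecall run
```

For the two runs above this gives roughly 0.12 for VirtualBox and 0.66 for Surecall.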

Surecall uses bwa version: 0.7.5a-r405 - Windows port version: 1.2 and the following command:

bwa.exe, mem, -M, -D, 0,0, -B, 4, -A, 1.0, -w, 100, -k, 19, -R, @RG\tID:X\tSM:X, -t, 4, hg19.fasta, R1_Cut.fastq, R2_Cut.fastq, >, X.sam

In virtualbox I'm using bwa Version: 0.7.12-r103 and the following command:

bwa mem -t 6 -M -R @RG\tID:X\tSM:X human_g1k_v37.fasta.gz R1_trimpaired.fastq.gz R2_trimpaired.fastq.gz > X.sam

As for VirtualBox, I'm using Bio-Linux 8 with 8 cores and 5 GB of memory dedicated to it. I suspect 5 GB is not enough and the VM is paging, which is causing the slowdown. I might be able to increase the allocation slightly, but the computer itself only has 8 GB total and I know the host OS will need some. The only other difference is the -D parameter used by Surecall with bwa 0.7.5a. I dug through bwa's git and found:
"-D FLOAT drop chains shorter than FLOAT fraction of the longest overlapping chain [%.2f]\n", opt->drop_ratio);

I did not see that in the bwa manual, so I did not specify it in my command (0 could be the default anyway as Surecall specified a few parameters with default values).
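
One way to test the paging hypothesis is to look at swap usage inside the guest while bwa mem is running. The sketch below is only an illustration: swap_used_mib is a hypothetical helper that reads the SwapTotal and SwapFree fields of /proc/meminfo (or a file in the same format) and prints the swap in use in MiB. It assumes a Linux guest.

```shell
# swap_used_mib: hypothetical helper, not part of bwa or Bio-Linux.
# Reads /proc/meminfo-style input (values in kB) and prints swap in use, in MiB.
swap_used_mib() {
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free  = $2 }
        END           { printf "%d\n", (total - free) / 1024 }
    ' "${1:-/proc/meminfo}"
}

# Run this in a second terminal while the alignment is in progress.
if [ -r /proc/meminfo ]; then
    swap_used_mib
fi
```

A value that keeps growing during the run would be consistent with the VM swapping. Running vmstat 5 and watching the si and so columns gives a similar picture over time.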


For the human genome, 5.5 GB is the minimum; 6 GB is better.

written 4.7 years ago by lh3

Thanks Heng! After running with 6GB allocated to the virtual machine the run times are much better.

[main] Real time: 231.878 sec; CPU: 741.689 sec

Any chance you could elaborate on what the Surecall parameter "-D, 0,0" is doing? Is this the default?

written 4.7 years ago by Robert Sicko