I've been processing targeted resequencing data from a Haloplex panel run on a MiSeq, using Agilent's Surecall software to generate variant calls. However, I want to generate calls with my own GATK pipeline to compare against Surecall.
I'm having trouble with the run time of bwa mem when it runs inside VirtualBox.
VirtualBox (note the huge gap between real time and CPU time; what could cause that?):
[main] Real time: 21280.065 sec; CPU: 2547.937 sec
[main] Real time: 572.116 sec; CPU: 375.354 sec
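bwa's `[main] Real time ... CPU ...` summary lines can be turned into a rough parallelism figure: CPU time divided by wall-clock time is approximately the average number of cores kept busy. A minimal sketch (the `cpu_ratio` name is hypothetical) that parses lines in exactly the format shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CPU/real-time ratio for each bwa
# "[main] Real time: X sec; CPU: Y sec" line read from stdin.
# A ratio near the -t thread count means the threads were kept busy;
# a ratio well below 1 means the process mostly waited instead of computing.
cpu_ratio() {
  awk '/^\[main\] Real time:/ {
    real = $4   # wall-clock seconds, e.g. 21280.065
    cpu  = $7   # CPU seconds, e.g. 2547.937
    printf "real=%.1fs cpu=%.1fs ratio=%.2f\n", real, cpu, cpu / real
  }'
}

# Example:
#   printf '%s\n' '[main] Real time: 21280.065 sec; CPU: 2547.937 sec' | cpu_ratio
#   real=21280.1s cpu=2547.9s ratio=0.12
```

For the first run above that works out to about 0.12, i.e. on average well under one core busy for a job started with six threads, which points at the threads waiting (for example on disk I/O or swap) rather than at the alignment work itself.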
Surecall uses bwa version 0.7.5a-r405 (Windows port version 1.2) with the following command:
bwa.exe, mem, -M, -D, 0,0, -B, 4, -A, 1.0, -w, 100, -k, 19, -R, @RG\tID:X\tSM:X, -t, 4, hg19.fasta, R1_Cut.fastq, R2_Cut.fastq, >, X.sam
In VirtualBox I'm using bwa version 0.7.12-r103 with the following command:
bwa mem -t 6 -M -R '@RG\tID:X\tSM:X' human_g1k_v37.fasta.gz R1_trimpaired.fastq.gz R2_trimpaired.fastq.gz > X.sam
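Since bwa mem keeps the .bwt, .sa and .pac index files in RAM for the whole run, one quick sanity check inside a VM is whether those files even fit in the memory the VM was given. A rough sketch, assuming the index was built with `bwa index human_g1k_v37.fasta.gz` so the files share that prefix; `index_vs_ram` is a hypothetical helper, and the file sizes are only a lower bound on what bwa actually uses:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the size of a bwa index with total RAM.
# Usage: index_vs_ram INDEX_PREFIX [MEMINFO_FILE]
index_vs_ram() {
  local prefix=$1 meminfo=${2:-/proc/meminfo}
  local idx_bytes=0 f size mem_kb idx_mb mem_mb
  # bwa mem loads these three files into memory for the whole run
  for f in "$prefix.bwt" "$prefix.sa" "$prefix.pac"; do
    [ -e "$f" ] || { echo "missing $f" >&2; return 1; }
    size=$(wc -c < "$f")
    idx_bytes=$((idx_bytes + size))
  done
  # /proc/meminfo reports MemTotal in kB
  mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
  [ -n "$mem_kb" ] || { echo "no MemTotal in $meminfo" >&2; return 1; }
  idx_mb=$((idx_bytes / 1024 / 1024))
  mem_mb=$((mem_kb / 1024))
  [ "$mem_mb" -gt 0 ] || { echo "MemTotal is zero in $meminfo" >&2; return 1; }
  echo "index=${idx_mb}MB ram=${mem_mb}MB ($((100 * idx_mb / mem_mb))% of RAM)"
}

# Example (inside the VM):
#   index_vs_ram human_g1k_v37.fasta.gz
# While bwa is running, nonzero values in the si/so (swap in/out) columns
# of `vmstat 5` indicate that the VM is paging.
```

If the percentage is close to 100, the aligner, the guest OS and the page cache are all competing for the same memory, and swapping would be a plausible explanation for the low CPU time relative to real time in the log above.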
As for VirtualBox, I'm running Bio-Linux 8 with 8 cores and 5 GB of memory dedicated to it. I suspect 5 GB is not enough and the VM is paging (swapping to disk), which would cause the slowdown. I might be able to increase the allocation slightly, but the host computer only has 8 GB in total and the host OS needs some of that. The only other difference is the -D parameter that Surecall passes to bwa 0.7.5a. I dug through bwa's Git repository and found this usage string:
"-D FLOAT drop chains shorter than FLOAT fraction of the longest overlapping chain [%.2f]\n", opt->drop_ratio);"
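I didn't see -D in the bwa manual, so I didn't specify it in my command. I assumed 0 might be the default, since Surecall specifies several parameters at their default values; however, bwa's source sets the default drop ratio to 0.50, so Surecall's -D 0 does change how bwa filters chains.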
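To rule the -D difference in or out, the VirtualBox run could mirror Surecall's options directly. Of those options, -k 19, -w 100, -A 1 and -B 4 are bwa mem's defaults, so -D 0 is the only one that actually differs. A sketch, assuming bwa 0.7.12 accepts the same flags as 0.7.5a (running `bwa mem` with no arguments prints the accepted options and their defaults) and reading Surecall's `0,0` as 0 and `-A 1.0` as 1; `run_surecall_like` and the `DRY_RUN` switch are hypothetical conveniences for printing the command before launching a long job:

```shell
#!/usr/bin/env bash
# Options copied from the Surecall command line. The read-group string is
# single-quoted so the shell passes the literal \t escapes through to bwa.
SURECALL_OPTS=(-M -D 0 -B 4 -A 1 -w 100 -k 19)
READ_GROUP='@RG\tID:X\tSM:X'

# Hypothetical wrapper: run_surecall_like REF R1 R2 OUT [THREADS]
run_surecall_like() {
  local ref=$1 r1=$2 r2=$3 out=$4 threads=${5:-6}
  local cmd=(bwa mem -t "$threads" "${SURECALL_OPTS[@]}" -R "$READ_GROUP" "$ref" "$r1" "$r2")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it
    printf '%s ' "${cmd[@]}"
    printf '> %s\n' "$out"
  else
    "${cmd[@]}" > "$out"
  fi
}

# Example:
#   DRY_RUN=1 run_surecall_like human_g1k_v37.fasta.gz R1_trimpaired.fastq.gz R2_trimpaired.fastq.gz X.sam
```

Keeping the options in an array rather than a single string avoids word-splitting surprises, and the same array can be reused to build the 0.7.12 command with and without -D 0 for a side-by-side timing.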