Question: How to make variant calling run faster?
steve (United States) wrote, 17 months ago:

We do a lot of variant calling with GATK MuTect2. Evidently, multi-threading does not work in this program, so we have to run it single-threaded, and each run takes a very long time (~24-36 hours). Does anyone know how to speed it up?

One idea I had was to split the .bam into one file per chromosome and run them all in parallel, but I was not sure how difficult that would be, or how hard it would be to merge everything back into a single .vcf at the end. Any thoughts?

Tags: mutect2, gatk, variant calling

Looks like the second part of my question is already covered here.

(reply by steve, 17 months ago)
jared.andrews07 (St. Louis, MO) wrote, 17 months ago:

Your second option is the right one. Breaking up BAMs by chromosome is easy, as shown in your linked question, and concatenating the per-chromosome VCFs afterwards is also straightforward with the bcftools concat command.
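A minimal Python sketch of this split-and-merge approach. It assumes a coordinate-sorted, indexed BAM (samtools needs the .bai to extract a single region) and uses hypothetical file and chromosome names. It only builds the samtools view and bcftools concat command lines; the per-chromosome MuTect2 runs would go in between.

```python
from pathlib import Path
from typing import List


def split_bam_commands(bam: str, chromosomes: List[str], outdir: str) -> List[List[str]]:
    """Build one `samtools view` command per chromosome.

    Extracting a region this way requires a coordinate-sorted BAM with an index (.bai).
    """
    commands = []
    for chrom in chromosomes:
        out_bam = str(Path(outdir) / f"{Path(bam).stem}.{chrom}.bam")
        # -b: write BAM output; -o: output file; the trailing argument is the region
        commands.append(["samtools", "view", "-b", "-o", out_bam, bam, chrom])
    return commands


def concat_command(vcfs: List[str], merged: str) -> List[str]:
    """Build a `bcftools concat` command that joins per-chromosome VCFs.

    The inputs must have the same samples in the same order and be listed in genome order.
    """
    # -O z: bgzip-compressed VCF output; -o: output file
    return ["bcftools", "concat", "-O", "z", "-o", merged, *vcfs]


if __name__ == "__main__":
    chroms = ["chr1", "chr2", "chr3"]  # hypothetical chromosome names
    for cmd in split_bam_commands("tumor.bam", chroms, "split"):
        print(" ".join(cmd))
    print(" ".join(concat_command([f"calls.{c}.vcf.gz" for c in chroms], "merged.vcf.gz")))
```

Keeping the commands as argument lists makes them easy to hand to subprocess or a workflow manager without shell quoting issues.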


I ended up doing the reverse: I split my .bed file of target regions into one .bed per chromosome, then ran the same original .bam files against each of the split .bed files in parallel. So far this has cut variant calling from 30+ hours to ~4 hours, which is the runtime of the longest-running chromosome (most finished in under 2 hours). Helpful details here.
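The per-chromosome BED approach can be sketched in Python as below. This assumes GATK4 command-line syntax (gatk Mutect2 with -R, -I, -L and -O), a tumor-only run, and hypothetical file names. A tumor/normal run would add a second -I plus the normal sample name, and GATK3's MuTect2 is invoked differently. Each interval job is an independent external process, so a thread pool is enough to run the jobs concurrently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def split_bed_by_chrom(bed_lines: List[str]) -> Dict[str, List[str]]:
    """Group BED records by their first column (chromosome), keeping input order."""
    groups: Dict[str, List[str]] = {}
    for line in bed_lines:
        # Skip blank lines, comments and UCSC track/browser header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        groups.setdefault(chrom, []).append(line.rstrip("\n"))
    return groups


def write_split_beds(bed_path: str, outdir: str) -> List[str]:
    """Write one BED file per chromosome into outdir and return their paths."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    with open(bed_path) as handle:
        groups = split_bed_by_chrom(handle.readlines())
    paths = []
    for chrom, records in groups.items():
        path = Path(outdir) / f"{chrom}.bed"
        path.write_text("\n".join(records) + "\n")
        paths.append(str(path))
    return paths


def mutect2_command(ref: str, tumor_bam: str, bed: str, out_vcf: str) -> List[str]:
    """Tumor-only GATK4 Mutect2 call restricted to the intervals in one BED file."""
    return ["gatk", "Mutect2", "-R", ref, "-I", tumor_bam, "-L", bed, "-O", out_vcf]


def run_parallel(commands: List[List[str]], max_workers: int = 8) -> None:
    """Run independent commands concurrently; raise if any of them fails."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes every result, so a CalledProcessError is re-raised here
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

For example, write_split_beds("targets.bed", "beds") followed by run_parallel over mutect2_command("ref.fa", "tumor.bam", bed, bed.replace(".bed", ".vcf.gz")) for each returned path. You would then join the outputs with bcftools concat. Total wall time is set by the slowest chromosome, and max_workers should respect the memory each GATK JVM needs. If you filter with FilterMutectCalls in GATK 4.1 or later, the per-interval .stats files also have to be combined with MergeMutectStats first.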

(reply by steve, 17 months ago)
pfs (USA/Boston) wrote, 17 months ago:

If speed is your main concern and developing your own tool is not an option, I would switch the pipeline to samtools for variant calling. I have benchmarked variant calling tools in the past, and samtools was a lot faster than GATK HaplotypeCaller v3.5. I cannot speak for newer versions of GATK.


It's also a lot less accurate by many other benchmarks. I would not recommend samtools for variant calling, particularly if you care about indels or low frequency variants. It will not detect variants with a VAF <0.20. It definitely is quicker than GATK, but it's not as sensitive or accurate.

See this paper for benchmarks of many different callers (though there are many other papers out there like this as well).

(reply by jared.andrews07, 17 months ago)
Brian Bushnell (Walnut Creek, USA) wrote, 17 months ago:

You can make variant calling much faster by running BBMap's callvariants.sh (which is fully multithreaded) on a raw, unsorted SAM file, so you don't waste time sorting, indexing, and compressing/decompressing BAM. It works on BAM files too. If you use BAM input, I suggest having samtools 1.4 or greater available on the command line so that decompression is also multithreaded. In my testing, callvariants.sh is around 90x faster than the samtools mpileup pipeline.
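A small Python sketch of both routes described here. It assumes callvariants.sh is on PATH and uses only its in=, ref= and vcf= parameters; check its usage message for the rest. File names are hypothetical. The version check compares samtools versions as integer tuples, because as strings "1.10" would sort before "1.4". The 1.4 threshold is the one given in this answer.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple


def callvariants_command(alignments: str, ref: str, vcf: str) -> List[str]:
    """callvariants.sh takes SAM or BAM; an unsorted SAM skips the sort/index/BAM steps."""
    return ["callvariants.sh", f"in={alignments}", f"ref={ref}", f"vcf={vcf}"]


def parse_samtools_version(first_line: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a line such as 'samtools 1.10'; None if it doesn't match."""
    match = re.match(r"samtools\s+(\d+)\.(\d+)", first_line.strip())
    return (int(match.group(1)), int(match.group(2))) if match else None


def samtools_supports_fast_bam(minimum: Tuple[int, int] = (1, 4)) -> bool:
    """True if samtools on PATH reports a version of at least `minimum`."""
    if shutil.which("samtools") is None:
        return False
    out = subprocess.run(["samtools", "--version"], capture_output=True, text=True).stdout
    version = parse_samtools_version(out.splitlines()[0]) if out else None
    return version is not None and version >= minimum
```

With an unsorted SAM straight from the aligner you would run callvariants_command("mapped.sam", "ref.fa", "vars.vcf"). With BAM input, samtools_supports_fast_bam() tells you whether the multithreaded decompression path is available.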
