Question: GATK HaplotypeCaller Is So Slow, What Is Faster And As Good?
4.9 years ago, newDNASeqer (United States) wrote:

I have 15 exome-seq samples and have been using a BWA-Picard-GATK pipeline for variant calling. I did not realize how slow GATK is until I had to analyze this many samples. In the HaplotypeCaller step, each sample seems to take at least 2 days (48+ hours). Is this normal, or did I do something wrong? My command is below; is there anything wrong with it, or is GATK HaplotypeCaller known to be this slooooow?

java -Xmx10g -Djava.awt.headless=true -jar /Library/Java/Extensions/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -minPruning 3 \
    -dcov 10 \
    -R ./GATK_ref/hg19.fasta \
    -I ./GATK/BQSR/sample1_realign.recal.compressed.bam \
    -o ./GATK/VQSR/sample1_realign.raw.snps_indels.vcf

What other variant calling program do you guys recommend? Can I use the above sample1_realign.recal.compressed.bam file (prepared by the GATK steps before HaplotypeCaller) with the program you recommend? Thank you.

ps: GATK 2.5 is what I am using.


As far as I know, no other variant caller is as "good" as GATK HaplotypeCaller (written 2014-05-21). If someone finds a better tool, please reply here.

written 4.2 years ago by 14134125465346445
4.9 years ago, zam.iqbal.genome (United Kingdom) wrote:

There are in fact three other local de novo variant callers:

  • Platypus (from Andy Rimmer and Gerton Lunter in Oxford)
  • SGA (from Jared Simpson at the Sanger Institute and now OICR)
  • DISCOVAR (from David Jaffe's team at the Broad)

Neither these nor the GATK HaplotypeCaller yet have a published paper describing their methods or performance, but I've heard good things about all four (Platypus, SGA and HaplotypeCaller have been heavily tested and used in the 1000 Genomes Project), and I believe papers are in progress.

There are also two global de novo variant callers:

  • Cortex (from me amongst others), published last year: De novo assembly and genotyping of variants using colored de Bruijn graphs. Z Iqbal, M Caccamo, I Turner, P Flicek, G McVean, Nature Genetics (2012)

  • Fermi (from Heng Li), also published last year: Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Heng Li, Bioinformatics (2012)


The Platypus paper is out:

written 3.7 years ago by Leonor Palmeira

The Fermi paper is also out:

written 2.5 years ago by Rob Syme

Has GATK published a paper yet?

written 22 months ago by SmallChess

You can find more info here:

written 20 months ago by Leandro Lima
4.9 years ago, William wrote:

GATK HaplotypeCaller is kind of slow because it does variant calling based on a sliding-window local de novo assembly. That is also where the advantages of the HaplotypeCaller come from. I don't know of any other variant callers based on local de novo assembly.

To make it run faster, you can run it on a machine with a large number of cores or on a Sun Grid Engine cluster. You can use the GATK Queue library together with a small Scala script to start the HaplotypeCaller on multiple cores, locally or on a cluster. The latest version of GATK (2.6.5) is also much faster, but you need Java 1.7 to run it.
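As a rough sketch of the same scatter idea without Queue, the calling can be split by chromosome with the GATK `-L` interval argument and run a few jobs at a time. The file paths come from the question; the hg19-style contig names (chr1..chr22, chrX, chrY), the per-chromosome output names, and the `DRY_RUN` and `MAX_JOBS` variables are assumptions made for illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: scatter HaplotypeCaller by chromosome with -L, a few jobs at a time.
# Paths are from the question; contig names and output naming are assumptions.

GATK_JAR=/Library/Java/Extensions/GenomeAnalysisTK.jar
REF=./GATK_ref/hg19.fasta
BAM=./GATK/BQSR/sample1_realign.recal.compressed.bam
OUT_PREFIX=./GATK/VQSR/sample1_realign.raw.snps_indels
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the commands (hypothetical switch)
MAX_JOBS=${MAX_JOBS:-4}   # hypothetical number of concurrent jobs

# Print the HaplotypeCaller command for one contig.
hc_cmd() {
    printf 'java -Xmx10g -Djava.awt.headless=true -jar %s -T HaplotypeCaller -R %s -I %s -L %s -o %s.%s.vcf\n' \
        "$GATK_JAR" "$REF" "$BAM" "$1" "$OUT_PREFIX" "$1"
}

# hg19-style contig names (assumed; check the .fai or .dict of your reference).
contigs() {
    i=1
    while [ "$i" -le 22 ]; do
        echo "chr$i"
        i=$((i + 1))
    done
    echo chrX
    echo chrY
}

# Print every command (dry run) or launch them in batches of MAX_JOBS.
run_all() {
    running=0
    for c in $(contigs); do
        if [ "$DRY_RUN" = 1 ]; then
            hc_cmd "$c"
        else
            sh -c "$(hc_cmd "$c")" &
            running=$((running + 1))
            # Wait for the current batch to finish before starting the next.
            if [ "$running" -ge "$MAX_JOBS" ]; then
                wait
                running=0
            fi
        fi
    done
    wait
}

run_all
```

Each job here asks for its own 10 GB heap, so the machine needs enough memory for `MAX_JOBS` of them at once, and the per-chromosome VCFs still have to be concatenated in contig order afterwards.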

If you don't want to do this, or it is still too slow (which can happen with multi-sample calling on a large number of samples), you can use the "old" GATK UnifiedGenotyper. It is much faster but lacks the advantages of local de novo assembly.
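For reference, a UnifiedGenotyper call on the same input might look like the sketch below. The paths mirror the question; the `.ug.` output name, the choice of `-glm BOTH` (SNPs and indels in one pass), and the thread count passed to `-nt` are illustrative assumptions, not settings from this thread.

```shell
#!/bin/sh
# Sketch: build a UnifiedGenotyper command for the BAM from the question.
GATK_JAR=/Library/Java/Extensions/GenomeAnalysisTK.jar
REF=./GATK_ref/hg19.fasta
BAM=./GATK/BQSR/sample1_realign.recal.compressed.bam
OUT=./GATK/VQSR/sample1_realign.ug.raw.snps_indels.vcf   # hypothetical output name
THREADS=${THREADS:-4}                                     # assumed core count

# Print the UnifiedGenotyper command line.
ug_cmd() {
    printf 'java -Xmx10g -jar %s -T UnifiedGenotyper -glm BOTH -nt %s -R %s -I %s -o %s\n' \
        "$GATK_JAR" "$THREADS" "$REF" "$BAM" "$OUT"
}

# Print the command; pipe the output to sh to actually run it.
ug_cmd
```

The same recalibrated BAM is used as input, so nothing upstream of the calling step needs to be redone.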


As Zam notes, there are a number of other methods for local de novo assembly besides the GATK. Some of them have distinct advantages, and as I understand it, the method in the GATK is not exactly a haplotype caller, in the sense that it only uses the windowed local assembly to generate candidate alleles. Haplotypes are then inferred post hoc where linkage disequilibrium is greater than 0.95.

written 4.8 years ago by Erik Garrison

Erik - why does that mean the GATK HC is not exactly a haplotype caller??

written 4.8 years ago by zam.iqbal.genome

It ultimately calls and reports point mutations, not haplotypes. The haplotype-based aspect of detection is driven by the de Bruijn assembly, which is used to detect possible alleles.

written 4.7 years ago by Erik Garrison
21 months ago, daniel (United Kingdom) wrote:

To expand on Zam's answer, we have just released an alpha version of Platypus's successor, octopus. By default, octopus isn't much faster than GATK, but it has an optional fast mode that gives runtimes similar to Platypus with little loss in calling accuracy. It also has built-in multithreading.


Link is dead. Any word on the octopus project?

written 16 months ago by Ben Fulton

Octopus is back online - the link should now work.

written 8 months ago by daniel