GATK4 best practices: 1500 lines of code ... !?
14 months ago
Marvin ▴ 190

Hello,

I would like to call germline variants with GATK4, i.e. map with bwa mem, mark duplicates, et cetera. I haven't done this in a while, but back then it was just a few commands. Now that I'm checking the pipeline commands, what I find is this: https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels/blob/master/PairedEndSingleSampleWf.wdl I'm not even sure this is the right file to look at, and I don't know what a ".wdl" file is, nor am I interested in learning yet another language. Why am I seeing 1500 lines of code when I was expecting just a few steps / GATK subcommand calls, like HaplotypeCaller, filtering, et cetera? What happened to those kinds of tutorials?

Best regards
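For comparison, the "few commands" version the question asks about can be sketched roughly as follows. This is a minimal Python dry run that only assembles and prints shell commands (bwa mem, samtools, and the GATK4 tools MarkDuplicates, BaseRecalibrator, ApplyBQSR, HaplotypeCaller and GenotypeGVCFs); it is not the Broad production pipeline. File names, the read-group string and the thread count are placeholders, and it assumes the reference is already indexed and a known-sites VCF (e.g. dbSNP) is available.

```python
import shlex
from typing import List


def germline_commands(ref: str, r1: str, r2: str, sample: str,
                      known_sites: List[str], threads: int = 8) -> List[str]:
    """Assemble the shell commands for a minimal single-sample germline run.

    Assumes bwa, samtools and gatk are on PATH and that `ref` is already
    indexed (bwa index, samtools faidx, gatk CreateSequenceDictionary).
    All output file names are placeholders derived from `sample`.
    """
    q = shlex.quote
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf.gz"
    vcf = f"{sample}.vcf.gz"
    # Read group with a literal "\t"; bwa expands it into a tab itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    known = " ".join(f"--known-sites {q(k)}" for k in known_sites)
    return [
        # 1. Align paired-end reads and coordinate-sort the output
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(r1)} {q(r2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # 2. Mark duplicate reads, then index the result
        f"gatk MarkDuplicates -I {q(sorted_bam)} -O {q(dedup_bam)}"
        f" -M {q(sample + '.dup_metrics.txt')}",
        f"samtools index {q(dedup_bam)}",
        # 3. Base quality score recalibration (BQSR)
        f"gatk BaseRecalibrator -R {q(ref)} -I {q(dedup_bam)} {known}"
        f" -O {q(recal_table)}",
        f"gatk ApplyBQSR -R {q(ref)} -I {q(dedup_bam)}"
        f" --bqsr-recal-file {q(recal_table)} -O {q(recal_bam)}",
        # 4. Call variants per sample in GVCF mode, then genotype
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(recal_bam)} -O {q(gvcf)}"
        f" -ERC GVCF",
        f"gatk GenotypeGVCFs -R {q(ref)} -V {q(gvcf)} -O {q(vcf)}",
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in germline_commands("ref.fa", "S1_R1.fq.gz", "S1_R2.fq.gz",
                                 "S1", ["dbsnp.vcf.gz"]):
        print(cmd)
```

Calling in GVCF mode and then running GenotypeGVCFs keeps the per-sample output reusable for joint genotyping later; for a one-off single sample you could drop -ERC GVCF and skip GenotypeGVCFs.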


WDL is really realllllly realllllllllllllllllllllllly verbose. Most statements just pass variables into a task. The 'real' commands start with command <<< and end with >>>.
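If you just want to read those 'real' commands without learning WDL, a small script can pull out every command <<< ... >>> section. This is a minimal Python sketch that assumes the heredoc-style syntax described above; it ignores the older command { ... } form, and placeholders such as ~{...} inside each command are left as-is, since WDL fills them in at run time.

```python
import re
import sys
import textwrap
from typing import List

# Body of each heredoc-style WDL command section: command <<< ... >>>
_COMMAND_RE = re.compile(r"\bcommand\s*<<<(.*?)>>>", re.DOTALL)


def extract_commands(wdl_text: str) -> List[str]:
    """Return the dedented body of every `command <<< ... >>>` section."""
    return [textwrap.dedent(body).strip()
            for body in _COMMAND_RE.findall(wdl_text)]


if __name__ == "__main__":
    # Usage: python extract_commands.py PairedEndSingleSampleWf.wdl
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for i, cmd in enumerate(extract_commands(fh.read()), 1):
                print(f"# --- command block {i} ---\n{cmd}\n")
```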


The pipeline has not changed too much since GATK3 (released 2014). The docs still exist, just not as easy to find as before.

This is not an obvious resource, but there is a lot of good info in their workshop presentations: https://drive.google.com/drive/folders/1y7q0gJ-ohNDhKG85UTRTwW1Jkq4HJ5M3


They recently moved their documentation and forums to a new platform, so a lot of the older pages got lost.


GATK is developed by the Broad Institute, and so is Cromwell, the workflow engine that takes WDL files as input. IMO, the Broad is pushing Cromwell as the workflow engine for running its other tools.

For RNA-seq, you can refer to this GATK best practices variant calling workflow: https://digibio.blogspot.com/2015/10/rna-seq-and-gatk-best-practices.html
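Because Cromwell is the engine that consumes these WDL files, running the workflow from the question without reading all of it mostly comes down to two steps: generate an inputs template with Womtool and fill in the paths, then hand the WDL and the inputs file to Cromwell in run mode. The Python sketch below only builds those java -jar command lines; the jar names are placeholders for whichever versions you download, the inputs file name is hypothetical, and a real run may need extra backend configuration.

```python
import shlex
from typing import List


def womtool_inputs_cmd(womtool_jar: str, wdl: str) -> List[str]:
    """Womtool's `inputs` command prints a JSON template of the workflow inputs."""
    return ["java", "-jar", womtool_jar, "inputs", wdl]


def cromwell_run_cmd(cromwell_jar: str, wdl: str, inputs_json: str) -> List[str]:
    """Cromwell's `run` mode executes a single workflow (local backend by default)."""
    return ["java", "-jar", cromwell_jar, "run", wdl, "--inputs", inputs_json]


if __name__ == "__main__":
    wdl = "PairedEndSingleSampleWf.wdl"
    inputs = "inputs.json"  # hypothetical name for the filled-in template
    # Step 1: write the template to a file, then edit the paths in it by hand.
    print(shlex.join(womtool_inputs_cmd("womtool.jar", wdl)) + " > " + inputs)
    # Step 2: run the workflow with the completed inputs file.
    print(shlex.join(cromwell_run_cmd("cromwell.jar", wdl, inputs)))
```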

14 months ago
Barry Digby ▴ 780

This GitHub repository contains the code used to generate the variant calling benchmarks in this paper:

https://github.com/bharani-lab/WES-pipelines/tree/master/Script

It takes you up to HaplotypeCaller, so it should save you some time sifting through WDL.

You can use gatk SelectVariants in place of vcftools --remove-indels / --keep-only-indels before applying the filtering steps.
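As a sketch of that substitution: gatk SelectVariants with -select-type SNP or -select-type INDEL splits the calls (the counterpart of vcftools --remove-indels / --keep-only-indels), and gatk VariantFiltration then applies a separate set of hard filters to each file. The thresholds below are the generic starting values from GATK's germline hard-filtering documentation; treat them as an assumption to check against the current docs, not as tuned values. File names are placeholders, and the Python only builds the argument lists.

```python
import shlex
from typing import Dict, List

# Generic starting thresholds from GATK's hard-filtering docs (verify before use).
# Keys are the FILTER names written into the VCF; values are JEXL expressions.
SNP_FILTERS: Dict[str, str] = {
    "QD2": "QD < 2.0",
    "QUAL30": "QUAL < 30.0",
    "SOR3": "SOR > 3.0",
    "FS60": "FS > 60.0",
    "MQ40": "MQ < 40.0",
    "MQRankSum-12.5": "MQRankSum < -12.5",
    "ReadPosRankSum-8": "ReadPosRankSum < -8.0",
}
INDEL_FILTERS: Dict[str, str] = {
    "QD2": "QD < 2.0",
    "QUAL30": "QUAL < 30.0",
    "FS200": "FS > 200.0",
    "ReadPosRankSum-20": "ReadPosRankSum < -20.0",
}


def select_cmd(vcf: str, select_type: str, out: str) -> List[str]:
    """Keep only one variant type (e.g. SNP or INDEL) from `vcf`."""
    return ["gatk", "SelectVariants", "-V", vcf,
            "-select-type", select_type, "-O", out]


def filter_cmd(vcf: str, filters: Dict[str, str], out: str) -> List[str]:
    """Mark records failing any expression in their FILTER column (not removed)."""
    cmd = ["gatk", "VariantFiltration", "-V", vcf, "-O", out]
    for name, expr in filters.items():
        cmd += ["--filter-expression", expr, "--filter-name", name]
    return cmd


if __name__ == "__main__":
    for vtype, filters in (("SNP", SNP_FILTERS), ("INDEL", INDEL_FILTERS)):
        subset = f"calls.{vtype.lower()}.vcf.gz"
        print(shlex.join(select_cmd("calls.vcf.gz", vtype, subset)))
        print(shlex.join(filter_cmd(subset, filters,
                                    f"calls.{vtype.lower()}.filtered.vcf.gz")))
```

VariantFiltration only labels failing records; running SelectVariants again with --exclude-filtered drops them if you want a clean call set.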