GATK4 best practices: 1500 lines of code ... !?
3.7 years ago
Marvin ▴ 220

Hello,

I would like to call germline variants with GATK4, i.e. map with bwa mem, mark duplicates, et cetera. I haven't done that in a while, but back then it took just a few commands. Now that I'm checking the pipeline commands, what I find is this: https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels/blob/master/PairedEndSingleSampleWf.wdl I'm not even sure whether this is the right file to look at. I do not know what a ".wdl" file is, nor am I interested in learning yet another language. Why do I see 1500 lines of code? I was expecting just a few steps / GATK subcommand calls, like HaplotypeCaller, filter, et cetera. What happened to those kinds of tutorials?

Best regards

GATK4 • 1.9k views

WDL is really, really, really verbose. Most statements only pass variables into a task. The 'real' commands start with command <<< and end with >>>
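
If you only want to read the shell commands and skip the plumbing, something like the following sketch can pull them out of a WDL file. It assumes the heredoc form described above (command <<< ... >>>) and ignores the older command { ... } form; the example task is made up for illustration and is not taken from the Broad workflow.

```python
import re

# Matches the heredoc form only: command <<< ... >>>
# (the brace form `command { ... }` is deliberately not handled in this sketch)
COMMAND_BLOCK = re.compile(r"command\s*<<<(.*?)>>>", re.DOTALL)


def extract_commands(wdl_text):
    """Return the body of every `command <<< ... >>>` block, whitespace-trimmed."""
    return [body.strip() for body in COMMAND_BLOCK.findall(wdl_text)]


# Hypothetical task, written for illustration only.
EXAMPLE_WDL = """
task MarkDuplicates {
  File input_bam
  String output_name
  command <<<
    gatk MarkDuplicates -I ${input_bam} -O ${output_name}.bam -M ${output_name}.metrics
  >>>
  output {
    File output_bam = "${output_name}.bam"
  }
}
"""

if __name__ == "__main__":
    for command in extract_commands(EXAMPLE_WDL):
        print(command)
```

To try it on a real workflow, read the .wdl file into a string and pass it to extract_commands instead of EXAMPLE_WDL.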


The pipeline has not changed much since GATK3 (released 2014). The docs still exist; they are just not as easy to find as before.

This is not an obvious resource, but there is a lot of good info in their workshop presentations: https://drive.google.com/drive/folders/1y7q0gJ-ohNDhKG85UTRTwW1Jkq4HJ5M3
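
As a rough sketch of what "a few commands" can still look like for one paired-end sample, the snippet below only builds the command lines and prints them; it runs nothing. The file names and the sample name are placeholders, the read group values are assumptions, and it presumes the reference is already indexed for bwa and GATK. It stops at HaplotypeCaller, so variant filtering (VQSR or hard filters) is left out, and it is not a stand-in for the Broad production workflow.

```python
import shlex


def germline_commands(ref, fq1, fq2, sample, known_sites):
    """Return shell command strings for one sample, in order. Nothing is executed."""
    # bwa expects the literal two characters "\t" as separators in the -R string
    read_group = shlex.quote(f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA")
    return [
        # Map reads and write a coordinate-sorted BAM
        f"bwa mem -R {read_group} {ref} {fq1} {fq2} | samtools sort -o {sample}.sorted.bam -",
        # Mark duplicate reads
        f"gatk MarkDuplicates -I {sample}.sorted.bam -O {sample}.dedup.bam -M {sample}.dup_metrics.txt",
        # Base quality score recalibration: build the model, then apply it
        f"gatk BaseRecalibrator -R {ref} -I {sample}.dedup.bam --known-sites {known_sites} -O {sample}.recal.table",
        f"gatk ApplyBQSR -R {ref} -I {sample}.dedup.bam --bqsr-recal-file {sample}.recal.table -O {sample}.recal.bam",
        # Call variants
        f"gatk HaplotypeCaller -R {ref} -I {sample}.recal.bam -O {sample}.vcf.gz",
    ]


if __name__ == "__main__":
    # All arguments below are placeholder names
    for command in germline_commands("ref.fa", "r1.fq.gz", "r2.fq.gz", "sample1", "known_sites.vcf.gz"):
        print(command)
```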


Thank you for your helpful replies. But I must say, GATK is very disappointing. Many, many links lead to 404 pages, and, for example, they seem to have renamed ApplyRecalibration to ApplyVQSR without mentioning it anywhere. Even on the ApplyVQSR page itself, a Ctrl+F search finds both ApplyRecalibration and ApplyVQSR. And the best practices page for germline calling still lists ApplyRecalibration as the "tool involved", even though that tool is not present in their tool index (I've also checked the tool index of earlier GATK4 versions). It's almost as if the docs were organised by amateurs. I'm sorry to say that, but this is such a mess. Instead of trying to force their WDL language on everyone (which I guess will just lead people to use other tools), they should focus on crawling their own pages for 404 errors and put a system in place that propagates changes properly throughout their docs (at the very least, search your HTML files for "ApplyRecalibration"; that's not hard, find -exec grep is there for you). But thanks again to you guys for the very helpful answers/comments!
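
For what it's worth, the stale-name check is only a few lines. This is a minimal sketch of the same idea as find -exec grep; the "docs" directory is a placeholder for wherever the rendered HTML lives, and I have no idea how their docs are actually built.

```python
import os


def files_mentioning(root, needle, suffix=".html"):
    """Return sorted paths under `root` ending in `suffix` whose text contains `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced so one odd file does not abort the scan
            with open(path, encoding="utf-8", errors="replace") as handle:
                if needle in handle.read():
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # "docs" is a placeholder directory name
    for path in files_mentioning("docs", "ApplyRecalibration"):
        print(path)
```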


They recently moved their documentation and forums to a new platform, so a lot of the older pages got lost.


The GATK tools are written by the Broad Institute, and Cromwell, which takes WDL files as input, is also developed by the Broad Institute. The Broad Institute is pushing Cromwell as a workflow engine that builds on its other tools (IMO).

For RNA-seq, you can refer to this GATK best practices variant calling workflow: https://digibio.blogspot.com/2015/10/rna-seq-and-gatk-best-practices.html.

3.6 years ago
Barry Digby ★ 1.3k

This GitHub repository contains the code used to generate the variant calling benchmarks in this paper.

https://github.com/bharani-lab/WES-pipelines/tree/master/Script

It takes you up to HaplotypeCaller, so it should save you some time sifting through WDL.

You can substitute vcftools --remove-indels | --keep-only-indels with gatk SelectVariants before applying filtering steps.
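
A minimal sketch of that substitution, again only building the command lines rather than running them. The file names are placeholders. Note that SelectVariants selects by variant type, so records of other types (mixed sites, for example) may be handled differently than with the vcftools flags.

```python
def split_by_type(vcf, prefix):
    """Build one `gatk SelectVariants` command per variant type (nothing is executed)."""
    commands = {}
    for variant_type in ("SNP", "INDEL"):
        commands[variant_type] = [
            "gatk", "SelectVariants",
            "-V", vcf,
            "--select-type-to-include", variant_type,
            "-O", f"{prefix}.{variant_type.lower()}.vcf.gz",
        ]
    return commands


if __name__ == "__main__":
    # "calls.vcf.gz" and "sample1" are placeholder names
    for variant_type, command in split_by_type("calls.vcf.gz", "sample1").items():
        print(variant_type, " ".join(command))
```

Each list can be passed to subprocess.run as-is once you are happy with the printed commands.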
