Question: Bash vs WDL for running GATK
Mehulsharma.253 wrote, 24 months ago:

I am currently running a short variant-calling project. I was running GATK with the standard tool commands in a Linux shell when I came across a blog post on Workflow Description Language (WDL) scripting for GATK, which runs on Cromwell.

What would be the primary differences between running GATK (4.0.11) each way? I read that WDL is an "analysis pipeline" but I don't have much idea what that means in bioinformatics terms.

Tags: wdl, linux, pipeline, cromwell, gatk

vdauwera (Cambridge, MA) wrote, 24 months ago:

WDL is intended to help you automate the work by chaining commands together, so you don't have to run each one manually. The result is called a workflow or pipeline. There are other languages and systems besides WDL that let you do this. The main advantage of WDL is that it is what the GATK team uses, so you can find pre-written workflows on GitHub for all the major use cases supported by GATK (see https://github.com/gatk-workflows/). You can use the GATK WDL scripts out of the box or modify them to suit your own project, which saves you a lot of work. WDL is also quite user-friendly, so it is suitable for someone who has not written workflows before.

Thank you! Pardon me if this sounds like a basic question, but what would be the difference compared with just piping commands together in a bash shell using the standard tool-specific commands? What would be the primary advantage of a WDL script / Cromwell engine in that respect? Does it streamline my processes and make them run faster?

I'm essentially looking for a fast way to streamline and combine as many steps as possible, since I have 100+ paired samples to align and call.

(reply by Mehulsharma.253, 23 months ago)

> What would be the primary advantage of a WDL script / Cromwell engine in that respect?

Parallelization, not re-making files that already exist when you restart after a failure, etc.

(reply by Pierre Lindenbaum, 23 months ago)

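The reply above names two concrete things a workflow engine gives you over a hand-written bash loop: running independent work in parallel, and not re-making outputs that already exist when you restart after a failure. The Python sketch below illustrates both ideas in a make-like way, assuming each step reads one file and writes one file; it is not how Cromwell is implemented. In WDL itself, per-sample parallelism is usually written with a `scatter` block, and Cromwell can reuse earlier results through call caching when that is enabled, whereas this sketch simply checks whether an output file exists. The step functions (`toy_align`, `toy_call`) and the file naming are hypothetical stand-ins, not GATK commands.

```python
"""Hypothetical sketch of two workflow-engine behaviours:
1) skip steps whose output already exists (a rerun resumes instead of restarting);
2) run independent samples in parallel.
The steps are toy stand-ins, NOT real GATK/BWA commands."""
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A step is (name, function(input_path, output_path)); names are illustrative only.
Step = Tuple[str, Callable[[str, str], None]]
_log_lock = threading.Lock()


def run_step(fn: Callable[[str, str], None], in_path: str, out_path: str,
             log: List[Tuple[str, str]]) -> str:
    """Run one step unless its output already exists (make-style resume)."""
    if os.path.exists(out_path):
        with _log_lock:
            log.append(("skipped", out_path))
        return out_path
    tmp = out_path + ".tmp"
    fn(in_path, tmp)
    # Rename only after the step succeeds, so a half-written file never counts as done.
    os.replace(tmp, out_path)
    with _log_lock:
        log.append(("ran", out_path))
    return out_path


def run_sample(sample: str, steps: List[Step], workdir: str,
               log: List[Tuple[str, str]]) -> str:
    """Chain the steps for one sample: each step's output feeds the next step."""
    base = os.path.splitext(os.path.basename(sample))[0]
    current = sample
    for name, fn in steps:
        out = os.path.join(workdir, f"{base}.{name}.txt")
        current = run_step(fn, current, out, log)
    return current


def run_workflow(samples: List[str], steps: List[Step], workdir: str,
                 max_workers: int = 4) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Process all samples concurrently; final outputs come back in input order."""
    log: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        finals = list(pool.map(lambda s: run_sample(s, steps, workdir, log), samples))
    return finals, log


# Toy steps standing in for, e.g., alignment and variant calling.
def toy_align(src: str, dst: str) -> None:
    with open(src) as f, open(dst, "w") as g:
        g.write(f.read().upper())


def toy_call(src: str, dst: str) -> None:
    with open(src) as f, open(dst, "w") as g:
        g.write(str(len(f.read().splitlines())))


if __name__ == "__main__":
    steps: List[Step] = [("aligned", toy_align), ("called", toy_call)]
    with tempfile.TemporaryDirectory() as d:
        samples = []
        for i in range(3):
            path = os.path.join(d, f"sample{i}.txt")
            with open(path, "w") as f:
                f.write("acgt\n" * (i + 1))
            samples.append(path)
        run_workflow(samples, steps, d)                    # first run: every step executes
        os.remove(os.path.join(d, "sample1.called.txt"))   # simulate one lost/failed output
        _, log = run_workflow(samples, steps, d)           # rerun: only the missing step executes
        print([p for status, p in log if status == "ran"])
```

In a real pipeline each toy step would instead launch the actual tool as a subprocess; threads are adequate for that because the heavy work happens in the child processes. The temporary-file-then-rename pattern is what makes resuming safe: a step interrupted midway leaves only a `.tmp` file, so the rerun redoes that step rather than trusting a truncated output.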
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 24 months ago:

https://academic.oup.com/bib/article/18/3/530/2562749

Bioinformatic analyses invariably involve shepherding files through a series of transformations, called a pipeline or a workflow. Typically, these transformations are done by third-party executable command line software written for Unix-compatible operating systems. The advent of next-generation sequencing (NGS), in which millions of short DNA sequences are used as the source input for interpreting a range of biological phenomena, has intensified the need for robust pipelines. NGS analyses tend to involve steps such as sequence alignment and genomic annotation that are both time-intensive and parameter-heavy.
