Question: GATK4 local computing workflow
moxu470 wrote (2.6 years ago):

We have a large number of WGS samples to process for variant calling, and we are now assessing different pipeline options. One of the pipelines we are investigating is the GATK4 pipeline. Broad provides a workflow defined in WDL + JSON, but it relies on cloud computing (the reference files are hosted in Google Cloud Storage). This does not work for us: for privacy and security reasons, we are not allowed to use the cloud in any form. Besides, the Google Cloud version is hard to run; I keep getting errors related to Google Storage.

That said, I am wondering whether anyone has run the GATK4 pipeline recommended by Broad using only local files, and would be willing to share the workflow (in .wdl, .cwl, or a similar format).

Your help would be highly appreciated!

(P.S. This is a not-for-profit project that would greatly benefit the research community and the general public, so your contribution would reach as far as possible: you would be contributing to humankind, not just to a small group of people.)

vdauwera970 (Cambridge, MA) wrote (2.6 years ago):

Assuming you're talking about the germline short variant discovery pipeline, we have several different versions: aside from the cloud-optimized pipeline, we also have one that is optimized for local execution. They are summarized here: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145 and you can find the WDL for the local-optimized version here: https://github.com/gatk-workflows/intel-gatk3-4-germline-snps-indels. Note also that the pipelines listed as "universal" can be run anywhere; you just need to download the files and update the paths accordingly.
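"Update the paths accordingly" mostly means rewriting every gs:// path in the inputs JSON to point at your local copy. A minimal sketch in Python, assuming the files have already been copied to a local directory that mirrors the bucket layout (<local_root>/<bucket>/<path>). The input keys and bucket name in the example (Workflow.ref_fasta, example-bucket) are hypothetical placeholders, not names taken from the Broad JSON files.

```python
import json
import posixpath


def gs_to_local(uri: str, local_root: str) -> str:
    """Map gs://bucket/path/to/file to <local_root>/bucket/path/to/file."""
    # Strip the scheme but keep the bucket name as the first directory level,
    # so files from different buckets cannot collide locally.
    relative = uri[len("gs://"):]
    return posixpath.join(local_root, relative)


def localize_inputs(value, local_root: str):
    """Recursively replace every gs:// string in a parsed inputs JSON value."""
    if isinstance(value, str):
        return gs_to_local(value, local_root) if value.startswith("gs://") else value
    if isinstance(value, list):
        return [localize_inputs(v, local_root) for v in value]
    if isinstance(value, dict):
        return {k: localize_inputs(v, local_root) for k, v in value.items()}
    # Numbers, booleans and null pass through unchanged.
    return value


def localize_file(src_json: str, dst_json: str, local_root: str) -> None:
    """Read an inputs JSON, rewrite its gs:// paths, and write a local copy."""
    with open(src_json) as fh:
        inputs = json.load(fh)
    with open(dst_json, "w") as fh:
        json.dump(localize_inputs(inputs, local_root), fh, indent=2)


if __name__ == "__main__":
    # Hypothetical inputs; real keys follow the WorkflowName.input_name pattern.
    example = {
        "Workflow.ref_fasta": "gs://example-bucket/hg38/ref.fasta",
        "Workflow.known_indels": ["gs://example-bucket/hg38/indels.vcf.gz"],
        "Workflow.threads": 4,
    }
    print(json.dumps(localize_inputs(example, "/data/refs"), indent=2))
```

Keeping the bucket name in the local path is a deliberate choice: two inputs with the same file name in different buckets stay distinct, and the mapping is easy to reverse when checking which local file came from where.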

We'd be happy to help you further over on the GATK support forum: https://gatkforums.broadinstitute.org/gatk


You are the best!

I did post my questions to the GATK forum, but nobody has answered me recently.

A quick question: what do 2T, 56T, 20k, HDD, on-prem, throughputs, and FPGA mean?

My two cents: it would be nice to have an option to automatically download all auxiliary datasets to the directories defined in the WDL/JSON file, or to have everything pre-bundled.
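Until something like that exists, a one-off download step can be scripted from the inputs JSON itself. A rough sketch in Python, assuming the referenced objects are publicly readable (so they can be fetched over plain HTTPS from storage.googleapis.com without any Google credentials) and that the script runs on a machine that is allowed to reach the internet, with the files copied to the secure environment afterwards. Requester-pays or private buckets would not work this way. The function names are hypothetical.

```python
import posixpath
import urllib.request
from pathlib import Path

# Public HTTPS endpoint for publicly readable Cloud Storage objects.
GCS_PUBLIC_PREFIX = "https://storage.googleapis.com/"


def collect_gs_uris(value, found=None):
    """Return every gs:// string found anywhere in a parsed inputs JSON."""
    if found is None:
        found = []
    if isinstance(value, str) and value.startswith("gs://"):
        found.append(value)
    elif isinstance(value, list):
        for v in value:
            collect_gs_uris(v, found)
    elif isinstance(value, dict):
        for v in value.values():
            collect_gs_uris(v, found)
    return found


def download_plan(inputs: dict, local_root: str):
    """Pair each unique gs:// URI with an HTTPS URL and a local destination."""
    plan = []
    for uri in sorted(set(collect_gs_uris(inputs))):
        relative = uri[len("gs://"):]  # "bucket/path/to/file"
        plan.append((GCS_PUBLIC_PREFIX + relative,
                     posixpath.join(local_root, relative)))
    return plan


def fetch(plan):
    """Download each planned file, skipping files that already exist locally."""
    for url, dest in plan:
        dest_path = Path(dest)
        if dest_path.exists():
            continue
        dest_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest_path))
```

Only files explicitly listed in the JSON are fetched; index files (.fai, .dict, .tbi) are picked up only if they appear as inputs too. Because the local layout matches the path rewriting in vdauwera's "update the paths" step, the same local_root can be used for both.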

Thanks so much!

Reply written 2.6 years ago by moxu470