Question: CWL rerunning completed output
2.6 years ago, bruce.moran (Ireland) wrote:

I like CWL and would like to use it in a production setting. I am coming from a Make approach, where output is not remade unless the timestamps of the inputs show it to be necessary. I cannot find any documentation on this aspect of CWL. I have read the 'Gentle Intro' twice end to end, thinking this was an oversight of mine. Can someone direct me to docs if they exist, or give some insight if not? Googling doesn't turn up anything either. I would like to avoid going all-in on Make, as it's not particularly user-friendly.

Many thanks,

Bruce.
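
For readers unfamiliar with the Make behaviour described in the question, here is a minimal Python sketch of the rule: an output is rebuilt only if it is missing or older than at least one of its inputs. The helper name `needs_rebuild` is hypothetical and is not part of Make, CWL, or any CWL runner. The sketch compares modification times only and ignores everything else Make does, such as dependency chains and phony targets.

```python
import os


def needs_rebuild(output, inputs):
    """Return True if `output` is missing or older than any path in `inputs`.

    Hypothetical helper illustrating Make's timestamp rule; it is not an
    API of Make or of any CWL implementation.
    """
    # A missing target always has to be built.
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    # Rebuild as soon as one prerequisite is newer than the target.
    return any(os.path.getmtime(path) > out_mtime for path in inputs)
```

A wrapper script could call `needs_rebuild("results/out.bam", ["in.fastq"])` (file names are placeholders) before launching a workflow step, which approximates the Make behaviour outside of Make itself.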

cwl common workflow language • 1.1k views

Yes, reentrancy is one of those things the Make crowd takes for granted, but the workbench people find it especially challenging to implement.

Reply written 2.6 years ago by Jeremy Leipzig

I have only been using Make for a few months, so I was assessing the alternatives. I read your paper on this, so I appreciate your experience. Do you have anything written on your own preferred solution, especially for production?

Reply written 2.6 years ago by bruce.moran

I've never tested CWL with either of these tools, but both Toil (https://toil.readthedocs.io/en/3.10.0/running/cwl.html) and Nextflow (https://github.com/nextflow-io/cwl2nxf) offer some support for CWL and are reentrant.

Reply written 2.6 years ago by Jeremy Leipzig
2.6 years ago, well309 wrote:

Hi Bruce,

I also did not find any official documentation about a caching system. I have been using CWL to process dozens of genomic files with the cwl-runner software (https://github.com/common-workflow-language/cwltool). It has a working caching system. Here is how I run this software in a production setting:

cwl-runner --tmpdir-prefix tmp/ --cachedir cache/ --outdir results/ workflow.cwl job.yml
  • cwl-runner - CWL reference implementation executor.
  • --tmpdir-prefix tmp/ - Write temporary files to a tmp directory inside the working directory instead of using /tmp.
  • --cachedir cache/ - Write cache files to a cache directory inside the working directory. The cache is used when the same command line is rerun, reducing processing time for files that have already been processed. There are some known issues (https://github.com/common-workflow-language/cwltool/issues/493).
  • --outdir results/ - Write result files to a results directory inside the working directory.
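
The invocation above could be wrapped in a small Python script so that the same directories are used on every run. This is only a sketch: it assumes `cwl-runner` is on the PATH and resolves to cwltool (the three flags are the ones listed above), and `build_cwl_command` and `run_workflow` are hypothetical helper names, not part of cwltool.

```python
import shlex
import subprocess


def build_cwl_command(workflow, job, tmp_prefix="tmp/",
                      cache_dir="cache/", out_dir="results/"):
    """Return the argument list for the cwl-runner call described above."""
    return [
        "cwl-runner",
        "--tmpdir-prefix", tmp_prefix,  # temporary files, instead of /tmp
        "--cachedir", cache_dir,        # cache reused when the run is repeated
        "--outdir", out_dir,            # final result files
        str(workflow),
        str(job),
    ]


def run_workflow(workflow, job, **dirs):
    """Run the workflow; raises CalledProcessError on a non-zero exit code."""
    cmd = build_cwl_command(workflow, job, **dirs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run_workflow("workflow.cwl", "job.yml")` twice with the same arguments points both runs at the same cache directory, which is the precondition for the cache being reused on the second run.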

OK, very interesting. Your issue on GitHub is only a few months old, and caching is described as 'experimental' in the response, so I don't think it will be suitable for the production work I will be doing. Hopefully it will be part of upcoming releases. Many thanks for the input, and the example.

Reply written 2.6 years ago by bruce.moran

Hello bruce.moran,

The CWL reference implementation cwltool is not designed for production usage. For that, check out one of the other implementations: http://www.commonwl.org/#Implementations

FYI: cwl-runner is the generic way to run any CWL executor; it could refer to cwltool, Arvados, Rabix Bunny, cwltoil, or others :-)

Cheers,

Reply written 2.6 years ago by Michael R. Crusoe

Thanks Michael, appreciate your input.

Reply written 2.6 years ago by bruce.moran