Question: CWL rerunning completed output
bruce.moran (Ireland) wrote, 22 months ago:

I like CWL and would like to use it in a production setting. I am coming from a Make approach, where output is not remade unless the timestamps of its inputs show this to be necessary. I cannot find any documentation on this aspect of CWL. I have read the 'Gentle Intro' twice end to end, thinking this was an oversight of mine. Can someone direct me to docs if they exist, or give some insight if not? Googling doesn't turn up anything either. I would like to avoid going all-in on Make, as it's not particularly user-friendly.

Many thanks,

Bruce.
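
For reference, the Make-style rule described in the question (rebuild only when the output is missing or older than one of its inputs) can be sketched in shell roughly as follows. This is a minimal illustration and not part of CWL or any CWL runner; the function name needs_rebuild is hypothetical, and the sketch assumes a shell that supports the -nt file test (bash and most other common shells do).

```shell
#!/usr/bin/env bash
# needs_rebuild TARGET INPUT...   (hypothetical helper, for illustration only)
# Returns 0 (true) when TARGET is missing or any INPUT is newer than TARGET,
# which mirrors Make's timestamp rule. Returns 1 when TARGET is up to date.
needs_rebuild() {
    local target=$1
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    local input
    for input in "$@"; do
        # -nt is true when $input was modified more recently than $target.
        if [ "$input" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}
```

A pipeline step could then be guarded with something like `needs_rebuild results/out.txt input.txt && run_step`, where the file names and run_step are placeholders.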


Yes, reentrancy is one of those things the Make crowd takes for granted, but the workbench people find it especially challenging to implement.

written 22 months ago by Jeremy Leipzig

I only have a few months' experience with Make, so I was assessing the alternatives. I read your paper on this, so I appreciate your experience. Do you have anything written on your own preferred solution, especially for production?

written 22 months ago by bruce.moran

Although I've never tested CWL with either of these tools, both Toil (https://toil.readthedocs.io/en/3.10.0/running/cwl.html) and Nextflow (https://github.com/nextflow-io/cwl2nxf) offer some support for CWL and are reentrant.

written 22 months ago by Jeremy Leipzig

well309 wrote, 22 months ago:

Hi Bruce,

I also did not find any official documentation about a caching system. I have been using CWL to process dozens of genomic files with the cwl-runner software (https://github.com/common-workflow-language/cwltool). It has a working caching system. Here is how I run it in a production setting:

cwl-runner --tmpdir-prefix tmp/ --cachedir cache/ --outdir results/ workflow.cwl job.yml
  • cwl-runner - CWL reference implementation executor
  • --tmpdir-prefix tmp/ - Write temporary files to the tmp directory inside the working directory instead of to /tmp.
  • --cachedir cache/ - Write cache files to the cache directory inside the working directory. The cache is reused when the same command line is rerun, which reduces processing time for files that have already been processed. There are some known issues (https://github.com/common-workflow-language/cwltool/issues/493).
  • --outdir results/ - Write result files to the results directory inside the working directory.

OK, very interesting. Your issue on GitHub is only a few months old, and caching is described as 'experimental' in the response, so I don't think this will be useful for the production work I will be doing. Hopefully it will be part of upcoming updates. Many thanks for the input and the example.

written 22 months ago by bruce.moran

Hello bruce.moran,

The CWL reference implementation, cwltool, is not designed for production use. For that, check out one of the other implementations: http://www.commonwl.org/#Implementations

FYI: cwl-runner is the generic way to run any CWL executor; it could refer to cwltool, Arvados, Rabix Bunny, cwltoil, or others :-)

Cheers,

written 22 months ago by Michael R. Crusoe

Thanks Michael, appreciate your input.

written 22 months ago by bruce.moran