CWL rerunning completed output
7.2 years ago
bruce.moran ▴ 970

I like CWL and would like to use it in a production setting. I am coming from a Make approach, whereby output is not remade unless the timestamps of its inputs make this necessary. I cannot find any documentation on this aspect of CWL. I have read the 'Gentle Intro' twice end-to-end, thinking this was an oversight of mine. Can someone direct me to docs if they exist, or give insight if not? Googling doesn't turn up anything either. I would like to avoid going all-in on Make, as it's not particularly user-friendly.
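
To make the Make behaviour above concrete, here is a minimal Python sketch of the timestamp rule. The function name `needs_rebuild` is hypothetical and only illustrates the idea. It is not part of Make or of any CWL tool.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources.

    Illustrative only: mimics the modification-time comparison Make uses
    to decide whether an output has to be regenerated.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild if at least one input was modified after the output was written.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A workflow step would be skipped whenever `needs_rebuild(output, inputs)` returns False.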

Many thanks,
Bruce.

Common-Workflow-Language CWL • 2.8k views

Yes, reentrancy is one of those things the Make crowd takes for granted, but the workbench people find it especially challenging to implement.

I only have a few months' experience with Make, so I was assessing the alternatives. I read your paper on this, so I appreciate your experience. Do you have anything written on your own preferred solution, especially for production?

Though I've never tested CWL with either of these tools, both Toil (https://toil.readthedocs.io/en/3.10.0/running/cwl.html) and Nextflow (https://github.com/nextflow-io/cwl2nxf) offer some support for CWL and are reentrant.

7.2 years ago
well309 ▴ 70

Hi Bruce,

I also did not find any official documentation about a caching system. I have been using CWL to process dozens of genomic files with the cwl-runner software (https://github.com/common-workflow-language/cwltool). It has a working caching system. Here is how I run it in a production setting:

cwl-runner --tmpdir-prefix tmp/ --cachedir cache/ --outdir results/ workflow.cwl job.yml
  • cwl-runner - The CWL reference implementation executor.
  • --tmpdir-prefix tmp/ - Write temporary files to a tmp directory in the working directory instead of using /tmp.
  • --cachedir cache/ - Write cache files to a cache directory in the working directory. The cache is used when the same command line is rerun, reducing processing time for files that have already been processed. There are some known issues (https://github.com/common-workflow-language/cwltool/issues/493).
  • --outdir results/ - Write result files to a results directory in the working directory.
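
If the same invocation needs to be launched from a script, the command above could be wrapped roughly as follows. This is only a sketch: the helper names `build_cwl_command` and `run_workflow` are hypothetical, and it assumes that `cwl-runner` is on the PATH and accepts the three flags exactly as shown above.

```python
import subprocess


def build_cwl_command(workflow, job, tmp_prefix="tmp/",
                      cache_dir="cache/", out_dir="results/"):
    """Assemble the cwl-runner argument list used in the command above."""
    return [
        "cwl-runner",
        "--tmpdir-prefix", tmp_prefix,  # temporary files, instead of /tmp
        "--cachedir", cache_dir,        # cache reused when the command is rerun
        "--outdir", out_dir,            # final result files
        workflow,
        job,
    ]


def run_workflow(workflow, job):
    """Run the workflow; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_cwl_command(workflow, job), check=True)
```

Calling `run_workflow("workflow.cwl", "job.yml")` a second time with the same inputs should then reuse whatever the cache directory already holds, subject to the known issues linked above.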

OK, very interesting. Your issue on GitHub is only a few months old, and caching is described as 'experimental' in the response, so I don't think this is going to be useful for the production work I will be doing. Hopefully this will be part of the next updates. Many thanks for the input and the example.

Hello bruce.moran,

The CWL reference implementation cwltool is not designed for production usage. For that, check out one of the other implementations: http://www.commonwl.org/#Implementations

FYI: cwl-runner is the generic way to run any CWL executor; it could refer to cwltool, arvados, rabix bunny, cwltoil, or others :-)

Cheers,

Thanks Michael, appreciate your input.
