Question

CWL decision making/loop-back


6.9 years ago

d.white • 0

Hi everyone. I’m looking at describing a workflow for (initially) checking the quality of genome sequences. I have written a series of Python scripts that decide which parts of the workflow to run, call command-line tools, pass files to the next stages, etc., all of which would appear to be easier to do in CWL. However, I’m having trouble figuring out how to do the non-basic stuff. The User Guide gives nice examples of individual stages and a simple workflow, but it’s hard to figure out all of the options that are possible.

For example, I think the scatter option will allow me to provide a series of input files, and it will perform the same steps in a workflow on each file. Later on, I want to check ALL of the outputs to make sure they pass sanity checks before moving on to later stages (not yet defined). Do you have a workflow example that describes this? It’s not overly clear how it works exactly.

I also have a step that might need repeating, depending on the output of later stages. In my example, FastQC is run on a file, and the output is checked for warn/fail flags against the defined tolerances. Some files may need to go through a quality trim stage and then go back through the FastQC and checking stages, with the second run-through deciding whether a borderline original file is now okay to pass through. Is that kind of automated decision making possible in CWL?

cwl • 1.3k views

updated 6.9 years ago by Michael R. Crusoe ★ 1.9k • written 6.9 years ago by d.white • 0

score 0 · Answer 1 · 2017-06-06


6.9 years ago

Michael R. Crusoe ★ 1.9k

Hello d.white, thank you for your question.

You can add a step to your CWL workflow that evaluates its inputs and fails if they don't meet your specifications: this could be a script that exits with a non-zero return code, or an ExpressionTool that throws a JavaScript exception.
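As a rough illustration of the script option, here is a minimal Python sketch of such a check step. It assumes the inputs are FastQC `summary.txt` files (tab-separated lines of status, module name, and file name); the function name `failed_modules` and the choice to treat only `FAIL` as fatal are assumptions for the example, not part of CWL or FastQC.

```python
import sys


def failed_modules(summary_text, fatal=("FAIL",)):
    """Return the FastQC module names whose status is listed in `fatal`."""
    bad = []
    for line in summary_text.splitlines():
        parts = line.split("\t")
        # Each summary.txt line looks like: STATUS <tab> MODULE <tab> FILENAME
        if len(parts) >= 2 and parts[0] in fatal:
            bad.append(parts[1])
    return bad


def main(paths):
    """Check every summary file; return 1 if any file has a fatal module."""
    exit_code = 0
    for path in paths:
        with open(path) as handle:
            bad = failed_modules(handle.read())
        if bad:
            print(f"{path}: failed {', '.join(bad)}", file=sys.stderr)
            exit_code = 1
    return exit_code


# Only act as a command-line check when summary files are actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Wrapped in a CommandLineTool that takes the scattered outputs as an array of files, a non-zero exit from this script would make the step, and therefore the workflow, fail.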

Many types of automated decision making would require workflow conditionals, something not present in CWL v1.0. I recommend breaking your workflow into parts and making decisions about what happens next outside of CWL. You can simplify this by directly consuming the JSON CWL output object and producing the JSON CWL input object for the next sub-workflow within your external program/script.
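A minimal sketch of that external driver, in Python, might look like the following. It relies on cwltool printing the CWL output object as JSON on stdout; everything else is hypothetical: the workflow files `qc.cwl` and `trim.cwl`, the input name `reads`, the outputs `verdict` and `trimmed_reads`, and the verdict values `pass`, `borderline`, and `fail` are placeholders for whatever your own sub-workflows define.

```python
import json
import os
import subprocess
import tempfile


def run_cwl(workflow, job):
    """Run cwltool on `workflow` with input object `job`; return the output object."""
    # The job file lives in a temp directory, so file paths in `job` should be absolute.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(job, handle)
    try:
        result = subprocess.run(
            ["cwltool", workflow, handle.name],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.unlink(handle.name)
    # cwltool logs to stderr and prints the JSON output object to stdout.
    return json.loads(result.stdout)


def qc_until_settled(reads, run=run_cwl, max_trims=1,
                     qc_workflow="qc.cwl", trim_workflow="trim.cwl"):
    """Re-run QC after trimming while the verdict stays borderline.

    Returns (verdict, reads), where verdict is "pass" or "fail".
    """
    for attempt in range(max_trims + 1):
        verdict = run(qc_workflow, {"reads": reads})["verdict"]
        if verdict != "borderline":
            return verdict, reads
        if attempt < max_trims:
            # Feed the trim output File object straight back in as the next input.
            reads = run(trim_workflow, {"reads": reads})["trimmed_reads"]
    # Still borderline after the allowed number of trims: treat as a failure.
    return "fail", reads
```

The loop-back decision lives entirely in `qc_until_settled`, so each CWL sub-workflow stays a straight-line description and the number of trim attempts is bounded by `max_trims`.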

I hope this was helpful!



Hi Michael,

Thanks for the reply. In the case that tolerances fail, your solution would be fine. However, there are "maybe" tolerances, where a trimming stage is applied and the file then needs to go back through the previous stages. I shall have to have a think about whether moving to CWL is worthwhile for the use case I currently have. In theory it should make it easier for users to alter the workflow to their needs, but the amount of work needed to get there might be too much given this is simply a pilot use case. I'm currently using cwltool to run the workflows, and just trying to figure out how to define the workflows in CWL. As things currently stand, I have non-CWL, Python-only scripts that work as expected, and the workflow may be too complex to move over in the finite amount of project time left. I will get back to you if I figure it out.

Cheers

Darren

6.9 years ago by d.white • 0