I would like to have a workflow with the following setup:
Step 1: creates an array of N files with specific naming conventions
Step 2: scatters over the output of step 1

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
inputs:
  input_file: File
steps:
  step1:
    run: step1.cwl
    in:
      input_file: input_file
    out: [output_files]
  step2:
    run: step2.cwl
    scatter: input_file
    in:
      input_file: step1/output_files
    out: [output_files]
outputs:
  final_out:
    type: File
    outputSource: step2/output_files

Step 1 looks something like this. The command is just a shell script that splits the input file into 4 independent files, each with a specific naming convention:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: split_file.sh
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  junctions:
    type: File
    outputBinding:
      glob:
        - $(inputs.input_file.basename).a.tmp
        - $(inputs.input_file.basename).b.txt
        - $(inputs.input_file.basename).c.fastq
        - $(inputs.input_file.basename).d.fasta

I know that I could just use glob: "*" to gather all these outputs, but I want to specifically check for the existence of each output before moving on to Step 2. When I tried the above, Step 1 returned an empty array as its output, even though the script being called did produce each output in the temp directory. If I use secondaryFiles, it doesn't scatter across them.

Is it currently possible to achieve something like this with CWL, and what would be the best way? As a note, I cannot currently use ExpressionTool, as it isn't supported by the runner we are using just yet.