scatter method between two steps CWL
6.1 years ago
djarecka • 0

I had a workflow with a similar structure to this one:

steps:
  move:
    run: cwl_move.cwl
    in:
      script: script_mv
      input_file: file_orig
    out: [output_files_mv]
  edit:
    run: cwl_edit.cwl
    scatter: [input_file, script]
    scatterMethod: dotproduct
    in:
      script: script_edit
      input_file: move/output_files_mv
    out: [output_files]

So I had one scatter, in the second step: input_file (i.e. move/output_files_mv) and script (i.e. script_edit) are both arrays. Now I would like to add a scatter to the first step as well:

steps:
  move:
    run: cwl_move.cwl
    scatter: [input_file]
    in:
      script: script_mv
      input_file: file_orig
    out: [output_files_mv]
  edit:
    ...

I would like to keep the existing scatter in the second step and apply it to every input_file from the first step, i.e. I would like a "cross product" between these two steps (ideally without changing the second step at all). What is the proper way to do this in CWL? Does anyone have an example they can share?

cwl scatter • 2.8k views
5.9 years ago
biokcb ▴ 170

I know that this is a pretty old post at this point, but I wanted to add something in case someone is looking around online for an answer or even if @djarecka is still wondering about this.

If I understand correctly, you want to pass an array of files to the workflow, where step 1 (move) takes a single file and produces an array of files, which step 2 (edit) then scatters over. Scattering on the first step and then on the second could feasibly work, but without knowing what your other CWL files look like I can't say for sure. I think the best approach is to create a nested workflow using SubworkflowFeatureRequirement, as in http://www.commonwl.org/user_guide/22-nested-workflows/. The inner workflow runs move and edit for a single file, and the outer workflow scatters that inner workflow over the input array.

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  - class: SubworkflowFeatureRequirement
  - class: ScatterFeatureRequirement

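inputs:
  in_files: File[]
  # The types of the two script inputs are assumptions; match them to
  # what cwl_move.cwl and cwl_edit.cwl actually declare.
  script_mv: File
  script_edit: File[]

outputs:
  output_files:
    # One inner array per input file. Assumes cwl_edit.cwl emits a single File.
    type:
      type: array
      items:
        type: array
        items: File
    outputSource: subworkflow1/output_files

steps:
  subworkflow1:
    in:
      in_file: in_files
      # Passed through unchanged to every scattered run of the subworkflow.
      script_mv: script_mv
      script_edit: script_edit
    # Outer scatter: one run of the subworkflow per input file.
    scatter: in_file
    out: [output_files]
    run:
      class: Workflow
      requirements:
        - class: ScatterFeatureRequirement
      inputs:
        in_file: File
        script_mv: File
        script_edit: File[]
      outputs:
        output_files:
          type: File[]
          outputSource: edit/output_files
      steps:
        move:
          run: cwl_move.cwl
          in:
            script: script_mv
            input_file: in_file
          out: [output_files_mv]
        edit:
          run: cwl_edit.cwl
          # Inner scatter, unchanged from the original second step.
          scatter: [input_file, script]
          scatterMethod: dotproduct
          in:
            script: script_edit
            input_file: move/output_files_mv
          out: [output_files]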

You can also move the portion after run: under subworkflow1 into a separate file to make this easier to read. Based on my limited knowledge of this tool, the output here is nested (one array per input file), so you may also want a way of organizing the outputs at the end, but I think this general setup should get you on the right path.
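As a rough sketch of the separate-file variant: the inline Workflow under run: would be saved to its own file and referenced by path. The file name move_edit.cwl is hypothetical, and the in: block would need to list whatever inputs the inner workflow actually declares.

```yaml
# Fragment of the outer workflow only.
# move_edit.cwl is a hypothetical file name; it would hold the inner
# Workflow (with its own cwlVersion line) that was inline under run:.
steps:
  subworkflow1:
    run: move_edit.cwl
    scatter: in_file
    in:
      in_file: in_files
      # ...plus any other inputs the inner workflow declares
    out: [output_files]
```

The outer workflow still needs SubworkflowFeatureRequirement and ScatterFeatureRequirement; only the location of the inner definition changes.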

edit: added outputs field in the workflow
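
For organizing the nested outputs, one option (not the only one) is a small ExpressionTool that flattens the array of arrays into a single array. This is a minimal sketch: the names nested and flat are made up for illustration, and it assumes every element is a File.

```yaml
cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  # Hypothetical input name: the array of arrays produced by the
  # scattered subworkflow.
  nested:
    type:
      type: array
      items:
        type: array
        items: File

outputs:
  # Hypothetical output name: all files in one flat array.
  flat: File[]

expression: |
  ${
    var flat = [];
    // Append each inner array in order, preserving the input ordering.
    for (var i = 0; i < inputs.nested.length; i++) {
      flat = flat.concat(inputs.nested[i]);
    }
    return {"flat": flat};
  }
```

It could be added as a final step in the outer workflow, taking subworkflow1/output_files as nested, with the workflow output then typed as File[] instead of an array of arrays.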
