Question: CWL: How to parse output of steps in workflow
jeltje.van.baren (California) wrote, 2.9 years ago:

I have a workflow that starts by untarring a file into a directory. In the second step, I need one file from that directory as the value of my tool's --genomeDir parameter.

Currently this is solved like so:

Workflow snippet:

  star:
    run: ../tools/STAR.cwl
    in:
      index: tar/output
      fastq: [TUMOR_FASTQ_1, TUMOR_FASTQ_2]
    out: [output]

Tool snippet:

inputs:
  index:
    type: Directory

(...)

arguments:
  - valueFrom: $(inputs.index.path + "/ref_genome.fa.star.idx")
    position: 0
    prefix: --genomeDir

This works, but it seems unnecessarily complex, and it also puts a prefix in the arguments. How do I pass the file directly from the workflow? I'm thinking of something along the lines of:

index: $((tar/output).path + "/ref_genome.fa.star.idx")

And while we're at it: if I did want to parse this inside the tool CWL, how would I do it in the inputs?

Michael R. Crusoe (Common Workflow Language project) wrote, 2.9 years ago:

Hello jeltje.van.baren,

Thank you for your question. This is a valid need that isn't well met in CWL 1.0.

There are at least two other options which allow you to keep your STAR CWL description ignorant of the structure of your TAR archive (which I agree is a good idea):

  1. Have your untar step output specific files, not just a whole directory

  2. Use an expression tool to pull out the file you need, as demonstrated by Michael Kotliar in chat: https://github.com/SciDAP/workflows/blob/master/expressiontools/get-file-by-name.cwl
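
For illustration, option 2 could look roughly like the ExpressionTool below. This is a minimal sketch and not the linked get-file-by-name.cwl: the input names `dir` and `name` and the output name `selected` are made up for this example, and it assumes the runner populates `listing` for the Directory input. If the entry you need is itself a directory, the output type would be Directory instead of File.

```yaml
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  dir:            # hypothetical name: the directory produced by the untar step
    type: Directory
  name:           # hypothetical name: basename of the entry to pick
    type: string
outputs:
  selected:       # hypothetical name
    type: File
expression: |
  ${
    // Keep only the top-level entries whose basename matches.
    var matches = inputs.dir.listing.filter(function (entry) {
      return entry.basename === inputs.name;
    });
    // Fail loudly rather than guess when the match is missing or ambiguous.
    if (matches.length !== 1) {
      throw "Expected exactly one entry named " + inputs.name;
    }
    return {"selected": matches[0]};
  }
```

In the workflow, such a tool would sit between the two existing steps. The step name `pick_index` and the file name get-entry-by-name.cwl are again placeholders:

```yaml
  pick_index:
    run: get-entry-by-name.cwl   # the sketch above, saved under an assumed name
    in:
      dir: tar/output
      name:
        default: ref_genome.fa.star.idx
    out: [selected]
```

The STAR step would then take `pick_index/selected` as its input, so the STAR description needs no knowledge of the archive layout.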

I've made a proposal for some enhanced syntax to make this easier in future versions of CWL -- likely after the v1.1 release: https://github.com/common-workflow-language/common-workflow-language/issues/430

To answer your other question about not mixing arguments and inputs:

inputs:
  genomeDir:
    type: Directory
    inputBinding:
      valueFrom: $(self.path)/ref_genome.fa.star.idx
      position: 0  # FYI: if there is a prefix, a position is often unnecessary
      prefix: --genomeDir
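
Since the tool input is named genomeDir in this version rather than index, the workflow step from the question would need its `in` mapping adjusted to match. A sketch, assuming everything else in the step stays as posted:

```yaml
  star:
    run: ../tools/STAR.cwl
    in:
      genomeDir: tar/output   # was "index" in the original snippet
      fastq: [TUMOR_FASTQ_1, TUMOR_FASTQ_2]
    out: [output]
```

The workflow still passes the whole directory; the tool appends the file name when it builds the command line.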
jeltje.van.baren (California) wrote, 2.9 years ago:

This last solution is perfect for my particular problem, thanks!

I fully agree on not untarring whole directories if you need a single file, but this is for a DREAM challenge and I'm only allowed one index.tar.gz for the full workflow. Other steps need other files.

Looking forward to the v1.1 release.


Great to hear!

FYI: You can have multiple outputs in your untarring step that give names to each of your input files:

[…]

outputs:
  A:
    type: File
    outputBinding:
      glob: fileA.txt
  B:
    type: File
    outputBinding:
      glob: otherfile.csv
  C:
    type: File
    outputBinding:
      glob: important.txt
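
Downstream steps can then reference each named output directly. The fragment below is only a sketch of that wiring: the tool path ../tools/untar.cwl, its `archive` input, and the workflow input INDEX_TARBALL are placeholders, and the STAR tool's `index` input would have to be declared as File rather than Directory for this to type-check.

```yaml
steps:
  tar:
    run: ../tools/untar.cwl     # hypothetical path to the untar tool
    in:
      archive: INDEX_TARBALL    # hypothetical workflow input
    out: [A, B, C]
  star:
    run: ../tools/STAR.cwl
    in:
      index: tar/A              # one named file instead of the whole directory
      # A list of sources like this requires MultipleInputFeatureRequirement
      # in the workflow's requirements.
      fastq: [TUMOR_FASTQ_1, TUMOR_FASTQ_2]
    out: [output]
```

Other steps would pick up `tar/B` and `tar/C` in the same way, so a single archive can still feed the whole workflow.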
Reply by Michael R. Crusoe, written 2.9 years ago