Question: Simple(?) workflow, can't sort out what output goes where
starkruzr wrote (2.4 years ago):

Hi folks,

I am trying to feed simple results from Muscle into RaxML for processing. The use case is for cwl-runner to serve as a workflow engine for the Airavata science gateway software. We'll have Airavata run "cwl-runner muscle-raxml.cwl --infile file.fa [--diags] --model BINGAMMA" (for example) and then have it return the result. The problem right now is that cwl-runner doesn't understand my outputs, which makes sense, because I don't really understand how CWL keeps track of outputs either! When I run it, I get the following:

Fornacis:science-gateway-experiment-code jtd$ cwl-runner muscle-raxml.cwl --infile unaligned.fa --diags --model BINGAMMA
/usr/local/bin/cwl-runner 1.0.20161128202906
Resolved 'muscle-raxml.cwl' to 'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl'
Tool definition failed validation:
While checking field `outputs`
  While checking object `file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#classout`
    Field `outputSource` contains undefined reference to `raxmloutput`, tried [u'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#classout/raxmloutput', u'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#raxmloutput']
While checking field `steps`
  While checking object `file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#raxml`
    While checking field `in`
      While checking object `file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#raxml/raxmlinfile`
        Field `source` contains undefined reference to `intermediatefile`, tried [u'file:///Users/jtd/science-gateway-experiment-code/muscle-raxml.cwl#intermediatefile']

The idea is for Muscle to generate a file called "intermediatefile", which is then fed into RaxML for processing. RaxML then produces several files which, because of the arguments we provide, will all end in ".out". Sounds sort of logical, but doesn't actually work.

Here are the contents of my three CWL files.

muscle-raxml.cwl:

cwlVersion: v1.0
class: Workflow
inputs:
  infile: File
  diags: boolean
  model: string

outputs:
  classout:
    type: File
    outputSource: raxmloutput

steps:
  muscle:
    run: muscleraxml-muscle.cwl
    in:
      muscleinfile: infile
      diagsflag: diags
    out: [intermediatefile]

  raxml:
    run: muscleraxml-raxml.cwl
    in:
      raxmlinfile: intermediatefile
      raxml_model: model
    out: [raxmloutput]

muscleraxml-muscle.cwl:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [muscle]
arguments: ["-out intermediatefile"]
inputs:
  muscleinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -in
  diagsflag:
    type: boolean
    inputBinding:
      position: 2
      prefix: -diags

outputs:
  intermediatefile:
    type: File
    outputBinding:
      glob: intermediatefile

muscleraxml-raxml.cwl:

cwlVersion: v1.0
class: CommandLineTool
label: RaxML wrapper
baseCommand: raxml
arguments: ["-n out -T 2"]
inputs:
  raxmlinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  raxml_model:
    type: string
    inputBinding:
      position: 2
      prefix: -m

outputs:
  raxmloutput:
    type: File
    outputBinding:
      glob: "*.out"

Help?

Thanks!

alaindomissy wrote (2.4 years ago):

In your workflow level file (muscle-raxml.cwl):

  • raxmloutput would refer to a workflow level input (which does not exist)
  • intermediatefile would refer to a workflow level input (which does not exist)

You need to specify the workflow step where these outputs come from:

  • instead of outputSource: raxmloutput you need outputSource: raxml/raxmloutput
  • instead of raxmlinfile: intermediatefile you need raxmlinfile: muscle/intermediatefile

ALSO: your example "cwl-runner muscle-raxml.cwl --infile file.fa [--diags] --model BINGAMMA" indicates that --diags is optional. So you need to make the corresponding input optional at the workflow level (muscle-raxml.cwl):

  • instead of diags: boolean you need diags: boolean?

as well as the tool level (muscleraxml-muscle.cwl):

  • instead of: diagsflag: type: boolean you need: diagsflag: type: boolean?
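
For reference, the trailing "?" is CWL v1.0 shorthand for a type that may also be null. A minimal sketch of the equivalent long-form declaration for the tool-level input, using the same names as in muscleraxml-muscle.cwl, would look like this:

```yaml
inputs:
  diagsflag:
    # Long form of "boolean?": the value may be null (input omitted) or a boolean
    type: ["null", boolean]
    inputBinding:
      position: 2
      prefix: -diags
```

When the input is omitted or false, the -diags prefix is simply left off the command line.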

ALSO: the arguments field in muscleraxml-muscle.cwl needs to be a list of two strings instead of just one, because each list element is passed to the command as a single argument:

  • instead of: ["-out intermediatefile"] you need: ["-out", "intermediatefile"]

Here are the contents of the modified CWL files.

muscle-raxml.cwl:

cwlVersion: v1.0 
class: Workflow
inputs:
  infile: File
  diags: boolean?
  model: string
outputs:
  classout:
    type: File
    outputSource: raxml/raxmloutput
steps:
  muscle:
    run: muscleraxml-muscle.cwl
    in:
      muscleinfile: infile
      diagsflag: diags
    out: [intermediatefile]
  raxml:
    run: muscleraxml-raxml.cwl
    in:
      raxmlinfile: muscle/intermediatefile
      raxml_model: model
    out: [raxmloutput]

muscleraxml-muscle.cwl:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [muscle]
arguments: ["-out", "intermediatefile"]
inputs:
  muscleinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -in
  diagsflag:
    type: boolean?
    inputBinding:
      position: 2
      prefix: -diags
outputs:
  intermediatefile:
    type: File
    outputBinding:
      glob: intermediatefile
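
The third file, muscleraxml-raxml.cwl, is not shown above, but its arguments field has the same one-string problem. The following is only a sketch of how it might be adjusted, not a tested wrapper. It assumes, as the question states, that RaxML writes several files ending in ".out"; a single File output would likely be rejected when the glob matches more than one file, so an array type is used here.

```yaml
cwlVersion: v1.0
class: CommandLineTool
label: RaxML wrapper
baseCommand: raxml
# Each list element becomes one command-line token
arguments: ["-n", "out", "-T", "2"]
inputs:
  raxmlinfile:
    type: File
    inputBinding:
      position: 1
      prefix: -s
  raxml_model:
    type: string
    inputBinding:
      position: 2
      prefix: -m
outputs:
  raxmloutput:
    # Assumption: the glob matches several files, hence an array of File
    type: File[]
    outputBinding:
      glob: "*.out"
```

If the array type is used, the workflow-level output classout in muscle-raxml.cwl would need type: File[] as well, so that it matches raxml/raxmloutput.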