Question: CWL: How do I use pipes?
3.8 years ago, mkher50 wrote:

I want to stream the output of one command as the input to the second command. How do I do that using CWL?

For example:

zcat sample.fastq.gz | grep ...

My attempt using stdout just captured the output to a temporary file instead of streaming it.

Thank you, Manisha

3.8 years ago, karl.nordstrom90 wrote:

If you want to put it all together in a single CommandLineTool, you need to include ShellCommandRequirement. See:

http://www.commonwl.org/v1.0/CommandLineTool.html#ShellCommandRequirement

For pipes and other characters that the shell has to interpret, you must set shellQuote: false on the corresponding binding.

An alternative is to write a workflow and mark the relevant inputs and outputs as streamable in the tool descriptions. In principle that should give the desired behavior, but it depends on the runner implementation, and I'm not sure how far along support for it is.


Can you provide an example that actually uses ShellCommandRequirement and shellQuote: false?

I know that the "right" way is to use "streamable: true", but what do you do if your CWL runner doesn't support that and you don't want to save the intermediate outputs?

I am new to CWL, so I might have the formatting/syntax wrong, but would something like this work?

cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ShellCommandRequirement
baseCommand: zcat
inputs:
  files_input:
    type: File
    streamable: true
    inputBinding:
      position: 1
  pipe:
    type: string
    default: "|"
    inputBinding:
      position: 2
      # shellQuote belongs inside inputBinding; false leaves the pipe unquoted
      shellQuote: false
  regex:
    type: string
    inputBinding:
      position: 3
      # prefix also belongs inside inputBinding; the pattern itself stays quoted
      prefix: grep
stdout: zcatgrep_output.txt
outputs:
  grep_file:
    type: File
    streamable: true
    outputBinding:
      glob: zcatgrep_output.txt
(reply written 2.6 years ago by alanh80)
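For comparison, here is a minimal sketch of the single-tool approach that puts the pipe in `arguments` rather than in an input parameter. The input names (`gzipFile`, `pattern`) and the output file name are illustrative assumptions, not taken from the posts above, and the behavior may vary between runners.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ShellCommandRequirement   # the command line is handed to a shell
inputs:
  gzipFile:
    type: File
  pattern:
    type: string
arguments:
  - zcat
  - $(inputs.gzipFile.path)
  - valueFrom: "|"
    shellQuote: false                # leave the pipe unquoted so the shell interprets it
  - grep
  - $(inputs.pattern)                # quoted by default, so the pattern stays one argument
stdout: zcat_grep_output.txt         # hypothetical output name
outputs:
  grepOut:
    type: stdout
```

Only the pipe character has quoting switched off here; the file path and the pattern keep the default quoting, so spaces or shell metacharacters in them are not interpreted by the shell.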
3.7 years ago, karl.sebby100 wrote:

It seems the proper way is to write a workflow, which keeps the zcat and grep commands separate. From the intro to CWL doc: "CWL tasks are isolated and you must be explicit about your inputs and outputs." First, create tool wrappers for zcat and grep:


# zcat.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: zcat
stdout: $(inputs.unzippedFileName)
inputs:
  gzipFile:
    type: File
    inputBinding:
      position: 1
  unzippedFileName:
    type: string
outputs:
  unzippedFile:
    type: stdout

# grep.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: grep
stdout: $(inputs.outFileName)
inputs:
  pattern:
    type: string
    inputBinding:
      position: 1
  fileToSearch:
    type: File
    inputBinding:
      position: 2
  outFileName:
    type: string
outputs:
  grepOut:
    type: stdout

Then make a workflow to put them together:

cwlVersion: v1.0
class: Workflow

############

inputs:
  GZIPFILE:
    type: File
  UNZIPPEDFILENAME:
    type: string
    default: blah       # name doesn't really matter; this is not a permanent output
  PATTERN:
    type: string
  OUTFILENAME:
    type: string

############

outputs:
  grepOutput:
    type: File
    outputSource: grep/grepOut

############

steps:

  zcat:
    run: zcat.cwl
    in:
      gzipFile: GZIPFILE
      unzippedFileName: UNZIPPEDFILENAME
    out: [unzippedFile]


  grep:
    run: grep.cwl
    in:
      pattern: PATTERN
      fileToSearch: zcat/unzippedFile
      outFileName: OUTFILENAME
    out: [grepOut]

And finally, a YAML file to describe your inputs:

GZIPFILE:
  class: File
  path: test.txt.gz
# UNZIPPEDFILENAME: not needed, default given in workflow.
PATTERN: two
OUTFILENAME: zcatPipeGrepWorkflowOutput.txt

This way is a bit of work, but once you have it, you can reuse it. My test.txt.gz file contains just four lines (one, two, three, four), and the returned file contains only the matching line, 'two'. An easier way would be to write a bash script and make a tool wrapper for it, but that doesn't keep your tools isolated.
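As a rough sketch of the bash-script alternative mentioned above: the wrapper below assumes a hypothetical script, `zcat_grep.sh`, that is passed in as a File input. The script name, its contents, and all parameter names are assumptions made for illustration.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: bash
inputs:
  script:
    type: File          # hypothetical zcat_grep.sh containing: zcat "$1" | grep "$2"
    inputBinding:
      position: 1
  gzipFile:
    type: File          # becomes $1 inside the script
    inputBinding:
      position: 2
  pattern:
    type: string        # becomes $2 inside the script
    inputBinding:
      position: 3
stdout: zcat_grep_output.txt   # hypothetical output name
outputs:
  grepOut:
    type: stdout
```

The pipe lives inside the script, so no ShellCommandRequirement is needed, but the two commands are no longer described as separate CWL tools.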


You could make the inputs streamable as well. I'm not sure whether stdout defaults to streamable.

  fileToSearch:
    type: File
    streamable: true
    inputBinding:
      position: 2
(reply written 3.7 years ago by karl.nordstrom90)
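On the output side, as far as I recall the v1.0 CommandLineTool specification describes `type: stdout` as shorthand for a File output that already carries `streamable: true`. Treat that as an assumption to check against the spec for your cwlVersion. A sketch of the zcat wrapper with that shorthand written out explicitly:

```yaml
# zcat.cwl with the stdout shorthand expanded by hand
cwlVersion: v1.0
class: CommandLineTool
baseCommand: zcat
stdout: $(inputs.unzippedFileName)
inputs:
  gzipFile:
    type: File
    inputBinding:
      position: 1
  unzippedFileName:
    type: string
outputs:
  unzippedFile:
    type: File
    streamable: true                       # marks the file as written sequentially
    outputBinding:
      glob: $(inputs.unzippedFileName)     # pick up the file that stdout was redirected to
```

Whether a runner then connects the two steps with a pipe instead of writing a temporary file is still up to the implementation.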

Thank you. I do want to do it as a workflow with separate steps for the two tools. Maybe not for this example of zcat and grep, but in more complex cases. The streamable keyword is what I need.

(reply written 3.7 years ago by mkher50)