CWL: How do I use pipes?
7.3 years ago
mkher ▴ 50

I want to stream the output of one command as the input to a second command. How do I do that using CWL?

For example:

zcat sample.fastq.gz | grep ...

When I tried using stdout, the output was captured to a temporary file instead of being streamed.

Thank you, Manisha

7.3 years ago

If you want to put it all together in a single CommandLineTool, you need to include ShellCommandRequirement. See:

http://www.commonwl.org/v1.0/CommandLineTool.html#ShellCommandRequirement

For pipes and other characters that the shell must interpret, you also have to set shellQuote: false on the corresponding binding.
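A minimal sketch of such a single tool, assuming CWL v1.0 and a runner (e.g. cwltool) that implements ShellCommandRequirement. The input names gzipFile and pattern and the output file name matches.txt are illustrative choices, not from the thread. Every word is shell-quoted by default; only the pipe is left unquoted.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ShellCommandRequirement
inputs:
  gzipFile:
    type: File      # no inputBinding: placed on the command line via arguments
  pattern:
    type: string
arguments:
  # Arguments are emitted in list order; each one is shell-quoted by default.
  - zcat
  - $(inputs.gzipFile.path)
  - valueFrom: "|"
    shellQuote: false   # leave the pipe unquoted so the shell interprets it
  - grep
  - $(inputs.pattern)
stdout: matches.txt
# grep exits with status 1 when nothing matches; treat that as success too.
successCodes: [0, 1]
outputs:
  matches:
    type: stdout
```

Because the pattern stays quoted, regex metacharacters reach grep intact; only the "|" is exposed to the shell.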

An alternative is to write a workflow. You can mark your inputs and outputs as streamable in the tool descriptions. In principle that should achieve the desired behavior, but it depends on the implementation, and I'm not sure how far along runner support for it is.

Can you provide an example that actually uses ShellCommandRequirement and shellQuote: false?

I know that the "right" way is to use streamable: true, but what do you do if your CWL runner doesn't support that and you don't want to save the intermediate outputs?

I am new to CWL, so I might have the formatting/syntax wrong, but would something like this work?

cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ShellCommandRequirement
baseCommand: zcat
inputs:
  files_input:
    type: File
    streamable: true
    inputBinding:
      position: 1
  pipe:
    type: string
    default: "|"
    inputBinding:
      position: 2
      shellQuote: false   # pass "|" through unquoted so the shell sees a pipe
  regex:
    type: string
    inputBinding:
      position: 3
      prefix: grep        # emitted as a separate "grep" word before the pattern
stdout: zcatgrep_output.txt
outputs:
  grep_file:
    type: File
    streamable: true
    outputBinding:
      glob: zcatgrep_output.txt
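To run a tool like this, you also need a job file supplying files_input and regex (pipe already has a default). A sketch, where the job file name, the input path and the pattern are placeholders:

```yaml
# Hypothetical job file, e.g. zcatgrep-job.yml
files_input:
  class: File
  path: sample.fastq.gz
regex: "^@"   # placeholder pattern; substitute your own
```

It would then be run with something like cwltool zcatgrep.cwl zcatgrep-job.yml (both file names hypothetical).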
7.2 years ago
karl.sebby ▴ 100

It seems the proper way is to use a workflow; that way you can keep the zcat and grep commands separate. From the intro to CWL docs: "CWL tasks are isolated and you must be explicit about your inputs and outputs." First, create tool wrappers for zcat and grep:


# zcat.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: zcat
stdout: $(inputs.unzippedFileName)  
inputs:
  gzipFile:
    type: File
    inputBinding:
      position: 1
  unzippedFileName:
    type: string
outputs:
  unzippedFile:
    type: stdout
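If you don't care what the intermediate file is called, my reading of the CWL v1.0 spec is that the stdout field can be omitted when an output has type: stdout, in which case the runner picks a random file name. A sketch of the zcat wrapper without the unzippedFileName input:

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: zcat
inputs:
  gzipFile:
    type: File
    inputBinding:
      position: 1
outputs:
  unzippedFile:
    type: stdout   # no top-level stdout field: the runner chooses a random file name
```

With this version, the zcat step in the workflow would only need gzipFile: GZIPFILE, and the UNZIPPEDFILENAME workflow input could be dropped.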

# grep.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: grep
stdout: $(inputs.outFileName)
inputs:
  pattern:
    type: string
    inputBinding:
      position: 1
  fileToSearch:
    type: File
    inputBinding:
      position: 2
  outFileName:
    type: string
outputs:
  grepOut:
    type: stdout

Then make a workflow to put them together:

cwlVersion: v1.0
class: Workflow

############

inputs:
  GZIPFILE:
    type: File
  UNZIPPEDFILENAME:
    type: string
    default: blah       # doesn't really matter; not a permanent output.
  PATTERN:
    type: string
  OUTFILENAME:
    type: string

############

outputs:
  grepOutput:
    type: File
    outputSource: grep/grepOut

############

steps:

  zcat:
    run: zcat.cwl
    in:
      gzipFile: GZIPFILE
      unzippedFileName: UNZIPPEDFILENAME
    out: [unzippedFile]


  grep:
    run: grep.cwl
    in:
      pattern: PATTERN
      fileToSearch: zcat/unzippedFile
      outFileName: OUTFILENAME
    out: [grepOut]

And finally a YML file to describe your inputs:

GZIPFILE:
  class: File
  path: test.txt.gz
#UNZIPPEDFILENAME: Not needed, default given in workflow.
PATTERN: two
OUTFILENAME: zcatPipeGrepWorkflowOutput.txt

It's a bit more work this way, but once the wrappers exist you can reuse them. My test.txt.gz file contains four lines [one, two, three, four], and the returned file contains only the matching line, 'two'. An easier way would be to write a bash script and make a tool wrapper for it, but that doesn't keep your tools isolated.
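The bash-script route could look roughly like this. It is a sketch assuming CWL v1.0, in which InitialWorkDirRequirement writes a hypothetical helper script, zcat_grep.sh, into the working directory, so no separate script file has to be shipped. The script and input names are illustrative. No ShellCommandRequirement is needed here because the pipe lives inside the script.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: InitialWorkDirRequirement
    listing:
      # Hypothetical helper script written into the working directory.
      - entryname: zcat_grep.sh
        entry: |
          zcat "$1" | grep "$2"
baseCommand: [sh, zcat_grep.sh]
inputs:
  gzipFile:
    type: File
    inputBinding:
      position: 1   # becomes $1 in the script
  pattern:
    type: string
    inputBinding:
      position: 2   # becomes $2 in the script
stdout: zcat_grep_output.txt
outputs:
  matches:
    type: stdout
```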

You could make the inputs streamable as well, e.g. fileToSearch in grep.cwl. I'm not sure whether stdout outputs default to streamable.

  fileToSearch:
    type: File
    streamable: true
    inputBinding:
      position: 2
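On whether stdout defaults to streamable: as I read the CWL v1.0 specification, an output declared as type: stdout is shorthand for a File output with streamable: true whose glob matches the stdout file name. Written out in full, the zcat wrapper's output would look roughly like this. Whether data actually streams between steps is still up to the runner.

```yaml
# Long form of the zcat wrapper's "type: stdout" output (CWL v1.0).
stdout: $(inputs.unzippedFileName)
outputs:
  unzippedFile:
    type: File
    streamable: true
    outputBinding:
      glob: $(inputs.unzippedFileName)
```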
Thank you. I do want to do it as a workflow with separate steps for the two tools; maybe not for this zcat and grep example, but for more complex cases. The streamable keyword is what I need.
