CWL: default values for custom types in a workflow
6.8 years ago
thomas.e ▴ 110

Hello,

How do I set the default value for a custom type in a workflow? Something like:

steps:
   mystep:
     run: mything.cwl
     in:
       important_arg:
          default: magicValue

where magicValue is not a primitive type. In the example below it is an enum, and I also need to do the same with a record.

Here is a concrete example that doesn't work. I'm trying to specify a default value for end_mode, which is unlikely to change for this workflow. end_mode is an enum defined in trimmomatic-type.cwl, which is referenced by the CommandLineTool's CWL file.


#!/usr/bin/env cwl-runner

class: Workflow
cwlVersion: v1.0

inputs:
  read1: File
  read2: File


outputs:
  trim-logs:
    type: File
    outputSource: trimmomatic/output_log
  read1-paired:
    type: File
    outputSource: trimmomatic/reads1_trimmed
  read2-paired:
    type: File
    outputSource: trimmomatic/reads1_trimmed_unpaired
  read1-unpaired:
    type: File
    outputSource: trimmomatic/reads2_trimmed_paired
  read2-unpaired:
    type: File
    outputSource: trimmomatic/reads2_trimmed_unpaired

steps:
  trimmomatic:
    run: ../src/tools/trimmomatic.cwl
    in:
      reads1: read1
      reads2: read2
      end_mode:
        default: PE
    out: [output_log, reads1_trimmed, reads1_trimmed_unpaired, reads2_trimmed_paired, reads2_trimmed_unpaired]

This fails with:


$ cwl-runner ../src/pdx-pl.cwl pdx.yaml 
/home/thomas.e/.local/bin/cwl-runner 1.0.20170308174714
Resolved '../src/pdx-pl.cwl' to 'file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/src/pdx-pl.cwl'
Tool definition failed validation:
Got error Type property "['null', 'end_mode']" not a valid Avro schema: Union item must be a valid Avro schema: Could not make an Avro Schema object from end_mode. while processing inputs of file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/src/pdx-pl.cwl#trimmomatic:

If I change the type of end_mode to string, it works.

For reference, here is trimmomatic.cwl:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

hints:
  EnvVarRequirement:
    envDef:
      CLASSPATH: /stornext/System/data/apps/trimmomatic/trimmomatic-0.36/trimmomatic-0.36.jar
  SoftwareRequirement:
    packages:
      trimmomatic:
        specs: [ https://identifiers.org/rrid/RRID:SCR_011848 ]
        version: [ "0.32", "0.35", "0.36" ]

requirements:
  - $import: trimmomatic-docker.yml
  - $import: trimmomatic-types.yml
  - class: InlineJavascriptRequirement
  - class: ShellCommandRequirement

inputs:
  phred:
    type: trimmomatic-types.yml#phred
    default: '64'
    inputBinding:
      prefix: -phred
      separate: false
      position: 4
    doc: |
      "33" or "64" specifies the base quality encoding. Default: 64

  tophred64:
    type: boolean?
    inputBinding:
      position: 12
      prefix: TOPHRED64
      separate: false
    doc: This (re)encodes the quality part of the FASTQ file to base 64.

  headcrop:
    type: int?
    inputBinding:
      position: 13
      prefix: 'HEADCROP:'
      separate: false
    doc: |
      Removes the specified number of bases, regardless of quality, from the
      beginning of the read.
      The numbser specified is the number of bases to keep, from the start of
      the read.

  tophred33:
    type: boolean?
    inputBinding:
      position: 12
      prefix: TOPHRED33
      separate: false
    doc: This (re)encodes the quality part of the FASTQ file to base 33.

  nthreads:
    type: int
    default: 1
    inputBinding:
      position: 4
      prefix: -threads
    doc: Number of threads

  minlen:
    type: int?
    inputBinding:
      position: 100
      prefix: 'MINLEN:'
      separate: false
    doc: |
      This module removes reads that fall below the specified minimal length.
      If required, it should normally be after all other processing steps.
      Reads removed by this step will be counted and included in the "dropped
      reads" count presented in the trimmomatic summary.

  java_opts:
    type: string?
    inputBinding:
      position: 1
      shellQuote: false
    doc: |
      JVM arguments should be a quoted, space separated list
      (e.g. "-Xms128m -Xmx512m")

  leading:
    type: int?
    inputBinding:
      position: 14
      prefix: 'LEADING:'
      separate: false
    doc: |
      Remove low quality bases from the beginning. As long as a base has a
      value below this threshold the base is removed and the next base will be
      investigated.

  slidingwindow:
    type: trimmomatic-types.yml#slidingWindow?
    inputBinding:
      position: 15
      valueFrom: |
        'SLIDINGWINDOW:'$(self.windowSize)':'$(self.requiredQuality)
    doc: |
      Perform a sliding window trimming, cutting once the average quality
      within the window falls below a threshold. By considering multiple
      bases, a single poor quality base will not cause the removal of high
      quality data later in the read.
      <windowsize> specifies the number of bases to average across
      <requiredquality> specifies the average quality required

  illuminaClip:
    type: trimmomatic-types.yml#illuminaClipping?
    inputBinding:
      valueFrom: |
        ILLUMINACLIP:$(inputs.illuminaClip.adapters.path):$(self.seedMismatches):$(self.palindromeClipThreshold):$(self.simpleClipThreshold):$(self.minAdapterLength):$(self.keepBothReads)
      position: 11
    doc: Cut adapter and other illumina-specific sequences from the read.

  crop:
    type: int?
    inputBinding:
      position: 13
      prefix: 'CROP:'
      separate: false
    doc: |
      Removes bases regardless of quality from the end of the read, so that
      the read has maximally the specified length after this step has been
      performed. Steps performed after CROP might of course further shorten
      the read. The value is the number of bases to keep, from the start of
      the read.

  reads2:
    type: File?
    format: edam:format_1930  # fastq
    inputBinding:
      position: 6
    doc: FASTQ file of R2 reads in Paired End mode

  reads1:
    type: File
    format: edam:format_1930  # fastq
    inputBinding:
      position: 5
    doc: FASTQ file of reads (R1 reads in Paired End mode)

  avgqual:
    type: int?
    inputBinding:
      position: 101
      prefix: 'AVGQUAL:'
      separate: false
    doc: |
      Drop the read if the average quality is below the specified level

  trailing:
    type: int?
    inputBinding:
      position: 14
      prefix: 'TRAILING:'
      separate: false
    doc: |
      Remove low quality bases from the end. As long as a base has a value
      below this threshold the base is removed and the next base (which as
      trimmomatic is starting from the 3' prime end would be base preceding
      the just removed base) will be investigated. This approach can be used
      removing the special Illumina "low quality segment" regions (which are
      marked with quality score of 2), but we recommend Sliding Window or
      MaxInfo instead

  maxinfo:
    type: trimmomatic-types.yml#maxinfo?
    inputBinding:
      position: 15
      valueFrom: |
        MAXINFO:$(self.targetLength):$(strictness)
    doc: |
      Performs an adaptive quality trim, balancing the benefits of retaining
      longer reads against the costs of retaining bases with errors.
      <targetlength>: This specifies the read length which is likely to allow
      the location of the read within the target sequence to be determined.
      <strictness>: This value, which should be set between 0 and 1, specifies
      the balance between preserving as much read length as possible vs.
      removal of incorrect bases. A low value of this parameter (<0.2) favours
      longer reads, while a high value (>0.8) favours read correctness.

  end_mode:
    type: trimmomatic-types.yml#end_mode
    inputBinding:
      position: 3
    doc: |
      Single End (SE) or Paired End (PE) mode

outputs:
  reads1_trimmed:
    type: File
    format: edam:format_1930  # fastq
    outputBinding:
      glob: $(inputs.reads1.nameroot).trimmed.fastq

  output_log:
    type: File
    outputBinding:
      glob: $(inputs.reads1.nameroot).log
    label: Trimmomatic log
    doc: |
      log of all read trimmings, indicating the following details:
        the read name
        the surviving sequence length
        the location of the first surviving base, aka. the amount trimmed from the start
        the location of the last surviving base in the original read
        the amount trimmed from the end

  reads1_trimmed_unpaired:
    type: File?
    format: edam:format_1930  # fastq
    outputBinding:
      glob: $(inputs.reads1.nameroot).unpaired.trimmed.fastq

  reads2_trimmed_paired:
    type: File?
    format: edam:format_1930  # fastq
    outputBinding:
      glob: $(inputs.reads2.nameroot).trimmed.fastq

  reads2_trimmed_unpaired:
    type: File?
    format: edam:format_1930  # fastq
    outputBinding:
      glob: $(inputs.reads2.nameroot).unpaired.trimmed.fastq

baseCommand: [ java, org.usadellab.trimmomatic.Trimmomatic ]

arguments:
  - valueFrom: $(inputs.reads1.nameroot).log
    prefix: -trimlog
    position: 4
  - valueFrom: $(inputs.reads1.nameroot).trimmed.fastq
    position: 7
  - valueFrom: |
      ${
        if (inputs.end_mode == "PE" && inputs.reads1) {
          return inputs.reads1.nameroot + '.trimmed.unpaired.fastq';
        } else {
          return null;
        }
      }
    position: 8
  - valueFrom: |
      ${
        if (inputs.end_mode == "PE" && inputs.reads2) {
          return inputs.reads2.nameroot + '.trimmed.fastq';
        } else {
          return null;
        }
      }
    position: 9
  - valueFrom: |
      ${
        if (inputs.end_mode == "PE" && inputs.reads2) {
          return inputs.reads2.nameroot + '.trimmed.unpaired.fastq';
        } else {
          return null;
        }
      }
    position: 10

doc: |
  Trimmomatic is a fast, multithreaded command line tool that can be used to
  trim and crop Illumina (FASTQ) data as well as to remove adapters. These
  adapters can pose a real problem depending on the library preparation and
  downstream application.
  There are two major modes of the program: Paired end mode and Single end
  mode. The paired end mode will maintain correspondence of read pairs and
  also use the additional information contained in paired reads to better
  find adapter or PCR primer fragments introduced by the library preparation
  process.
  Trimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality
  scores, depending on the Illumina pipeline used).

$namespaces: { edam: http://edamontology.org/ }
$schemas: [ http://edamontology.org/EDAM_1.16.owl ]

trimmomatic-type.cwl


class: SchemaDefRequirement
types:
  - type: enum
    name: phred
    symbols: [ '64', '33' ]
  - type: record
    name: slidingWindow
    fields:
     - name: windowSize
       type: int
     - name: requiredQuality
       type: int
  - type: enum
    name: trueFalse
    symbols: [ 'true', 'false' ]
  - type: record
    name: illuminaClipping
    fields:
      - name: adapters
        type: File
        doc: |
          FASTA file containing adapters, PCR sequences, etc. It is used to search
          for and remove these sequences in the input FASTQ file(s)
      - name: seedMismatches
        type: int
        doc: |
          specifies the maximum mismatch count which will still allow a full match
          to be performed
      - name: palindromeClipThreshold
        type: int
        doc: |
          specifies how accurate the match between the two 'adapter ligated' reads
          must be for PE palindrome read alignment.
      - name: simpleClipThreshold
        type: int
        doc: |
          specifies how accurate the match between any adapter etc. sequence must
          be against a read
      - name: minAdapterLength
        type: int?
        doc: |
          In addition to the alignment score, palindrome mode can verify that a
          minimum length of adapter has been detected. If unspecified, this
          defaults to 8 bases, for historical reasons. However, since palindrome
          mode has a very low false positive rate, this can be safely reduced, even
          down to 1, to allow shorter adapter fragments to be removed.
      - name: keepBothReads
        type: trueFalse
        doc: |
          After read-though has been detected by palindrome mode, and the adapter
          sequence removed, the reverse read contains the same sequence information
          as the forward read, albeit in reverse complement. For this reason, the
          default behaviour is to entirely drop the reverse read. By specifying
          "true" for this parameter, the reverse read will also be retained, which
          may be useful e.g. if the downstream tools cannot handle a combination of
          paired and unpaired reads.
  - type: record
    name: maxinfo
    fields:
      - name: targetLength
        type: int
      - name: strictness
        type: int
  - type: enum
    name: end_mode
    symbols: [ SE, PE ]

$namespaces: { edam: http://edamontology.org/ }
$schemas: [ http://edamontology.org/EDAM_1.16.owl ]

6.8 years ago
thomas.e ▴ 110

The answer is that I need to declare the custom type on the workflow input, i.e.:


inputs:
  mode:
    type: type.yml#end_mode
    default: PE
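
The question also asks about records. Assuming the same approach carries over, a record default would be written as a map of field names to values, since that is how CWL represents record values. The fragment below is only a sketch: the relative path to trimmomatic-types.yml is a guess based on the `run:` path in the question, the workflow input name is made up for illustration, and the values 4 and 15 are placeholders rather than recommended settings.

```yaml
requirements:
  # Assumed location: next to the tool that the workflow runs
  - $import: ../src/tools/trimmomatic-types.yml

inputs:
  window:  # hypothetical workflow input name
    type: ../src/tools/trimmomatic-types.yml#slidingWindow
    default:
      windowSize: 4        # placeholder value
      requiredQuality: 15  # placeholder value

steps:
  trimmomatic:
    run: ../src/tools/trimmomatic.cwl
    in:
      slidingwindow: window
      # ... remaining step inputs as in the question
    out: [output_log, reads1_trimmed]
```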
6.8 years ago

Your example seems to work if you add the same requirement to the workflow and use your custom type for the workflow input:

requirements:
  - $import: type.yml

inputs:
  mode:
    type: type.yml#end_mode
    default: PE
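
Put together for the minimal example posted in this thread (clt.cwl and type.yml), the whole workflow might look like the sketch below. It assumes all three files sit in the same directory, so that `type.yml#end_mode` resolves to the same definition in the workflow and in the tool. It has not been checked against the cwltool release shown in the question.

```yaml
#!/usr/bin/env cwl-runner

class: Workflow
cwlVersion: v1.0

requirements:
  # Import the same SchemaDefRequirement that clt.cwl imports, so the
  # workflow can resolve the custom type name.
  - $import: type.yml

inputs:
  mode:
    # The custom enum instead of string
    type: type.yml#end_mode
    default: PE

outputs: []

steps:
  echoecho:
    run: clt.cwl
    in:
      mode: mode
    out: []
```

With the default on the workflow input, the step just forwards `mode`. A job file should still be able to override it with an entry such as `mode: SE`, since enum values are given as plain strings.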

However, I think defining the enum inline in the tool is simpler:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0

class: CommandLineTool

baseCommand: echo

inputs:
  mode:
    type:
      type: enum
      symbols: ["PE", "SE"]
    inputBinding:
      position: 1

outputs: []
6.8 years ago
thomas.e ▴ 110

I've made a minimal example that displays the problem.

Here is the workflow:


#!/usr/bin/env cwl-runner

class: Workflow
cwlVersion: v1.0

inputs:
  mode:
    type: string
    default: PE

outputs:
  []

steps:
  echoecho:
    run: clt.cwl
    in:
      mode: mode
    out: []

Here is the command line tool:


#!/usr/bin/env cwl-runner

cwlVersion: v1.0

class: CommandLineTool

requirements:
- $import: type.yml

baseCommand: echo

inputs:
  mode:
    #
    # If you change this type to string it works
    type: type.yml#end_mode
    inputBinding:
      position: 1

outputs: []

Here is the type definition:


class: SchemaDefRequirement
types:
  - type: enum
    name: end_mode
    symbols: [ 'SE', 'PE' ]

When I run (using the reference implementation) I get:


$ ./wf.yml 
/home/thomas.e/.local/bin/cwl-runner 1.0.20170308174714
Resolved './wf.yml' to 'file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/test/wf.yml'
Tool definition failed validation:
Got error `Type property "end_mode" not a valid Avro schema: Could not make an Avro Schema object from end_mode.` while processing inputs of file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/test/wf.yml#echoecho:
{
    "fields": [
        {
            "source": "file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/test/wf.yml#mode", 
            "type": "end_mode", 
            "inputBinding": {
                "position": 1
            }, 
            "name": "mode"
        }
    ], 
    "type": "record", 
    "name": "input_record_schema"
}

If I change the type of mode in the command line tool (clt.cwl) to string, it works:


$ ./wf.yml 
/home/thomas.e/.local/bin/cwl-runner 1.0.20170308174714
Resolved './wf.yml' to 'file:///stornext/Home/data/allstaff/t/thomas.e/dev/pdx-genome/test/wf.yml'
invalid field `job_order`, expected one of: 'mode'
[job echoecho] /stornext/HPCScratch/home/thomas.e/tmp/tmpXxh45H$ echo \
    PE
PE
[job echoecho] completed success
[step echoecho] completed success
[workflow wf.yml] outdir is /home/thomas.e/tmp/tmpuasi8P
{}
Final process status is success