Question: CWL LoadListing - cwltool can't handle large number of input files
Asked 7 weeks ago by Tom (Bielefeld University, CeBiTec, Germany):

Hello everyone,

I'm working on a workflow whose first step takes a directory containing more than a million files as input. cwltool (version 1.0.20181217162649, running in a Python 3.6.7 venv) works fine if I input a very small amount of test data. As soon as I input a larger set (~60,000 files), it complains about the large number of files and suggests adding the following to my CommandLineTool:

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:LoadListingRequirement:
    loadListing: shallow_listing

However, these sections are already present in my tool description (although the page referenced under $namespaces doesn't seem to exist), and cwltool does not seem to know how to interpret them. It prints the following message:

demultiplexing/demultiplexingToolDeepbinner.cwl:19:3: Unknown hint http://commonwl.org/cwltool#LoadListingRequirement

In most cases, the workflow execution will fail shortly after.

I have found no documentation regarding load listing anywhere. A (test) workflow in the cwltool repository produces the same error.

This is the code of my command line tool:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [deepbinner, realtime]

doc: |
  Uses deepbinner to sort raw nanopore reads of barcoded DNA by barcode.

requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerImageId: tmi_deepbinner
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.reads_directory)
        writable: true

arguments:
  - valueFrom: $("demultiplexed")
    prefix: --out_dir
    position: 2

hints:
  cwltool:LoadListingRequirement:
    loadListing: no_listing

inputs:
  reads_directory:
    label: Directory containing raw nanopore reads in .fast5 format
    type: Directory
    inputBinding:
      prefix: --in_dir
      position: 1
  barcoding_type:
    label: Specifies whether native or rapid barcoding was performed
    type: string
    inputBinding:
      prefix: --
      separate: false
      position: 3

outputs:
  barcode_directories:
    label: Directories with raw .fast5 data, each pertaining to a specific barcode
    type: Directory[]
    outputBinding:
      glob: $("demultiplexed/barcode*")
  unclassified_reads_directory:
    label: Directory containing raw .fast5 data that could not be matched to a barcode
    type: ["null", Directory]
    outputBinding:
      glob: demultiplexed/unclassified
Answer, written 4 weeks ago by Tom (Bielefeld University, CeBiTec, Germany):

The solution is to run cwltool with the --enable-ext flag, which enables cwltool's extensions to the CWL spec, including LoadListingRequirement.
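
For reference, here is a minimal sketch of how the pieces might fit together in the tool description from the question. It assumes the $namespaces block sits at the top level of the same file as the hint; the job file name used in the comment (job.yml) is hypothetical and not part of the original post.

```yaml
# Sketch only: the header of the CommandLineTool from the question, with the
# cwltool namespace declared alongside the hint that uses it.
# Assumed invocation (job.yml is a hypothetical inputs file):
#   cwltool --enable-ext demultiplexing/demultiplexingToolDeepbinner.cwl job.yml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [deepbinner, realtime]

# Declares the "cwltool:" prefix used by the extension hint below.
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"

hints:
  # Only recognized when cwltool is started with --enable-ext.
  cwltool:LoadListingRequirement:
    loadListing: no_listing   # do not enumerate the directory's contents
```

Without --enable-ext, the same file produces the "Unknown hint" message quoted in the question, because the extension is not loaded.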
