Question: CWL LoadListing - cwltool can't handle large number of input files
Asked 5 months ago by Tom340 (Bielefeld University, CeBiTec, Germany):

Hello everyone,

I'm working on a workflow whose first step takes a directory containing more than a million files as input. cwltool (version 1.0.20181217162649, running in a Python 3.6.7 venv) works fine with a very small amount of test data. As soon as I pass in a larger set (~60,000 files), it complains about the large number of files and suggests adding the following to my CommandLineTool:

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:LoadListingRequirement:
    loadListing: shallow_listing

However, these sections are already present in my tool description (although the page referenced under $namespaces doesn't seem to exist), and cwltool does not seem to know how to interpret them. It prints the following message:

demultiplexing/demultiplexingToolDeepbinner.cwl:19:3: Unknown hint http://commonwl.org/cwltool#LoadListingRequirement

In most cases, the workflow execution will fail shortly after.

I have found no documentation regarding load listing anywhere. A (test) workflow in the cwltool repository produces the same error.

This is the code of my command line tool:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [deepbinner, realtime]

doc: |
  Uses deepbinner to sort raw nanopore reads of barcoded DNA by barcode.

requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerImageId: tmi_deepbinner
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.reads_directory)
        writable: true

arguments:
  - valueFrom: $("demultiplexed")
    prefix: --out_dir
    position: 2

hints:
  cwltool:LoadListingRequirement:
    loadListing: no_listing

inputs:
  reads_directory:
    label: Directory containing raw nanopore reads in .fast5 format
    type: Directory
    inputBinding:
      prefix: --in_dir
      position: 1
  barcoding_type:
    label: Specifies whether native or rapid barcoding was performed
    type: string
    inputBinding:
      prefix: --
      separate: false
      position: 3

outputs:
  barcode_directories:
    label: Directories with raw .fast5 data, each pertaining to a specific barcode
    type: Directory[]
    outputBinding:
      glob: $("demultiplexed/barcode*")
  unclassified_reads_directory:
    label: Directory containing raw .fast5 data that could not be matched to a barcode
    type: ["null", Directory]
    outputBinding:
      glob: demultiplexed/unclassified

Answered 5 months ago by Tom340:

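The solution is to run cwltool with the --enable-ext flag. Without it, cwltool does not load its own extensions to the CWL spec, so cwltool:LoadListingRequirement is reported as an unknown hint.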
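
As a rough sketch of how the pieces fit together: the hint needs the namespace declaration in the tool description, and the flag has to be passed when cwltool is invoked. The fragment below shows only the relevant parts of the tool from the question. The job file name job.yml is a hypothetical placeholder, not something from the original post.

```yaml
# Relevant parts of the tool description only; everything else stays as in the question.
# Invocation (job.yml is a hypothetical job file name):
#   cwltool --enable-ext demultiplexing/demultiplexingToolDeepbinner.cwl job.yml
cwlVersion: v1.0
class: CommandLineTool

# Declares the prefix used by the hint below.
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"

hints:
  cwltool:LoadListingRequirement:
    # no_listing avoids enumerating the directory contents;
    # the other accepted values are shallow_listing and deep_listing.
    loadListing: no_listing
```

With no_listing, expressions that read inputs.reads_directory.listing will not see any entries, so this setting suits tools like the one above that only pass the directory path along.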
