CWL LoadListing - cwltool can't handle large number of input files
5.2 years ago
Tom ▴ 540

Hello everyone,

I'm working on a workflow whose first step takes a directory containing more than a million files as input. cwltool (version 1.0.20181217162649, running in a Python 3.6.7 venv) works fine if I give it a very small amount of test data. As soon as I give it a larger set (~60,000 files), it complains about the large number of files and suggests adding the following to my CommandLineTool:

$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  cwltool:LoadListingRequirement:
    loadListing: shallow_listing

However, these sections are already present in my tool description (although the page referenced under $namespaces doesn't seem to exist), and cwltool does not seem to know how to interpret them. It prints the following message:

demultiplexing/demultiplexingToolDeepbinner.cwl:19:3: Unknown hint http://commonwl.org/cwltool#LoadListingRequirement

In most cases, the workflow execution will fail shortly after.

I have found no documentation regarding load listing anywhere. A (test) workflow in the cwltool repository produces the same error.

This is the code of my command line tool:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [deepbinner, realtime]

doc: |
  Uses deepbinner to sort raw nanopore reads of barcoded DNA by barcode.

requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerImageId: tmi_deepbinner
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.reads_directory)
        writable: true

arguments:
  - valueFrom: $("demultiplexed")
    prefix: --out_dir
    position: 2

hints:
  cwltool:LoadListingRequirement:
    loadListing: no_listing

inputs:
  reads_directory:
    label: Directory containing raw nanopore reads in .fast5 format
    type: Directory
    inputBinding:
      prefix: --in_dir
      position: 1
  barcoding_type:
    label: Specifies whether native or rapid barcoding was performed
    type: string
    inputBinding:
      prefix: --
      separate: false
      position: 3

outputs:
  barcode_directories:
    label: Directories with raw .fast5 data, each pertaining to a specific barcode
    type: Directory[]
    outputBinding:
      glob: $("demultiplexed/barcode*")
  unclassified_reads_directory:
    label: Directory containing raw .fast5 data that could not be matched to a barcode
    type: ["null", Directory]
    outputBinding:
      glob: demultiplexed/unclassified
5.2 years ago
Tom ▴ 540

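The solution is to run cwltool with the --enable-ext flag, which enables cwltool's extensions to the CWL specification, including cwltool:LoadListingRequirement.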
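
For illustration, here is a minimal Python sketch of how the invocation could be assembled and run. The tool path is taken from the error message in the question; `job.yml` is a hypothetical job file name, and the helper functions `build_cwltool_command` and `run_tool` are made up for this example and are not part of cwltool.

```python
import subprocess
from typing import List, Optional


def build_cwltool_command(tool_path: str, job_path: Optional[str] = None) -> List[str]:
    """Build a cwltool command line with extensions enabled (hypothetical helper)."""
    # --enable-ext lets cwltool load its own extensions, so hints in the
    # cwltool: namespace such as LoadListingRequirement are recognized.
    command = ["cwltool", "--enable-ext", tool_path]
    if job_path is not None:
        command.append(job_path)
    return command


def run_tool(tool_path: str, job_path: Optional[str] = None) -> None:
    """Run the tool; raises CalledProcessError if cwltool exits with an error."""
    subprocess.run(build_cwltool_command(tool_path, job_path), check=True)


# Tool path from the error message above; job.yml is an assumed file name.
command = build_cwltool_command(
    "demultiplexing/demultiplexingToolDeepbinner.cwl", "job.yml"
)
print(" ".join(command))
```

The $namespaces entry and the hint still have to be declared in the tool description; the flag only makes cwltool accept them.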
