Question

Directory or string type

7.3 years ago

DT ▴ 20

Hello,

If I have a command line tool like this:

java -Xmx55g -Xms55g -jar /a/apps/picard/picard-tools-2.4.1/picard.jar ExtractIlluminaBarcodes \
INPUT_BASECALLS_DIR=/150313_D00282_0057_BC6FF5ANXX/Data/Intensities/BaseCalls \
INPUT_BARCODE_FILE=/barcodes/barcode1.txt \
OUTPUT_METRICS_FILE=/barcodes/barcode1.metrics

In CWL, should I code the above INPUTs and OUTPUT as type 'Directory' and 'File'? Or should I code those as type 'string'? What would be the difference in behavior of CWL during execution? What are the pros and cons?

I seem to have better luck getting it to run in CWL with type 'string', as shown below. Even though 'OUTPUT_METRICS_FILE=' names the file to be written, I put it in the 'inputs' section as just another 'string' parameter, and that seems to work okay. Please help us think more clearly about this. Thanks!!

cwlVersion: v1.0
class: CommandLineTool
baseCommand: java

inputs:
  - id: basecalls_dir
    type: string
    inputBinding:
      position: 5
      separate: false
      prefix: "INPUT_BASECALLS_DIR="
  - id: barcode_file
    #type: File
    type: string
    inputBinding:
      position: 8
      separate: false
      prefix: "INPUT_BARCODE_FILE="
  - id: metrics_file
    type: string
    inputBinding:
      position: 10
      separate: false
      prefix: "OUTPUT_METRICS_FILE="

cwl • 1.8k views

updated 7.3 years ago by StarvingMarvin ▴ 20 • written 7.3 years ago by DT ▴ 20

There are tool descriptions for some of the picard tools in the CWL repository:

https://github.com/common-workflow-language/workflows/tree/master/tools

Nothing with directory inputs though.

I agree with StarvingMarvin's answer in general: use Files and Directories for flexibility. Still, my experience with CWL directories is mixed. As I understand it, when a Directory is initialized, the first thing the implementation does is walk it and all of its subfolders to list every file. The basecall dir can be rather large, depending on your instrument. I'm not sure when CWL decides it is necessary to copy a Directory or File, but for the same reason you probably don't want that to happen to the basecall directory.
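If the up-front directory scan is the bottleneck, one thing that might be worth trying is the LoadListingRequirement that was added in CWL v1.1. The sketch below assumes a runner that supports v1.1, and `ls` is only a placeholder command to keep the example small. Note that this only controls whether the Directory's listing is loaded for use in expressions; it does not decide whether the runner stages or copies the directory.

```yaml
cwlVersion: v1.1
class: CommandLineTool

requirements:
  LoadListingRequirement:
    loadListing: no_listing   # do not enumerate the directory contents

baseCommand: ls               # placeholder command, not the picard call

inputs:
  - id: basecalls_dir
    type: Directory
    inputBinding:
      position: 1

outputs: []
```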

7.3 years ago by karl.nordstrom ▴ 90

Thanks StarvingMarvin and karl.nordstrom for your replies.

Yes, when I run the CommandLineTool above with the inputs as Directory and File, cwl-runner appears to be running, judging by CPU and memory usage, but it produces neither output nor an error message even after a long while. When I run the tool with the inputs as 'string', everything works as if I were running the command directly in a shell.

The basecalls_dir is from Illumina whole genome sequencing, so it's very big, 250 GB, with lots of folder levels and files.

7.3 years ago by DT ▴ 20

score 1 · Answer 1 · 2017-01-14

A string declaration will probably only work if you run the tool on your local machine, outside of Docker. When you annotate Files and Directories properly, the execution environment can a) copy the files around or otherwise make them available on other machines, and b) map those files correctly when the tool is executed inside a container.

Also, when something is declared as a File, it can carry along other pieces of information: metadata, file type, secondary files / indices, etc.
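To make the difference concrete, here is a minimal sketch of a job (inputs) file, assuming the tool declares `basecalls_dir` as `type: Directory` and `barcode_file` as `type: File`. The file name `job.yml` is hypothetical; the paths are the ones from the shell command in the question. With string inputs these values would be plain paths that the runner never inspects; as File and Directory objects the runner knows they refer to data it may need to stage or mount.

```yaml
# job.yml (hypothetical file name)
basecalls_dir:
  class: Directory
  path: /150313_D00282_0057_BC6FF5ANXX/Data/Intensities/BaseCalls
barcode_file:
  class: File
  path: /barcodes/barcode1.txt
# Output name stays a plain string (base name only)
metrics_file: barcode1.metrics
```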

As for the output: if you are passing an input that serves as the output name, then it's a string, but it should probably be a base name only. If a fully expanded path is really necessary, it can be calculated like this:

- id: metrics_file
  type: string
  inputBinding:
    position: 10
    separate: false
    prefix: "OUTPUT_METRICS_FILE="
    valueFrom: $(runtime.outdir)/$(self)
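Putting the pieces together, the tool could look roughly like the sketch below. It assumes the JVM flags and jar path from the shell command in the question, and the output id `metrics` is a name made up for illustration. Because the tool runs with the output directory as its working directory, passing only a base name for `metrics_file` is enough for the `glob` to find the file afterwards.

```yaml
cwlVersion: v1.0
class: CommandLineTool
# JVM flags and jar path copied from the shell command in the question
baseCommand: [java, -Xmx55g, -Xms55g, -jar,
              /a/apps/picard/picard-tools-2.4.1/picard.jar,
              ExtractIlluminaBarcodes]

inputs:
  - id: basecalls_dir
    type: Directory
    inputBinding:
      position: 1
      separate: false
      prefix: "INPUT_BASECALLS_DIR="
  - id: barcode_file
    type: File
    inputBinding:
      position: 2
      separate: false
      prefix: "INPUT_BARCODE_FILE="
  - id: metrics_file
    type: string              # base name only, e.g. barcode1.metrics
    inputBinding:
      position: 3
      separate: false
      prefix: "OUTPUT_METRICS_FILE="

outputs:
  - id: metrics               # hypothetical output id
    type: File
    outputBinding:
      glob: $(inputs.metrics_file)   # collect the file the tool wrote
```

Note that the jar path here is still a plain string, so it would not be mapped into a container; this sketch only covers the local, non-Docker case for that argument.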