Question: Create a writable directory inside the container before executing baseCommand. How do I properly use InitialWorkDirRequirement?
misha.kotliar wrote (2.9 years ago):

Hello all

I'm using CWL v1.0 and trying to understand how to create a writable directory inside the container's output directory and then return it to a specific place on my computer. The question came up when I tried to run STAR in a container from a CWL file.

The general structure of the command I want to implement is:

STAR --runMode genomeGenerate --genomeDir /some/path/to/output/folder [other parameters]

When I run my CWL file with genomeDir set to ./, it works fine, because cwltool mounts the output directory as writable and STAR can write its results there. I then use ./ as the glob in the output parameter to return all the data I need from the container to my computer.

Here is the part of my CWL file that describes the inputs and outputs:

<skipped lines>

inputs:
  genomeDir:
    type: string
    inputBinding:
      position: 1
      prefix: --genomeDir

<skipped lines>

outputs:
  indices:
    type: Directory
    outputBinding:
      glob: $(inputs.genomeDir)

But when I set genomeDir to any other folder, for example ./dm3, STAR fails with an error saying that the dm3 folder has to be created first. To experiment with this issue, I created a simple CWL file to understand how to solve the problem.
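One way to handle the STAR case, sketched here as an assumption rather than taken from this thread, is to let a shell create the folder before STAR starts. The sketch uses ShellCommandRequirement so that `mkdir -p ... && STAR ...` runs through /bin/sh, with `shellQuote: false` so the `&&` is not quoted. It assumes STAR is on the PATH inside the container; the image name is a hypothetical placeholder and STAR's other parameters are omitted.

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  # Run the assembled command line through /bin/sh so that && works
  ShellCommandRequirement: {}
hints:
  DockerRequirement:
    dockerPull: example/star   # hypothetical image name, replace with a real STAR image
inputs:
  genomeDir:
    type: string               # e.g. ./dm3; no inputBinding, used only via $(...) below
arguments:
  # Create the (possibly nested) folder first; shellQuote: false keeps
  # the && from being passed to mkdir as a literal argument.
  - valueFrom: "mkdir -p $(inputs.genomeDir) &&"
    shellQuote: false
  - STAR
  - --runMode
  - genomeGenerate
  - --genomeDir
  - $(inputs.genomeDir)
  # [other STAR parameters]
outputs:
  indices:
    type: Directory
    outputBinding:
      glob: $(inputs.genomeDir)
```

Because the folder is created inside the writable output directory by the command itself, the glob on $(inputs.genomeDir) should then find it, just as ./ did.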

My CWL file (I followed the example at http://www.commonwl.org/v1.0/UserGuide.html#Creating_files_at_runtime):

cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: ubuntu
baseCommand: ls
arguments: ["-p"]
stdout: output.txt
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: $(inputs.fileName)
        entry: Some text inside the file
      - class: Directory
        basename: folderName
        listing: []

inputs:
  dirName:
    type: string
  fileName:
    type: string

outputs:
  output:
    type: stdout
  fileOut:
    type: File
    outputBinding:
      glob: $(inputs.fileName)
  dirOut:
    type: Directory
    outputBinding:
      glob: folderName

Job file (in this case dirName is not actually used, because I set the directory name in the CWL file as a plain string):

dirName: new_folder
fileName: textfile.txt

When I run it, I get the following error:

cwl-runner --debug createfile.cwl job.yml
/usr/local/bin/cwl-runner 1.0.20160930152149
[job createfile.cwl] initializing from file:///Users/kot4or/workspaces/cwl_ws/sandbox/create_directory/createfile.cwl
[job createfile.cwl] {
    "dirName": "new_folder", 
    "fileName": "textfile.txt"
}
[job createfile.cwl] path mappings is {}
[job createfile.cwl] command line bindings is [
    {
        "position": [
            -1000000, 
            0
        ], 
        "datum": "ls"
    }, 
    {
        "position": [
            0, 
            0
        ], 
        "datum": "-p"
    }
]
[job createfile.cwl] /var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpFDSLKV$ docker \
    run \
    -i \
    --volume=/private/var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpFDSLKV:/var/spool/cwl:rw \
    --volume=/private/var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpbW3Lpe:/tmp:rw \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --log-driver=none \
    --user=501 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    ubuntu \
    ls \
    -p > /var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpFDSLKV/output.txt
[job createfile.cwl] initial work dir {
    "_:cf81ebd5-810c-40b0-bb68-040f0322ca40": [
        "_:cf81ebd5-810c-40b0-bb68-040f0322ca40", 
        "/var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpFDSLKV", 
        "Directory"
    ], 
    "_:bb364b67-6b01-4b9d-996a-0bc227a24489": [
        "_:bb364b67-6b01-4b9d-996a-0bc227a24489", 
        "/var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpFDSLKV/folderName", 
        "Directory"
    ], 
    "_:7be322d4-0e8f-4edd-b852-8a113fdeb5fe": [
        "Some text inside the file", 
        "/var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpFDSLKV/textfile.txt", 
        "CreateFile"
    ]
}
Error collecting output for parameter 'dirOut'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/cwltool/draft2tool.py", line 383, in collect_output_ports
    ret[fragment] = self.collect_output(port, builder, outdir, fs_access, compute_checksum=compute_checksum)
  File "/usr/local/lib/python2.7/site-packages/cwltool/draft2tool.py", line 474, in collect_output
    raise WorkflowException("Did not find output file with glob pattern: '{}'".format(globpatterns))
WorkflowException: Did not find output file with glob pattern: '['folderName']'
Error while running job: Error collecting output for parameter 'dirOut': Did not find output file with glob pattern: '['folderName']'
[job createfile.cwl] completed permanentFail
[job createfile.cwl] {}
Final process status is permanentFail
[job createfile.cwl] Removing input staging directory /var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpLhLC5b
[job createfile.cwl] Removing temporary directory /var/folders/sd/41rg42_16q72_2yzl_vvgsbw0000gn/T/tmpbW3Lpe
Workflow error, try again with --debug for more information:
  Process status is ['permanentFail']
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/cwltool/main.py", line 677, in main
    **vars(args))
  File "/usr/local/lib/python2.7/site-packages/cwltool/main.py", line 233, in single_job_executor
    raise WorkflowException(u"Process status is %s" % (final_status))
WorkflowException: Process status is ['permanentFail']

It looks like the folderName directory was never created.

If I comment out the lines that collect the dirOut output, there are no errors, but output.txt (where I save the result of the "ls -p" command) lists only "textfile.txt" and "output.txt".

The questions are:

  1. Am I using the right approach to create a directory inside the container's output directory?
  2. Is there any way to return that newly created directory from the container to a specific directory on my computer?
  3. It looks like "basename" doesn't support expressions and only recognizes a plain string: if I use basename: $(inputs.dirName), it is not set to the value of the input.

I would appreciate any links to working examples of CommandLineTools or workflows that use CWL v1.0 (not necessarily related to the Directory type).

Michael R. Crusoe (Common Workflow Language project) wrote (2.8 years ago):

Hello Misha,

My apologies for the delayed response.

You have several questions here; I will answer them in order (note that it would be best to split them into separate posts in the future).

  1. According to my reading of the spec, you are correct and the reference implementation is at fault. I've created the following issues to track this down: https://github.com/common-workflow-language/cwltool/issues/226 and https://github.com/common-workflow-language/cwltool/issues/227
  2. How outputs are managed varies by implementation; for the reference implementation (cwltool) you can use --outdir.
  3. Correct, you can only use an expression in fields where Expression is listed as an allowed type in the specification.
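To make these answers concrete, here is a hedged sketch (not from the thread, and not verified against the cwltool release shown in the log) of the test tool adapted so that the directory name comes from the dirName input. Since basename only accepts a plain string, the whole Directory literal is built by a JavaScript expression, which an InitialWorkDirRequirement listing item may be; this requires InlineJavascriptRequirement. Whether an empty directory literal is actually staged still depends on the cwltool bug tracked in the issues linked above.

```yaml
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: ubuntu
requirements:
  # Needed for the ${...} expression in the listing below
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: $(inputs.fileName)
        entry: Some text inside the file
      # basename is a plain string field, but a listing item may be an
      # Expression, so build the whole Directory literal in JavaScript.
      # Single quotes keep YAML from parsing the ": " inside as a mapping.
      - '${ return {"class": "Directory", "basename": inputs.dirName, "listing": []}; }'
baseCommand: ls
arguments: ["-p"]
stdout: output.txt
inputs:
  dirName:
    type: string
  fileName:
    type: string
outputs:
  output:
    type: stdout
  fileOut:
    type: File
    outputBinding:
      glob: $(inputs.fileName)
  dirOut:
    type: Directory
    outputBinding:
      glob: $(inputs.dirName)
# With the reference implementation, choose the host folder for the outputs, e.g.:
#   cwl-runner --outdir /path/on/host createfile.cwl job.yml
```

With the job file from the question, dirOut would be collected from a folder named new_folder, and --outdir controls where on the host the collected outputs are placed.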

The user guide for v1.0 is at http://www.commonwl.org/v1.0/CommandLineTool.html#ShellCommandRequirement. The conformance tests may also give you inspiration: https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.0/v1.0
