Question: Change execution directory before running a tool (CWL)
11 months ago by
andreas.bleuler20 wrote:

Hi everybody,

I have a problem with hard-coded paths in scripts that would require changing the directory before executing a command line tool.

Let's consider the following minimal example:

○ → tree
.
├── code
│   └── script.py
├── data
│   └── hi.txt
└── tool.cwl

with script.py:

#! /usr/bin/env python3

with open('../data/hi.txt', 'r') as f:
    for line in f:
        print(line)

When I run script.py from the code/ directory, it prints the content of hi.txt. I then try to wrap this into a CWL tool:

cwlVersion: v1.0
class: CommandLineTool

inputs:

  script:
    type: File
    inputBinding:
      position: 1
    default:
      class: File
      path: code/script.py

  data:
    type: File
    default:
      class: File
      path: data/hi.txt

outputs: []

requirements:
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.script)
        entryname: code/script.py
      - entry: $(inputs.data)
        entryname: data/hi.txt

Running this tool fails when the script tries to open hi.txt. I can fix this by changing into the right directory at the beginning of the script:

import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
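The underlying behaviour can be reproduced without CWL: a relative path such as '../data/hi.txt' is resolved against the working directory of the process, not against the location of the script. The following is only a sketch under assumptions. It rebuilds the layout from above in a temporary directory (the "project" directory name and the helper functions are made up for illustration) and runs the unmodified script from two different working directories.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Body of script.py from the question, without the shebang line.
SCRIPT = """\
with open('../data/hi.txt', 'r') as f:
    for line in f:
        print(line)
"""


def build_project(root: Path) -> Path:
    """Recreate the code/ and data/ layout under root; return the script path."""
    (root / "code").mkdir(parents=True)
    (root / "data").mkdir()
    (root / "code" / "script.py").write_text(SCRIPT)
    (root / "data" / "hi.txt").write_text("hi\n")
    return root / "code" / "script.py"


def run_from(script: Path, cwd: Path) -> subprocess.CompletedProcess:
    """Run the script with cwd as the working directory of the child process."""
    return subprocess.run(
        [sys.executable, str(script)],
        cwd=cwd,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project"  # hypothetical project root
    script = build_project(root)
    from_root = run_from(script, root)           # like the CWL execution directory
    from_code = run_from(script, root / "code")  # like running from code/

# A non-zero return code means the script could not open ../data/hi.txt.
print("run from project root, return code:", from_root.returncode)
print("run from code/, return code:", from_code.returncode)
```

The script path is identical in both runs; only the working directory differs, and that alone decides whether the file is found.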

Now I'm wondering if CWL offers a way to change the directory before executing a tool instead. I am aware that using arguments instead of hard-coded paths, or at least making the paths relative to the root of my dummy project, would help here. But let's just assume that I cannot modify the script at all.

Thanks!

11 months ago by
Tom530 wrote:

Hi Andreas,

I am not sure about the InitialWorkDirRequirement you are using. According to the CWL spec, entryname is only supposed to override the basename property of the entry, so I'm wondering whether the "code" and "data" subdirectories are actually being created. If you know for a fact that it works, let me know and I'll adjust the way I do it. If you aren't sure, try:

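# The JavaScript expressions below need InlineJavascriptRequirement.
InitialWorkDirRequirement:
  listing:
    # Wrap each input in a Directory literal; entryname names the directory.
    - entry: "$({ class: 'Directory', listing: [inputs.data] })"
      entryname: "data"
    - entry: "$({ class: 'Directory', listing: [inputs.script] })"
      entryname: "code"

arguments:
  # Path of the staged script relative to the execution directory.
  - valueFrom: $("code/" + inputs.script.basename)
    position: 1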

Regards, Tom


Hi Tom,

thanks for your remark. Yes, the two directories are created (at least when using the reference implementation, I haven't tried with any other CWL runner). The content of the execution directory for my above example looks like this:

├── code
│   └── script.py -> /Users/ableuler/cwl-playground/code/script.py
└── data
    └── hi.txt -> /Users/ableuler/cwl-playground/data/hi.txt

Cheers, Andreas

written 11 months ago by andreas.bleuler20

Alright, good to know! Have you tried passing the script's name as an argument (like in the example above) instead of using inputBinding? If the directories are created as planned and no job input for script is provided, I would expect cwltool to:

  • use the default values for script and data, which means passing script.py and hi.txt from the subdirectories to the tool
  • create both directories as specified
  • place script.py and hi.txt in their respective subdirectories
  • invoke the command line tool and pass script.py as the only argument

The last step is where I assume things go wrong. The default value for script is a file, not the path to a file. So the command line argument is "script.py" and not "code/script.py". The script is therefore not executed in the "code" subdirectory.

In my own experience, combining relative paths with CWL tends to cause a horrible dumpster fire. Necat was the last tool that forced this on me, and what should have been a single tool wrapper became two bash scripts and a four-step workflow.

modified 10 months ago • written 10 months ago by Tom530

Hi Tom, thanks for that suggestion, but adding the script as an argument instead of an input doesn't help.

In fact, in my above attempt the path of the script appears in the command invocation as it should (under path/to/tmp/exec-dir/code/script.py). But that doesn't change the fact that the script is executed in path/to/tmp/exec-dir and there seems to be no way to force CWL to execute the script in path/to/tmp/exec-dir/code except for changing the path inside the script.
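One way to leave script.py itself untouched is to put the directory change into a small launcher that starts the script as a child process. The following is only a sketch: the launcher and the function name run_in_script_dir are hypothetical, and whether such a launcher can be staged and invoked cleanly from a CWL tool is an assumption that has not been tested against any runner.

```python
#!/usr/bin/env python3
"""Hypothetical launcher: run an unmodified script from its own directory."""
import os
import subprocess
import sys
from pathlib import Path


def run_in_script_dir(script, args=()):
    """Run script with its parent directory as the working directory.

    os.path.abspath is used instead of Path.resolve so that a symlinked
    script (as staged in the execution directory) is not followed back to
    its original location.
    """
    script_path = Path(os.path.abspath(script))
    completed = subprocess.run(
        [sys.executable, script_path.name, *args],
        cwd=script_path.parent,  # the actual "cd" before execution
    )
    return completed.returncode


# Command-line use: launcher.py path/to/script.py [args...]
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_in_script_dir(sys.argv[1], sys.argv[2:]))
```

The launcher would receive code/script.py as its first argument, so hard-coded paths like '../data/hi.txt' resolve relative to the staged code directory while the script stays byte-for-byte the same.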

I completely agree that hard-coded relative paths are a bad idea in CWL. I was just trying to find out to what extent any existing (potentially poorly written) codebase could be used in CWL pipelines without any modification of the code itself.

written 10 months ago by andreas.bleuler20