CWL: reading filenames from another file in the runtime and treating them as files
7.3 years ago
anton.khodak ▴ 10

I have the following setup:

One input parameter is a tab-delimited file with names of other files which will be used later.

{
  "inputSamplesFile": {
    "class": "File",
    "path": "/path/to/inputSamples.txt"
  }
 ... other input parameters
}

inputSamples.txt

NA12878 /path/to/NA12878_wgs_20.bam /path/to/NA12878_wgs_20.bai
NA12877 /path/to/NA12877_wgs_20.bam /path/to/NA12877_wgs_20.bai
NA12882 /path/to/NA12882_wgs_20.bam /path/to/NA12882_wgs_20.bai

In my workflow I scatter over the data read from inputSamples.txt. To read it, I wrote an ExpressionTool.

Workflow:

class: Workflow

cwlVersion: v1.0

requirements: 
- class: ScatterFeatureRequirement
- class: InlineJavascriptRequirement
- class: StepInputExpressionRequirement

inputs:
  inputSamplesFile: 
    type: File
  gatk:
    type: File
  refIndex:
    type: File
  refFasta: 
    type: File
  refDict:
    type: File

steps:
  read_tsv:
    run: read_tsv.cwl
    in: 
      infile: inputSamplesFile
    out: [inputSamples]
  HaplotypeCallerERC:
    run: HaplotypeCallerERC.cwl
    scatter: [sampleName, bamFile, bamIndex]
    scatterMethod: dotproduct
    in:
      GATK: gatk
      RefFasta: refFasta
      RefIndex: refIndex
      RefDict: refDict
      sampleName:
        source: "#read_tsv/inputSamples"
        valueFrom:
          $(self[0])
      bamFile:
        source: "#read_tsv/inputSamples"
        valueFrom:
          $(self[1])
      bamIndex:
        source: "#read_tsv/inputSamples"
        valueFrom:
          $(self[2])

    out: [GVCF]

outputs:
  GVCF:
    type: Any
    outputSource: HaplotypeCallerERC/GVCF

read_tsv.cwl

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool

requirements:
- class: InlineJavascriptRequirement

inputs:
  infile:
    type: File
    inputBinding:
      loadContents: true

outputs:
  inputSamples:
    type: Any

expression: "${var lines = inputs.infile.contents.split('\\n');
               var nblines = lines.length;
               var arrayofarrays = [];
               for (var i = 0; i < nblines; i++) {
                 var line = lines[i].split('\t');
                  arrayofarrays.push(line);
                }


               return { 'inputSamples': arrayofarrays } ;
              }"
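
To see what this expression yields, the same logic can be run outside CWL. The following is a minimal sketch in plain JavaScript (Node), assuming `contents` stands in for `inputs.infile.contents`; the function name `parseTsv` and the sample text are made up for illustration.

```javascript
// Standalone version of the read_tsv expression body, runnable in Node.
// `contents` is a stand-in for inputs.infile.contents (illustrative data).
function parseTsv(contents) {
  var rows = [];
  var lines = contents.split('\n');
  for (var i = 0; i < lines.length; i++) {
    rows.push(lines[i].split('\t'));
  }
  return rows;
}

var contents =
  'NA12878\t/path/to/NA12878_wgs_20.bam\t/path/to/NA12878_wgs_20.bai\n' +
  'NA12877\t/path/to/NA12877_wgs_20.bam\t/path/to/NA12877_wgs_20.bai\n';

var rows = parseTsv(contents);
console.log(rows.length);        // 3: two samples plus one row from the trailing newline
console.log(typeof rows[0][1]);  // "string": a plain path, not a File object
```

Two things stand out: every field is a plain string, and a file that ends with a newline produces one extra row holding a single empty string, which would become an extra scatter job.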

The read_tsv step runs fine and produces an array of three-element arrays. The problem is that in the HaplotypeCallerERC step the bamFile and bamIndex inputs each receive a plain string containing the file path, so the value is not recognized as a File. If I change the expression like this:

expression: "${var lines = inputs.infile.contents.split('\\n');
               var nblines = lines.length;
               var arrayofarrays = [];
               for (var i = 0; i < nblines; i++) {
                 var line = lines[i].split('\t');

                  for (var j=0; j < line.length; j++){
                    if (line[j].startsWith('/')){
                      line[j] = 
                         {
                        'class': 'File',
                        'path': line[j]
                         };
                      }

                  }
                  arrayofarrays.push(line);
                }


               return { 'inputSamples': arrayofarrays } ;
              }"

The expression cannot be evaluated and I get an exception:

ValidationException: Anonymous file object must have 'contents' and 'basename' fields.

So how can I read a path string from a file and have it treated as a File in this situation?


For anyone who finds this from Google, I got this error message:

ValidationException: Anonymous file object must have 'contents' and 'basename' fields.

because my sub-workflow had a default value for an input item, and my top level workflow did not pass in that input item to the sub-workflow. I had to move the sub-workflow's input item to the top level and pass it in from the main workflow.

7.2 years ago
anton.khodak ▴ 10

Changing "path" to "location" and prepending "file://" to the file path in the last JavaScript expression did the trick:

    for (var j = 0; j < line.length; j++) {
      if (line[j].startsWith('/')) {
        line[j] = {
          'class': 'File',
          'path': line[j]
        };
      }
    }

to

    for (var j = 0; j < line.length; j++) {
      if (line[j].startsWith('/')) {
        line[j] = {
          'class': 'File',
          'location': 'file://' + line[j]
        };
      }
    }
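
For anyone who wants to check this change outside a CWL runner, here is a minimal sketch of the same loop as a plain JavaScript function. The name `rowsToFiles` and the sample line are made up for illustration, and skipping blank lines is an addition that the original expression does not have.

```javascript
// Sketch of the corrected expression body as a plain function, so it can be
// tried in Node before pasting into read_tsv.cwl. rowsToFiles is a
// hypothetical helper name, not part of CWL.
function rowsToFiles(contents) {
  var rows = [];
  var lines = contents.split('\n');
  for (var i = 0; i < lines.length; i++) {
    // Assumption: blank lines (e.g. from a trailing newline) should be skipped.
    if (lines[i].trim() === '') { continue; }
    var fields = lines[i].split('\t');
    for (var j = 0; j < fields.length; j++) {
      if (fields[j].charAt(0) === '/') {
        // Absolute path: wrap it as a File object addressed by a file:// URI.
        fields[j] = { 'class': 'File', 'location': 'file://' + fields[j] };
      }
    }
    rows.push(fields);
  }
  return rows;
}

var sample = 'NA12878\t/path/to/NA12878_wgs_20.bam\t/path/to/NA12878_wgs_20.bai\n';
var converted = rowsToFiles(sample);
console.log(JSON.stringify(converted, null, 2));
```

Inside the ExpressionTool the function body would go into the expression, with `inputs.infile.contents` in place of `sample` and `return { 'inputSamples': rows };` at the end.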
7.3 years ago

This won't work, because local Files not declared as inputs aren't available to the Expression Tool in CWL v1.0 (hence the message about anonymous File objects).

As a workaround you can have a standalone ExpressionTool that converts the input TSV into a CWL input object.
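
One possible reading of this workaround, sketched in plain JavaScript: turn the TSV text into an input object whose File entries can then be passed to the workflow as declared inputs. This is only an illustration under assumptions. The function name `tsvToJob` and the keys `sampleNames`, `bamFiles` and `bamIndexes` are hypothetical and would have to match array inputs declared in the workflow.

```javascript
// Sketch: build a CWL input object (job) from TSV text.
// The keys sampleNames, bamFiles and bamIndexes are hypothetical; they would
// have to match array inputs declared in the workflow being run.
function tsvToJob(contents) {
  var job = { sampleNames: [], bamFiles: [], bamIndexes: [] };
  contents.split('\n').forEach(function (line) {
    if (line.trim() === '') { return; }  // ignore blank lines
    var fields = line.split('\t');
    job.sampleNames.push(fields[0]);
    job.bamFiles.push({ 'class': 'File', 'path': fields[1] });
    job.bamIndexes.push({ 'class': 'File', 'path': fields[2] });
  });
  return job;
}

var tsv =
  'NA12878\t/path/to/NA12878_wgs_20.bam\t/path/to/NA12878_wgs_20.bai\n' +
  'NA12877\t/path/to/NA12877_wgs_20.bam\t/path/to/NA12877_wgs_20.bai\n';
var job = tsvToJob(tsv);
console.log(JSON.stringify(job, null, 2));
```

The three arrays stay aligned by position, so a dotproduct scatter over them would pair each sample name with its own BAM file and index.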


Could you please elaborate on the workaround? I don't quite understand what else an ExpressionTool can do here. Can I declare inputs without providing them in the job, and populate them at runtime instead?
