Question: CWL: nameext not returning expected array
Asked 5 months ago by ForrestBear (Curii):

I'm writing an ExpressionTool in CWL that returns an array of the files in a directory that have a given extension.

In a directory that contains .bed and .vcf.gz files, I'd like to return an array of the .vcf.gz files. However, file.nameext doesn't seem to work as I expect it to.

class: ExpressionTool
cwlVersion: v1.0
inputs:
  vcfsdir: Directory
outputs:
  samples: string[]
  vcfgzs:
    type: File[]
    secondaryFiles: [.tbi]
  beds: File[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var vcfgzs = [];

    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      if (file.nameext == '.gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
  }

The output I get is simply:

{
    "vcfgzs": []
}
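For reference, the CWL File object defines `nameext` as the basename extension such that `nameroot + nameext == basename`, where `nameext` is either empty or starts with a period and contains at most one period, and leading periods are ignored (so `.cshrc` has an empty `nameext`). Under that rule a file named `sample.vcf.gz` should get `nameext` `.gz`, so the comparison `file.nameext == '.gz'` above looks like the intended usage. The sketch below reproduces the split in plain JavaScript only to make the rule concrete; `splitNameExt` is a hypothetical helper, not part of cwltool.

```javascript
// Hypothetical helper: split a basename into CWL-style nameroot/nameext.
// A sketch of the rule in the CWL File definition, not cwltool's own code.
function splitNameExt(basename) {
  // Leading periods are ignored when splitting, so ".cshrc" has no extension.
  var lead = 0;
  while (lead < basename.length && basename.charAt(lead) === ".") {
    lead++;
  }
  // The extension starts at the last period that follows the leading ones.
  var dot = basename.lastIndexOf(".");
  if (dot < lead) {
    return { nameroot: basename, nameext: "" };
  }
  return { nameroot: basename.slice(0, dot), nameext: basename.slice(dot) };
}

// Usage: only the final extension ends up in nameext.
var examples = ["sample.vcf.gz", "regions.bed", ".cshrc"];
for (var i = 0; i < examples.length; i++) {
  var parts = splitNameExt(examples[i]);
  console.log(examples[i], "->", parts.nameroot, "|", parts.nameext);
}
```

Because only the last extension is kept, a `.gz` test would also match any other gzipped file in the directory, not just VCFs.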

I've confirmed that this bug also exists in the CWL reference runner: https://github.com/common-workflow-language/cwltool/issues/1074

written 5 months ago by Michael R. Crusoe
Answered 5 months ago by Tom (Bielefeld University, CeBiTec, Germany):

Accessing file.nameext does not seem to work inside the expression; I'm not sure why. I made a simple workaround that should do the trick as long as your file names contain only one dot (for example sample.gz, but not sample.vcf.gz).

expression: |
  ${
    var vcfgzs = [];
    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      var filenameext = inputs.vcfsdir.listing[i].basename.split('.')[1];
      if (filenameext == 'gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
  }
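Note that `basename.split('.')[1]` returns the second dot-separated field, so for a name like `sample.vcf.gz` it yields `vcf` rather than `gz` and the file is skipped. A minimal sketch of a suffix-based filter that depends neither on `nameext` nor on the number of dots is below. It is written in ES5 style because CWL expressions are specified as ECMAScript 5.1; `hasSuffix` and `selectBySuffix` are hypothetical helper names, and the `inputs` object is a hand-made mock of what the runner would pass in. Inside a real `expression: |` block you would drop the mock and return `{"vcfgzs": selectBySuffix(inputs.vcfsdir.listing, ".vcf.gz")}`.

```javascript
// Hand-made mock of the `inputs` object an ExpressionTool receives
// (file names are hypothetical).
var inputs = {
  vcfsdir: {
    class: "Directory",
    listing: [
      { class: "File", basename: "sample1.vcf.gz" },
      { class: "File", basename: "sample1.vcf.gz.tbi" },
      { class: "File", basename: "sample1.bed" }
    ]
  }
};

// True if `name` ends with `suffix` (ES5 has no String.prototype.endsWith).
function hasSuffix(name, suffix) {
  return name.length >= suffix.length &&
    name.slice(name.length - suffix.length) === suffix;
}

// Keep only the File entries of a (shallow) listing whose basename ends with `suffix`.
function selectBySuffix(listing, suffix) {
  var selected = [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    if (entry.class === "File" && hasSuffix(entry.basename, suffix)) {
      selected.push(entry);
    }
  }
  return selected;
}

// The object the ExpressionTool would return.
var result = { "vcfgzs": selectBySuffix(inputs.vcfsdir.listing, ".vcf.gz") };
console.log(JSON.stringify(result));
```

Matching on the full `.vcf.gz` suffix also keeps the `.tbi` index out of the list, since its name ends in `.tbi`.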

I tried this and still got empty arrays. I'm wondering whether it might be a bug.

written 5 months ago by ForrestBear

It is. Thanks for finding and reporting this! https://github.com/common-workflow-language/cwltool/issues/1074

written 5 months ago by Michael R. Crusoe

Okay, it looks like you were right about it being a bug! I ran the code before posting it here, but my input directory had no subdirectories. Sorry! It seems that only the .gz files directly inside the input directory are returned, not those in its subdirectories. I guess you'll have to use a CommandLineTool to work around the problem for now.

written 5 months ago by Tom
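Each entry in a Directory's `listing` is either a File or a Directory with its own `listing`, so a loop over `inputs.vcfsdir.listing` only ever sees the top level; a file such as `subdir/dog.gz` sits one level down. A minimal recursive sketch is below. It assumes the runner has populated the nested `listing` arrays (whether it does depends on the runner and, from CWL v1.1 on, on the `loadListing` setting, e.g. `deep_listing`). `collectBySuffix` and `hasSuffix` are hypothetical helpers, and the mock `inputs` object mirrors the `indir/cat.gz` and `indir/subdir/dog.gz` test files that appear further down in this thread.

```javascript
// Mock `inputs` with a nested, fully populated listing (hypothetical layout).
var inputs = {
  vcfsdir: {
    class: "Directory",
    basename: "indir",
    listing: [
      { class: "File", basename: "cat.gz" },
      {
        class: "Directory",
        basename: "subdir",
        listing: [{ class: "File", basename: "dog.gz" }]
      }
    ]
  }
};

// ES5-compatible "ends with" check.
function hasSuffix(name, suffix) {
  return name.length >= suffix.length &&
    name.slice(name.length - suffix.length) === suffix;
}

// Depth-first walk: descend into sub-Directories and collect matching Files.
// A Directory whose `listing` was not loaded by the runner contributes nothing.
function collectBySuffix(dir, suffix, found) {
  var listing = dir.listing || [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    if (entry.class === "Directory") {
      collectBySuffix(entry, suffix, found);
    } else if (entry.class === "File" && hasSuffix(entry.basename, suffix)) {
      found.push(entry);
    }
  }
  return found;
}

var result = { "vcfgzs": collectBySuffix(inputs.vcfsdir, ".gz", []) };
console.log(result.vcfgzs.map(function (f) { return f.basename; }));
```

If the runner only provides a shallow listing, the sub-Directory entries arrive without a `listing` and this walk silently returns just the top-level matches, which is the same symptom reported here.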

The workaround does work for me with the most recent cwltool release:

Setup:

$ mkdir -p test; touch test/one.gz test/two.gz

biostars_365953-workaround.cwl

class: ExpressionTool
cwlVersion: v1.0
inputs:
  vcfsdir: Directory
outputs:
  samples: string[]
  vcfgzs:
    type: File[]
    secondaryFiles: [.tbi]
  beds: File[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var vcfgzs = [];
    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      var filenameext = inputs.vcfsdir.listing[i].basename.split('.')[1];
      if (filenameext == 'gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
  }

Result:

$ cwltool biostars_365953-workaround.cwl --vcfsdir test
/home/michael/cwltool/env3/bin/cwltool 1.0.20181217162649
Resolved 'biostars_365953-workaround.cwl' to 'file:///home/michael/cwltool/biostars_365953-workaround.cwl'
{
    "vcfgzs": [
        {
            "class": "File",
            "location": "file:///home/michael/cwltool/two.gz",
            "basename": "two.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/two.gz"
        },
        {
            "class": "File",
            "location": "file:///home/michael/cwltool/one.gz",
            "basename": "one.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/one.gz"
        }
    ]
}
Final process status is success
written 5 months ago by Michael R. Crusoe

Using cwltool 1.0.20190228155703 and biostars_365953-workaround.cwl, I still don't get files from subdirectories returned, so the behaviour seems identical to the previous version, at least for the workaround.

Test:

$ mkdir indir
$ mkdir indir/subdir
$ touch indir/cat.gz
$ touch indir/subdir/dog.gz
$ cwltool workaround.cwl --vcfsdir indir

Output:

{
    "vcfgzs": [
        {
            "class": "File",
            "location": "file:///mnt/masse/tests/biostars/ForrestBear/cat.gz",
            "basename": "cat.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/mnt/masse/tests/biostars/ForrestBear/cat.gz"
        }
    ]
}
written 4 months ago by Tom