Question: CWL: nameext not returning expected array
ForrestBear (Curii), 8 weeks ago, wrote:

I'm writing an expression tool in CWL that returns an array of files selected by extension.

In a directory that contains .bed and .vcf.gz files, I'd like to return an array of the .vcf.gz files. file.nameext doesn't seem to work the way I expect.

class: ExpressionTool
cwlVersion: v1.0
inputs:
  vcfsdir: Directory
outputs:
  samples: string[]
  vcfgzs:
    type: File[]
    secondaryFiles: [.tbi]
  beds: File[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var vcfgzs = [];

    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      if (file.nameext == '.gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
  }

The return output I'm getting is simply:

{
    "vcfgzs": []
}
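
For reference, the CWL specification describes nameext as the final extension of basename: it is either empty or begins with a period and contains at most one period, nameroot + nameext equals basename, and leading periods are ignored when splitting. The sketch below mimics that rule in plain JavaScript so the expected values are easy to check. splitBasename is a hypothetical helper written for illustration; it is not part of CWL or cwltool.

```javascript
// Hypothetical helper (not a CWL or cwltool API): splits a basename the way
// the CWL specification describes nameroot and nameext.
function splitBasename(basename) {
  // Count leading periods, which are ignored when splitting.
  var lead = 0;
  while (lead < basename.length && basename.charAt(lead) === ".") {
    lead++;
  }
  var dot = basename.lastIndexOf(".");
  if (dot < lead) {
    // No period after the leading ones, so there is no extension.
    return { nameroot: basename, nameext: "" };
  }
  return { nameroot: basename.slice(0, dot), nameext: basename.slice(dot) };
}

console.log(splitBasename("sample.vcf.gz"));
console.log(splitBasename("regions.bed"));
```

Under this rule a file named sample.vcf.gz has nameext ".gz", so the comparison in the expression above is the expected one whenever the field is populated.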

I've confirmed that this bug also exists in the CWL reference runner: https://github.com/common-workflow-language/cwltool/issues/1074

written 7 weeks ago by Michael R. Crusoe
Tom (Bielefeld University, CeBiTec, Germany), 7 weeks ago, wrote:

Accessing file.nameext does not seem to work in the context of the expression, and I don't know why. Here is a simple workaround that should do the trick as long as your filenames contain only one dot.

expression: |
  ${
    var vcfgzs = [];
    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      var filenameext = inputs.vcfsdir.listing[i].basename.split('.')[1];
      if (filenameext == 'gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
    }
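
Note that split('.')[1] picks the segment after the first dot, so a name such as sample.vcf.gz yields "vcf" rather than "gz". A variant that compares the end of basename against the full suffix avoids this limitation. This is a minimal sketch: endsWithSuffix, selectBySuffix, and the mock listing are illustrative names, not CWL APIs, and the function bodies would have to be pasted inside the ${...} expression to be used in a tool.

```javascript
// Illustrative helper: true when name ends with the given suffix.
// Written with slice() so it does not depend on String.prototype.endsWith.
function endsWithSuffix(name, suffix) {
  return name.length >= suffix.length &&
    name.slice(name.length - suffix.length) === suffix;
}

// Illustrative helper: keeps the File entries of a listing whose basename
// ends with the given suffix, e.g. ".vcf.gz".
function selectBySuffix(listing, suffix) {
  var selected = [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    if (entry["class"] === "File" && endsWithSuffix(entry.basename, suffix)) {
      selected.push(entry);
    }
  }
  return selected;
}

// Mock of inputs.vcfsdir.listing (assumed shape: objects with class and basename).
var mockListing = [
  { "class": "File", "basename": "sample.vcf.gz" },
  { "class": "File", "basename": "regions.bed" },
  { "class": "File", "basename": "sample.vcf.gz.tbi" }
];

console.log(selectBySuffix(mockListing, ".vcf.gz"));
```

Inside the tool, the return statement would then become something like return {"vcfgzs": selectBySuffix(inputs.vcfsdir.listing, ".vcf.gz")};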

I did try this and still got empty arrays. I'm wondering if it might be a bug.

written 7 weeks ago by ForrestBear

It is, thanks for finding and reporting this! https://github.com/common-workflow-language/cwltool/issues/1074

written 7 weeks ago by Michael R. Crusoe

Okay, it looks like you were right about it being a bug! I ran the code before posting it here, but my input directory did not contain subdirectories. Sorry! It seems that only the .gz files in the top-level directory are returned. I guess you will have to use a CommandLineTool to work around the problem for now.

modified 7 weeks ago • written 7 weeks ago by Tom

The workaround does work for me with the most recent cwltool release:

Setup:

$ mkdir -p test; touch test/one.gz test/two.gz

biostars_365953-workaround.cwl

class: ExpressionTool
cwlVersion: v1.0
inputs:
  vcfsdir: Directory
outputs:
  samples: string[]
  vcfgzs:
    type: File[]
    secondaryFiles: [.tbi]
  beds: File[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var vcfgzs = [];
    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      var filenameext = inputs.vcfsdir.listing[i].basename.split('.')[1];
      if (filenameext == 'gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
    }

result

$ cwltool biostars_365953-workaround.cwl --vcfsdir test
/home/michael/cwltool/env3/bin/cwltool 1.0.20181217162649
Resolved 'biostars_365953-workaround.cwl' to 'file:///home/michael/cwltool/biostars_365953-workaround.cwl'
{
    "vcfgzs": [
        {
            "class": "File",
            "location": "file:///home/michael/cwltool/two.gz",
            "basename": "two.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/two.gz"
        },
        {
            "class": "File",
            "location": "file:///home/michael/cwltool/one.gz",
            "basename": "one.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/one.gz"
        }
    ]
}
Final process status is success
modified 7 weeks ago • written 7 weeks ago by Michael R. Crusoe

Using cwltool 1.0.20190228155703 and biostars_365953-workaround.cwl, I still don't get files from subdirectories returned. So the behaviour seems identical to the previous version, at least for the workaround.

Test:

$ mkdir indir
$ mkdir indir/subdir
$ touch indir/cat.gz
$ touch indir/subdir/dog.gz
$ cwltool workaround.cwl --vcfsdir indir

Output:

{
    "vcfgzs": [
        {
            "class": "File",
            "location": "file:///mnt/masse/tests/biostars/ForrestBear/cat.gz",
            "basename": "cat.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/mnt/masse/tests/biostars/ForrestBear/cat.gz"
        }
    ]
}
modified 28 days ago • written 28 days ago by Tom
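
If the files in subdirectories are wanted as well, one option is to walk the listing recursively, because a subdirectory appears in listing as a Directory entry rather than as its files, and the loop in the workaround only inspects the top level. The sketch below assumes the runner has populated listing on nested Directory objects, which depends on the runner and its listing-loading settings. collectFiles and the mock input, which mirrors the indir test above, are illustrative and not part of CWL.

```javascript
// Illustrative helper: recursively gathers File entries whose basename ends
// with the given suffix. Assumes nested Directory objects carry a listing.
function collectFiles(dir, suffix, found) {
  found = found || [];
  var listing = dir.listing || [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    if (entry["class"] === "Directory") {
      // Descend into the subdirectory instead of skipping it.
      collectFiles(entry, suffix, found);
    } else if (entry["class"] === "File") {
      var name = entry.basename;
      if (name.length >= suffix.length &&
          name.slice(name.length - suffix.length) === suffix) {
        found.push(entry);
      }
    }
  }
  return found;
}

// Mock of inputs.vcfsdir for the indir/cat.gz and indir/subdir/dog.gz layout.
var indir = {
  "class": "Directory",
  "basename": "indir",
  "listing": [
    { "class": "File", "basename": "cat.gz" },
    {
      "class": "Directory",
      "basename": "subdir",
      "listing": [{ "class": "File", "basename": "dog.gz" }]
    }
  ]
};

console.log(collectFiles(indir, ".gz").map(function (f) { return f.basename; }));
```

If the nested listing is missing at run time, the function simply returns the top-level matches, which is the behaviour reported above.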