Question: CWL: reading files within an expressionTool
3.6 years ago by
karl.nordstrom90 wrote:

I am trying to convert a CSV file to a set of arrays with an ExpressionTool, and I have a piece of JavaScript that executes as intended when called with:

node javaScript.js

Because I lack experience with JavaScript, I rely on solutions found through Google, and when the script is executed as part of a CWL pipeline it crashes. The problematic line is:

var fs = require('fs')

It results in a ReferenceError for require. The reason I have found seems to point toward fs being a server side feature, and I can only guess, but perhaps cwl runs the script as a client-script?

The alternative method I found included FileReader, but that doesn't seem to be part of the node environment.

Is there a correct way of doing this? I'm at a loss...

3.6 years ago by
alaindomissy160 wrote:

The require function is a feature available in nodejs ("server side javascript") to import other javascript modules into the current javascript file.

When you use InlineJavascriptRequirement in a CWL CommandLineTool or ExpressionTool, the CWL engine tries to locate a JavaScript interpreter. If you use cwltool and have nodejs installed, the JavaScript code included in your CommandLineTool or ExpressionTool is passed to nodejs for execution. However, I do not think that such code can import other nodejs modules by calling the require function.
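To illustrate why require can be missing even though nodejs runs the code, here is a minimal sketch. It assumes the engine evaluates expressions in an isolated context comparable to Node's vm module; that is an assumption about the mechanism, not something confirmed in this thread. The sandbox object and its inputs field are hypothetical stand-ins for what a CWL runner would supply.

```javascript
const vm = require('vm');

// Hypothetical stand-in for the `inputs` object a CWL runner would provide.
const sandbox = {
  inputs: { datafile: { basename: 'data.csv', contents: 'A,B\nC,D' } }
};

// Plain JavaScript that only touches `inputs` evaluates without trouble.
const basename = vm.runInNewContext('inputs.datafile.basename', sandbox);

// `require` is supplied per module by nodejs; it is not a global,
// so code evaluated in a fresh context cannot see it.
let errorName = null;
try {
  vm.runInNewContext("var fs = require('fs')", sandbox);
} catch (err) {
  errorName = err.name;
}

console.log(basename, errorName);
```

Under that assumption, anything the expression needs has to arrive through inputs rather than through modules such as fs.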

One way to work around the lack of the require function is to implement the needed processing entirely in the JavaScript code included directly as an expression in your CommandLineTool or ExpressionTool.

Here is an example in which a piece of JavaScript parses the contents of the CSV file into an object whose keys are line numbers and whose values are arrays of strings, one array per line of the CSV.

Let's assume this CSV file:

data.csv

A,B,C,D
E,F,G,H
I,J,K,L

The cwl job file is:

expression.yaml

#!/usr/bin/env cwltool

cwl:tool: expression.cwl

datafile:
  class: File
  path: data.csv

The expression tool file is:

expression.cwl

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  filename:
    type: string
    outputBinding:
      outputEval: $(inputs.datafile.basename)
  filecontent:
    type: string
    outputBinding:
      outputEval: $(inputs.datafile.contents)
  datafile:
    type: File
    inputBinding:
      loadContents: true

outputs:
  processedoutput:
    type: Any

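expression: |
  ${
    // Split the loaded file contents into lines.
    // Inside a YAML block scalar, '\n' needs no extra escaping.
    var lines = inputs.datafile.contents.split('\n');
    var nblines = lines.length;
    var arrayofarrays = [];  // rows in file order
    var setofarrays = {};    // the same rows, keyed by line number
    for (var i = 0; i < nblines; i++) {
      var fields = lines[i].split(',');
      arrayofarrays.push(fields);
      setofarrays[i] = fields;
    }
    return { 'processedoutput': setofarrays } ;
  }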

This will produce the following results:

Final process status is success
{
    "processedoutput": {
        "1": [
            "E", 
            "F", 
            "G", 
            "H"
        ], 
        "0": [
            "A", 
            "B", 
            "C", 
            "D"
        ], 
        "2": [
            "I", 
            "J", 
            "K", 
            "L"
        ]
    }, 
    "filecontent": "A,B,C,D\nE,F,G,H\nI,J,K,L", 
    "filename": "data.csv"
}

The two outputs filename and filecontent are not necessary, but may help with exploring how this works.
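The contents field used in the expression exists only because loadContents: true is set on datafile. As I understand the CWL v1.0 specification, the runner then reads up to the first 64 KiB of the file into contents, so a larger CSV would be cut off. The sketch below imitates that behaviour in plain nodejs outside of CWL. The function name loadContentsSketch and the temporary files are hypothetical and only serve the illustration; this is not cwltool's actual implementation.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumed limit: 64 KiB, as described for loadContents in CWL v1.0.
const LIMIT = 64 * 1024;

// Hypothetical helper: read at most LIMIT bytes from the start of a file.
function loadContentsSketch(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(LIMIT);
    const bytesRead = fs.readSync(fd, buf, 0, LIMIT, 0);
    return buf.toString('utf8', 0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Try it on a small CSV and on a file that exceeds the limit.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cwl-sketch-'));
const smallPath = path.join(dir, 'data.csv');
const bigPath = path.join(dir, 'big.txt');
fs.writeFileSync(smallPath, 'A,B,C,D\nE,F,G,H\nI,J,K,L');
fs.writeFileSync(bigPath, 'x'.repeat(LIMIT + 10));

const loaded = loadContentsSketch(smallPath);
const truncatedLength = loadContentsSketch(bigPath).length;
console.log(JSON.stringify(loaded), truncatedLength);
```

For CSV files larger than the limit, a CommandLineTool that runs a real script on the file is probably the safer route.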

The question described the desired data structure as a "set of arrays". An example of a CSV file and the desired result might help. As it is, I am not sure whether "set" referred to the Set class available in ECMAScript 6 (a recent version of JavaScript). The JSON types available for CWL outputs include arrays and objects, so the example shows how to convert the CSV file content into an object whose keys are the line numbers and whose property values are arrays of strings. If an array of arrays is desired instead, change the last line of the code from return { 'processedoutput': setofarrays } ; to return { 'processedoutput': arrayofarrays } ;
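To compare the two shapes outside of CWL, the same parsing can be run directly with node. This is only a sketch: the helper names csvToRows and rowsToObject are made up for illustration, and unlike the expression above it also drops empty lines and tolerates Windows line endings, so a trailing newline in the file does not produce an extra entry.

```javascript
// Hypothetical helper: CSV text -> array of arrays of strings.
function csvToRows(contents) {
  return contents
    .split(/\r?\n/)                                        // one entry per line
    .filter(function (line) { return line.length > 0; })   // skip empty lines
    .map(function (line) { return line.split(','); });     // naive split, no quoting support
}

// Hypothetical helper: array of rows -> object keyed by line number.
function rowsToObject(rows) {
  var byLine = {};
  rows.forEach(function (row, i) { byLine[i] = row; });
  return byLine;
}

var rows = csvToRows('A,B,C,D\nE,F,G,H\nI,J,K,L\n');
var byLine = rowsToObject(rows);

console.log(JSON.stringify(rows));
console.log(JSON.stringify(byLine));
```

Note that splitting on ',' does not handle quoted fields that themselves contain commas; that is fine for the sample data.csv but not for CSV in general.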

I hope this helps...


This solution works very well. I wasn't aware of the loadContents option.

I aimed for something like processedoutput when I spoke of "set of arrays".

Thank you very much.

written 3.6 years ago by karl.nordstrom

Great example, thank you a lot! Just one question: why are filename and filecontents returned in the body of processedoutput, though you did not push them into this object explicitly?

written 3.4 years ago by anton.khodak

I would guess that was from an earlier version of the expression that included it for debugging purposes

written 3.1 years ago by Michael R. Crusoe