Question: CWL: outputBindings with secondaryFiles - actually dockstore issue
4.0 years ago, kr2 wrote:

Hi,

I have a CWL tool defined that needs to return a set of files (the same file name with several extensions), so I have defined my outputs as:

outputs:
  mapped_out:
    type: File
    outputBinding:
      glob: $(inputs.sample).bam
    secondaryFiles:
      - .bai
      - .bas
      - .md5
      - .met
      - .maptime
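
For reference, the way these suffix patterns expand into file names can be sketched in Python. This is a minimal illustration of the secondaryFiles pattern rule as described in the CWL specification (a leading ^ strips one extension, the rest is appended), not cwltool's own code; resolve_secondary is a hypothetical helper name.

```python
import os


def resolve_secondary(primary: str, pattern: str) -> str:
    """Expand one CWL secondaryFiles pattern against a primary file name."""
    name = primary
    # Each leading caret removes one extension from the primary name.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        stem, ext = os.path.splitext(name)
        if ext:  # a name without an extension is left unchanged
            name = stem
    # Whatever remains of the pattern is appended as a suffix.
    return name + pattern


# The five patterns from the outputs section above, applied to the
# primary name produced by glob: $(inputs.sample).bam
patterns = [".bai", ".bas", ".md5", ".met", ".maptime"]
for pattern in patterns:
    print(resolve_secondary("insilico_21.bam", pattern))
```

With plain suffixes like the ones above, each secondary file is simply the BAM name plus the suffix (insilico_21.bam.bai and so on); a pattern such as ^.bai would instead give insilico_21.bai.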

I've tried a couple of variations of the json:

{
  ...
  "mapped_out": {
    "path": "/tmp/mapped.bam",
    "class": "File"
  },
  ...
}

This yielded one file, provisioned to /tmp/mapped.bam.

This version (based on alea-createGenome.cwl & alea-alignReads-job.json) didn't stage anything:

{
  ...
  "mapped_out": "/tmp/mapped",
  ...
}

Everything seems to have completed on the cwltool side:

Final process status is success
{
    "mapped_out": {
        "checksum": "sha1$53bb0c4abb07013393891cb50a3feec4c6381304", 
        "basename": "insilico_21.bam", 
        "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam", 
        "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam", 
        "secondaryFiles": [
            {
                "checksum": "sha1$ef6f2cf70e11d7d0be17b79dfb02eb1277e43b41", 
                "basename": "insilico_21.bam.bai", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bai", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bai", 
                "class": "File", 
                "size": 1370120
            }, 
            {
                "checksum": "sha1$4bf5068040c0e2a350aa21fa299f6567230bfbeb", 
                "basename": "insilico_21.bam.bas", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bas", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.bas", 
                "class": "File", 
                "size": 1973
            }, 
            {
                "checksum": "sha1$4a60424144f5283c4e9cf74deb214597cac8bae8", 
                "basename": "insilico_21.bam.md5", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.md5", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.md5", 
                "class": "File", 
                "size": 32
            }, 
            {
                "checksum": "sha1$63139bed16686c6be0dd5469342af1dac8795260", 
                "basename": "insilico_21.bam.met", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.met", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.met", 
                "class": "File", 
                "size": 1521
            }, 
            {
                "checksum": "sha1$39f641f432b510034fb96b3e73569f5fc1824521", 
                "basename": "insilico_21.bam.maptime", 
                "location": "file:///home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.maptime", 
                "path": "/home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.maptime", 
                "class": "File", 
                "size": 279
            }
        ], 
        "class": "File", 
        "size": 42245405
    }
}

Any help gratefully received.

Thanks, Keiran


Is $(inputs.sample) a string?

Why are you writing the JSON manually? Is the tool itself CWL aware and producing a cwl.output.json file?

Reply written 4.0 years ago by Michael R. Crusoe

I'm attempting to complete the input json file. Dockstore gives the following template:

$ dockstore tool convert entry2json --entry quay.io/wtsicgp/dockstore-cgpmap:1.0.2
{
  "reference": {
    "path": "fill me in",
    "class": "File"
  },
  "bams_in": "fill me in",
  "cram": false,
  "mapped_out": {
    "path": "fill me in",
    "class": "File"
  },
  "bwa": " -Y -K 100000000",
  "bwa_idx": {
    "path": "fill me in",
    "class": "File"
  },
  "sample": "fill me in",
  "scramble": ""
}

Can I do the same with cwltool? I can't see any options indicating this.

Reply written 4.0 years ago by kr2

Sure, but you asked a question about the outputs section :-)

Reply written 4.0 years ago by Michael R. Crusoe

Is the "sure" here in reference to the question as to whether cwltool can generate an input json?

Reply written 4.0 years ago by denis.yuen

Yes, inputs.sample is a string:

https://github.com/cancerit/dockstore-cgpmap/blob/master/Dockstore.cwl

Reply written 4.0 years ago by kr2

[deleted, accidentally posted as comment]

Reply modified 4.0 years ago, written 4.0 years ago by denis.yuen
4.0 years ago, denis.yuen (OICR) wrote:

Ah, I think I understand the confusion here. Apologies since I think we created it.

1) Dockstore input JSON can (optionally) include output parameters in order to provision files to locations like S3, icgc-storage, or FTP. This is an artifact of Dockstore's beginnings in the pan-cancer project, where we always wrote workflows that look like "download from GNOS/S3 -> do processing -> upload to GNOS/S3".

In other words, you should be able to do this to upload bamstats_report, an output, to S3:

$ cat sample_configs.json 
{
  "bam_input": {
        "class": "File",
        "path": "https://s3.amazonaws.com/oconnor-test-bucket/sample-data/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam"
    },
    "bamstats_report": {
        "class": "File",
        "path": "s3://oicr.temp/bamstats.zip"
    }
}
$ dockstore tool launch --entry quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0  --json sample_configs.json

And you should be able to do this to just leave the results in place on your local host:

$ cat sample_configs2.json
{
  "bam_input": {
        "class": "File",
        "path": "https://s3.amazonaws.com/oconnor-test-bucket/sample-data/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam"
    }
}
$ dockstore tool launch --entry quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0  --json sample_configs2.json

This is a red herring though.

2) It looks like Dockstore has a bug/missing feature: we probably missed that output parameters (in the CWL) can also specify secondary files. While the secondary files look like they're being generated properly coming out of cwltool (in /home/ubuntu/./datastore/launcher-ccd381b4-c475-4770-b88b-bebd2b06439c/outputs/insilico_21.bam.*), they aren't being moved further along to /tmp/mapped.* as we would have expected.

We're adding this as an issue https://github.com/ga4gh/dockstore/issues/544
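
To make the expected behaviour concrete, here is a rough Python sketch of how the copy plan could be derived from the cwltool output object. It is only an illustration under stated assumptions, not Dockstore's implementation: plan_copies is a hypothetical helper, the shortened paths in the example are placeholders, and the rule of reusing the secondary file's suffix at the destination is an assumption about what a user would want.

```python
import os
from typing import List, Tuple


def plan_copies(output: dict, dest: str) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs for a File output and its secondaryFiles."""
    copies = [(output["path"], dest)]
    primary_name = output["basename"]
    for sec in output.get("secondaryFiles", []):
        name = sec["basename"]
        if name.startswith(primary_name):
            # Keep the extra suffix (e.g. ".bai") and attach it to the destination.
            target = dest + name[len(primary_name):]
        else:
            # Otherwise place the file next to the destination under its own name.
            target = os.path.join(os.path.dirname(dest), name)
        copies.append((sec["path"], target))
    return copies


# Abbreviated, hypothetical version of the cwltool output shown in the question.
example_output = {
    "class": "File",
    "basename": "insilico_21.bam",
    "path": "outputs/insilico_21.bam",
    "secondaryFiles": [
        {"class": "File", "basename": "insilico_21.bam.bai",
         "path": "outputs/insilico_21.bam.bai"},
    ],
}

for source, target in plan_copies(example_output, "/tmp/mapped.bam"):
    print(source, "->", target)
```

Under these assumptions the index would land at /tmp/mapped.bam.bai next to /tmp/mapped.bam, which is the behaviour that was missing.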

4.0 years ago, Michael R. Crusoe (Common Workflow Language project) wrote:

Edited to add: This appears to be a Dockstore-specific problem; you should contact them.

If you want to provide an input file with secondaryFiles, copy the general format in your last code block. The checksum, location, size, and basename fields don't need to be provided.

Here is a clean YAMLy version using relative paths:

mapped_out:
    class: File
    path: insilico_21.bam
    secondaryFiles:
        - class: File
          path: insilico_21.bam.bai
        - class: File
          path: insilico_21.bam.bas
        - class: File
          path: insilico_21.bam.md5 
        - class: File
          path: insilico_21.bam.met
        - class: File
          path: insilico_21.bam.maptime
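
If you would rather derive that minimal form from the object cwltool printed than type it by hand, something like the following Python sketch should work. It assumes every entry carries class and path, as in the output above; minimal_file is a hypothetical helper name, and the example object is abbreviated.

```python
def minimal_file(obj: dict) -> dict:
    """Keep only class and path (plus secondaryFiles, recursively) of a CWL File object."""
    out = {"class": obj["class"], "path": obj["path"]}
    secondary = [minimal_file(s) for s in obj.get("secondaryFiles", [])]
    if secondary:
        out["secondaryFiles"] = secondary
    return out


# Abbreviated object in the shape cwltool reports; checksum, size and
# basename are dropped by minimal_file because they are optional on input.
reported = {
    "class": "File",
    "path": "insilico_21.bam",
    "basename": "insilico_21.bam",
    "size": 42245405,
    "secondaryFiles": [
        {"class": "File", "path": "insilico_21.bam.bai",
         "basename": "insilico_21.bam.bai", "size": 1370120},
    ],
}

print(minimal_file(reported))
```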

Hmm, I think this may be a Dockstore issue, as it 'massages' the initial JSON before handing it to cwltool:

Original as recommended:

$ json_pp < Dockstore3.json 
{
   "reference" : {
      "class" : "File",
      "path" : "/tmp/core_ref_GRCh37d5.tar.gz"
   },
   "bwa_idx" : {
      "path" : "/tmp/bwa_idx_GRCh37d5.tar.gz",
      "class" : "File"
   },
   "bams_in" : [
      {
         "class" : "File",
         "path" : "/tmp/insilico_21.bam"
      }
   ],
   "mapped_out" : {
      "path" : "/tmp/mapped.bam",
      "class" : "File",
      "secondaryFiles" : [
         {
            "path" : "/tmp/mapped.bam.bai",
            "class" : "File"
         },
         {
            "path" : "/tmp/mapped.bam.bas",
            "class" : "File"
         },
         {
            "path" : "/tmp/mapped.bam.md5",
            "class" : "File"
         },
         {
            "class" : "File",
            "path" : "/tmp/mapped.bam.met"
         },
         {
            "path" : "/tmp/mapped.bam.maptime",
            "class" : "File"
         }
      ]
   },
   "sample" : "insilico_21",
   "cram" : false
}

What Dockstore passes to cwltool:

$ json_pp < /home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/workflow_params.json
{
   "cram" : false,
   "bwa_idx" : {
      "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/inputs/adf6a2b5-3006-42ce-a1bb-3d084c8229f3/bwa_idx_GRCh37d5.tar.gz",
      "class" : "File"
   },
   "mapped_out" : {
      "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/outputs/mapped_out",
      "class" : "File"
   },
   "bams_in" : [
      {
         "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/inputs/7a8bb7c9-98c9-4dc3-82bd-131e0d798bff/insilico_21.bam",
         "class" : "File"
      }
   ],
   "reference" : {
      "class" : "File",
      "path" : "/home/ubuntu/./datastore/launcher-accbffba-6eb5-468a-858f-3c578495b467/inputs/6baa8f3d-c46b-4642-8836-5ff8fb5a16b7/core_ref_GRCh37d5.tar.gz"
   },
   "sample" : "insilico_21"
}
Reply written 4.0 years ago by kr2

mapped_out is an output, not an input -- it does not belong in your input document.

Reply written 4.0 years ago by Michael R. Crusoe
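
For anyone who wants to reuse a Dockstore-style JSON with plain cwltool, a small Python sketch of separating the two kinds of entries follows. It assumes you already know which parameter ids are outputs (here passed in by hand; in practice they would come from the outputs section of the CWL). split_job is a hypothetical helper, not part of Dockstore or cwltool.

```python
from typing import Set, Tuple


def split_job(job: dict, output_ids: Set[str]) -> Tuple[dict, dict]:
    """Split a Dockstore-style JSON into cwltool inputs and output destinations."""
    inputs = {k: v for k, v in job.items() if k not in output_ids}
    destinations = {k: v for k, v in job.items() if k in output_ids}
    return inputs, destinations


# Abbreviated job document based on the one in this thread.
job = {
    "sample": "insilico_21",
    "cram": False,
    "mapped_out": {"class": "File", "path": "/tmp/mapped.bam"},
}

inputs, destinations = split_job(job, {"mapped_out"})
print(inputs)        # what would be handed to cwltool as the input object
print(destinations)  # where the caller wants outputs provisioned afterwards
```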