Question: Capture Output files using glob
ttom wrote, 26 days ago:

I have a Python script that I want to run with CWL.

The script creates a directory named splad inside the output directory passed to it from the CWL file, and writes its result files there.

I am trying to capture those files with this glob:

glob: $(inputs.splad_outDir)/splad/*

However, the following code does not work:

cat splad.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [python, python/splad.py]

requirements:
  - class: InlineJavascriptRequirement

inputs:
 splad_gtf:
  type: File
  inputBinding:
   position: 1
   prefix: -a
 splad_bams:
  type: File[]
  inputBinding:
   position: 2
   prefix: -b
   itemSeparator: ","
 splad_outDir:
  type: Directory
  inputBinding:
   position: 3
   prefix: -o
 splad_phase2:
  type: string
  inputBinding:
   position: 4
   prefix: -T

outputs:
 splad_out:
  type: File[]
  outputBinding:
   glob: $(inputs.splad_outDir)/splad/*

cat splad.yml

splad_gtf:
        class: File
        path: gencode.v19.annotation.hs37d5_chr.gtf

splad_outDir: 
        class: Directory
        location: spladder_TEST

splad_bams: [
        {class: File, path: sampleA.bam},
        {class: File, path: sampleB.bam}
        ] 

splad_phase2: y

In this case the output files will be put in spladder_TEST/splad?

written 26 days ago by biokcb

Yes, the script creates an additional directory named splad inside the output directory you specify and keeps the output files there. So in this case that is spladder_TEST/splad.

written 26 days ago by ttom
biokcb wrote, 26 days ago:

Ok, so a couple modifications you can try out:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [python, python/splad.py]

requirements:
  - class: InitialWorkDirRequirement
    listing:
      - entry: $(inputs.splad_outDir)
        writable: true

inputs:
 splad_gtf:
  type: File
  inputBinding:
   position: 1
   prefix: -a
 splad_bams:
  type: File[]
  inputBinding:
   position: 2
   prefix: -b
   itemSeparator: ","
 splad_outDir:
  type: Directory
  inputBinding:
   position: 3
   prefix: -o
 splad_phase2:
  type: string
  inputBinding:
   position: 4
   prefix: -T

outputs:
 splad_out_dir:
  type: Directory
  outputBinding:
   glob: $(inputs.splad_outDir.basename)/splad

I added an InitialWorkDirRequirement section, which stages the input directory (splad_outDir) into the working directory as writable so the script can write into it, and changed the collected output to a Directory type matching the pattern above. The glob has to use the directory's basename attribute, because $(inputs.splad_outDir) on its own is a Directory object rather than a path string.

This should already collect the whole set of output files. If you also want to capture the files inside the directory explicitly, you can add a second entry under outputs. I don't know your file types, so I'm not sure what you would glob for; replace the "*" accordingly if needed.

 splad_out_files:
  type: File[]
  outputBinding:
   glob: $(inputs.splad_outDir.basename)/splad/*
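
As a rough illustration of why .basename is needed, this is approximately what inputs.splad_outDir looks like when a parameter reference reads it. The absolute location is a made-up placeholder; the real value depends on where the job is run.

```yaml
# Approximate shape of inputs.splad_outDir at runtime (a Directory object).
# The location below is a hypothetical placeholder, not a value from this thread.
class: Directory
location: file:///home/user/project/spladder_TEST
basename: spladder_TEST

# In an outputBinding glob:
#   $(inputs.splad_outDir.basename)/splad/*  expands to  spladder_TEST/splad/*
#   $(inputs.splad_outDir)/splad/*           interpolates the whole object,
#                                            which is not a plain path string
```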

Note that if you don't also capture the directory as an output, your files are placed directly in the runtime output directory (which could be the directory you run the script from). So if you also need splad_outDir itself, you may want to capture that directory as an output too. Let me know if this doesn't work and we can modify it a bit!

written 26 days ago by biokcb

Sorry for the delay in getting back.

When I try this, all my output files end up in the working directory from which I launch the CWL run, not in spladder_TEST/splad:

 splad_out_files:
  type: File[]
  outputBinding:
   glob: $(inputs.splad_outDir.basename)/splad/*

And when I try this, the output files are written to a directory named splad, not to spladder_TEST/splad:

 outputs:
     splad_out_dir:
      type: Directory
      outputBinding:
       glob: $(inputs.splad_outDir.basename)/splad
written 25 days ago by ttom

Can you try adding another directory output?

 outputs:
   splad_out_dir1:
     type: Directory
     outputBinding:
       glob: $(inputs.splad_outDir.basename)/splad
   splad_out_dir2:
     type: Directory
     outputBinding:
       glob: $(inputs.splad_outDir.basename)

I think there may be a related issue in cwltool about nested directories; it does not seem to collect them automatically. I'm not sure this is the best way to specify it. I believe the recommended method is to produce the outputs and then use an ExpressionTool to organize them. If this doesn't work, or if you end up needing a more complex directory structure, you should probably switch to that approach, but for something this simple I think it may work.
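
For reference, the ExpressionTool approach mentioned above could look roughly like the sketch below. It is not taken from this thread: the input and output names (files, dir_name, out_dir) are hypothetical, and it only shows the general pattern of wrapping a list of collected files in a new Directory object.

```yaml
cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  files:            # hypothetical: the File[] collected by the previous step
    type: File[]
  dir_name:         # hypothetical: name for the resulting directory
    type: string

outputs:
  out_dir:
    type: Directory

# Build a Directory literal whose listing is the input files.
expression: |
  ${
    return {
      "out_dir": {
        "class": "Directory",
        "basename": inputs.dir_name,
        "listing": inputs.files
      }
    };
  }
```

In a workflow, this would run as a step after the CommandLineTool, taking its File[] output as files.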

written 25 days ago by biokcb

This case is working

written 25 days ago by ttom

One more question on this. With the lines below, splad_outDir has to exist already for this to work, although splad_outDir/splad is then created:

requirements:
 - class: InitialWorkDirRequirement
   listing:
    - entry: $(inputs.splad_outDir)
      writable: true

To have splad_outDir created as well, I tried the following lines, but it still reports "No such file or directory":

requirements:
 - class: InlineJavascriptRequirement
 - class: InitialWorkDirRequirement
   listing:
    - entry: "$({class: 'Directory', listing: []})"
      entryname: $(inputs.splad_outDir)
      writable: true
written 6 days ago by ttom

I think you need to go to the inputs section and change splad_outDir from type: Directory to type: string. A Directory input has to exist already; with a string, the script can create the directory itself.
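
The thread does not show the final file, so the following is only a sketch of what the tool might look like after this change. It assumes the script creates the output directory itself when given a name, and that InitialWorkDirRequirement is then no longer needed. Because splad_outDir is now a plain string, the glob uses it directly instead of .basename.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [python, python/splad.py]

inputs:
  splad_gtf:
    type: File
    inputBinding:
      position: 1
      prefix: -a
  splad_bams:
    type: File[]
    inputBinding:
      position: 2
      prefix: -b
      itemSeparator: ","
  splad_outDir:
    type: string        # a directory name, not an existing Directory
    inputBinding:
      position: 3
      prefix: -o
  splad_phase2:
    type: string
    inputBinding:
      position: 4
      prefix: -T

outputs:
  splad_out_dir:
    type: Directory
    outputBinding:
      # Assumption: the script creates <splad_outDir>/splad in the working directory.
      glob: $(inputs.splad_outDir)
```

The job file entry would then become a plain string, for example splad_outDir: spladder_TEST. If the nested splad directory is not collected this way, the two-output variant from earlier in the thread can be kept with the same substitution.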

written 6 days ago by biokcb

Thank you, it worked

written 5 days ago by ttom