Capture Output files using glob
5.9 years ago
ttom ▴ 220

I have a Python script that I want to run in CWL.

The script creates a directory named splad inside the output directory specified in the CWL file and writes its result files into that directory.

Here I am trying to capture those files using the glob syntax

glob: $(inputs.splad_outDir)/splad/*

However, the following code does not work:

cat splad.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [python, python/splad.py]

requirements:
  - class: InlineJavascriptRequirement

inputs:
 splad_gtf:
  type: File
  inputBinding:
   position: 1
   prefix: -a
 splad_bams:
  type: File[]
  inputBinding:
   position: 2
   prefix: -b
   itemSeparator: ","
 splad_outDir:
  type: Directory
  inputBinding:
   position: 3
   prefix: -o
 splad_phase2:
  type: string
  inputBinding:
   position: 4
   prefix: -T

outputs:
 splad_out:
  type: File[]
  outputBinding:
   glob: $(inputs.splad_outDir)/splad/*

cat splad.yml

splad_gtf:
        class: File
        path: gencode.v19.annotation.hs37d5_chr.gtf

splad_outDir: 
        class: Directory
        location: spladder_TEST

splad_bams: [
        {class: File, path: sampleA.bam},
        {class: File, path: sampleB.bam}
        ] 

splad_phase2: y

In this case the output files will be put in spladder_TEST/splad?


Yes, the script creates an additional directory, splad, inside the output directory you specify and writes the output files there. So in this case they end up in spladder_TEST/splad.

5.9 years ago
biokcb ▴ 170

Ok, so a couple modifications you can try out:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [python, python/splad.py]

requirements:
  - class: InitialWorkDirRequirement
    listing:
      - entry: $(inputs.splad_outDir)
        writable: true

inputs:
 splad_gtf:
  type: File
  inputBinding:
   position: 1
   prefix: -a
 splad_bams:
  type: File[]
  inputBinding:
   position: 2
   prefix: -b
   itemSeparator: ","
 splad_outDir:
  type: Directory
  inputBinding:
   position: 3
   prefix: -o
 splad_phase2:
  type: string
  inputBinding:
   position: 4
   prefix: -T

outputs:
 splad_out_dir:
  type: Directory
  outputBinding:
   glob: $(inputs.splad_outDir.basename)/splad

I added an InitialWorkDirRequirement section, which allows you to write files to an input directory (splad_outDir), and changed the collected output to a Directory type that follows the pattern above. You need to use the directory's basename attribute, because $(inputs.splad_outDir) on its own is an object, not a string.

This should still collect the whole set of output files. If you also want to explicitly capture the files inside the directory, you can add another entry under outputs. I don't know your file types, so I'm not sure what you'd glob for specifically; replace the "*" accordingly if needed.

 splad_out_files:
  type: File[]
  outputBinding:
   glob: $(inputs.splad_outDir.basename)/splad/*

Note that if you don't also capture the directory as an output, the files are placed directly in the runtime output directory (which could be the directory you run the tool from). So if you also need splad_outDir, you may want to capture that directory too. Let me know if this doesn't work and we can modify it a bit!
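To illustrate why the glob needs the basename field, here is a rough sketch of what a Directory input looks like to the runner. The location path is hypothetical and the exact set of fields depends on the runner:

```yaml
# Illustrative shape of inputs.splad_outDir at runtime (not something you write yourself)
splad_outDir:
  class: Directory
  location: file:///home/user/spladder_TEST   # hypothetical absolute path
  basename: spladder_TEST                     # last path component

# A glob is a string pattern, so it has to be built from a string field:
#   $(inputs.splad_outDir.basename)/splad  ->  spladder_TEST/splad
```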


Sorry for the delay in getting back.

When I try this, all my output files end up in the working directory from which I launch the CWL run, not in spladder_TEST/splad:

 splad_out_files:
  type: File[]
  outputBinding:
   glob: $(inputs.splad_outDir.basename)/splad/*

And when I try this, the output files are written to a directory named splad, not to spladder_TEST/splad:

 outputs:
     splad_out_dir:
      type: Directory
      outputBinding:
       glob: $(inputs.splad_outDir.basename)/splad

Can you try adding another directory output?

 outputs:
   splad_out_dir1:
     type: Directory
     outputBinding:
       glob: $(inputs.splad_outDir.basename)/splad
   splad_out_dir2:
     type: Directory
     outputBinding:
       glob: $(inputs.splad_outDir.basename)

I think there may be a related cwltool issue about nested directories; it doesn't seem to collect nested directories automatically. I'm not sure this is the best way to specify it. I think the recommended method is to produce the outputs and then use an ExpressionTool to organize them. If this doesn't work, or if you end up needing a more complex directory structure, you should probably switch to that approach, but for something this simple I think it may work.
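For reference, a minimal sketch of the ExpressionTool approach could look like the following. The input names files and dir_name are made up for illustration; in a workflow you would wire the tool's File[] output into files:

```yaml
cwlVersion: v1.0
class: ExpressionTool

requirements:
  - class: InlineJavascriptRequirement

inputs:
  files:            # hypothetical name: the File[] collected by the previous step
    type: File[]
  dir_name:         # hypothetical name: the directory name you want, e.g. spladder_TEST
    type: string

outputs:
  out_dir:
    type: Directory

# Build a Directory object whose listing is the given files.
expression: |
  ${
    return {"out_dir": {
      "class": "Directory",
      "basename": inputs.dir_name,
      "listing": inputs.files
    }};
  }
```

This only groups the files into one directory level; a nested layout such as spladder_TEST/splad would need a Directory object placed inside another Directory's listing.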


This case is working


One more question on this. With the lines below, splad_outDir has to exist already for this to work, although the tool will then create splad_outDir/splad:

requirements:
 - class: InitialWorkDirRequirement
   listing:
    - entry: $(inputs.splad_outDir)
      writable: true

To have splad_outDir created as well, I used the following lines, but it still says no such file or directory:

requirements:
 - class: InlineJavascriptRequirement
 - class: InitialWorkDirRequirement
   listing:
    - entry: "$({class: 'Directory', listing: []})"
      entryname: $(inputs.splad_outDir)
      writable: true

I think you will need to go to the inputs section and change splad_outDir from type: Directory to type: string, so that the script can create a directory that does not exist yet.
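Something along these lines, as a sketch only. It assumes the script creates the directory passed to -o itself, and that the InitialWorkDirRequirement entry for splad_outDir is dropped, since a string cannot be staged as a directory:

```yaml
inputs:
  splad_outDir:
    type: string          # just a name now; assumed to be created by the script
    inputBinding:
      position: 3
      prefix: -o

outputs:
  splad_out:
    type: File[]
    outputBinding:
      # splad_outDir is a plain string here, so it can be used in the pattern directly
      glob: $(inputs.splad_outDir)/splad/*
```

The job file entry then becomes a plain string, for example splad_outDir: spladder_TEST, instead of a class: Directory object.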


Thank you, it worked
