Question: CWL to check the output directory and run for non-existing files
a.james (Germany) wrote, 5 months ago:

Hello All,

I have a CWL script that should merge the graph files produced by the previous step. I need CWL to check the output directory and merge those graphs. My CWL script is shown below. The input is an array of BAM files.

  1. I need the CWL command line tool to check the existing output directory and run only the step that merges the files already generated there. Instead, it starts from the beginning, that is, from the step that generates a graph for each BAM file, which is time-consuming.

    cwlVersion: v1.0
    class: CommandLineTool
    doc: Spladder

    baseCommand: [python2.7, /usr/python/spladder.py]
    
    hints:
      cwltool:InplaceUpdateRequirement:
        inplaceUpdate: true
    requirements:
     - class: InlineJavascriptRequirement
     - class: InitialWorkDirRequirement
       listing: 
        - entry: "$({class: 'Directory', listing: []})"
          entryname: $(inputs.spladder_outDir)
          writable: true
    
    inputs:
     spladder_gtf: 
      type: File
      inputBinding:
       position: 3
       prefix: -a
     spladder_bams: 
      type: File[]
      inputBinding:
       position: 1
       prefix: -b
      secondaryFiles: .bai
     spladder_outDir:
      type: string
      inputBinding:
       position: 2
       prefix: -o
     spladder_phase2:
      type: string
      inputBinding:
       position: 6
       prefix: -T
     spladder_merge_graphs:
      type: string
      inputBinding:
        position: 5
        prefix: -M
     spladder_primary_alignment:
      type: string
      inputBinding:
        position: 10
        prefix: -P
     spladder_confidence:
      type: int
      inputBinding:
        position: 4
        prefix: -c
     spladder_alt:
      type: string
      inputBinding:
        position: 7
        prefix: -t
     spladder_validate:
      type: string
      inputBinding:
        position: 8
        prefix: -V
     spladder_RL:
      type: int
      inputBinding:
        position: 9
        prefix: -n
    
    outputs:
     spladder_out:
      type: Directory
      outputBinding:
       glob: $(inputs.spladder_outDir)/spladder
    
    $namespaces:
      cwltool: http://commonwl.org/cwltool#
    

    The YML job file used with the above script looks like this:

    spladder_gtf: 
     class: File
     path: /usage_examples/gencode.v19.annotation.hs37d5_chr.spladder.gtf
    spladder_outDir: /Alignment/spladder_out/
    spladder_out_dir1: /spladder_out1
    spladder_out_dir2: /spladder_out2
    spladder_bams: [
     {class: File, path: /Alignment/C3N-02289_10_L1Aligned.sortedByCoord.out.bam},
     {class: File, path: /Alignment/C3N-02289_4_5_L1Aligned.sortedByCoord.out.bam},
     {class: File, path: /cluster/work/grlab/projects/alva_temp/Alignment/C3N-02671_08_L1Aligned.sortedByCoord.out.bam}
    ]
    spladder_confidence: 2
    spladder_merge_graphs: merge_graphs
    spladder_alt: alt_3prime
    spladder_RL: 100
    spladder_phase2: y
    spladder_primary_alignment: y
    

I ran cwltool as follows:

 cwltool --enable-ext /spladder_part1.cwl /part2.yml

My aim is for the CWL tool to look into spladder_outDir and merge only the existing outputs from the previous run/step. Currently spladder_outDir holds 17 graph files, and I need CWL to merge them, as requested by the parameter spladder_merge_graphs. Instead, if no absolute path is given, CWL starts from the beginning and creates all the graphs again. If an absolute path is given, it fails with:

FileExistsError: [Errno 17] File exists: '/spladder_out/spladder'

If the path is not absolute, I get:

WARNING: Output directory ./spladder_out does not exist - will be created

Any help or suggestions would be great. I have read the CWL manual end to end a couple of times, and I found

cwltool:InplaceUpdateRequirement:
    inplaceUpdate: true

and --enable-ext, both of which looked like the right solution.

If I run everything from the beginning instead, the processing time is three times longer. That is why I want to do the merging as a separate second run.

Tags: rna-seq, cwl, next-gen
Tom (Bielefeld University, CeBiTec, Germany) wrote, 12 weeks ago:

Hi! If your problem still exists, I would very much like to help. However, I am not sure I understood what your tool is supposed to do, probably because I don't know anything about spladder. Is it correct that the "previous step" you mentioned is part of a workflow, and that the tool you posted here only has the purpose of merging the files?

I am by no means an expert in CWL. That being said, I am not sure InitialWorkDirRequirement can be used in the way you are attempting for this tool.
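For comparison, InitialWorkDirRequirement can stage a directory that already exists, provided that directory is passed to the tool as a Directory input rather than as a string. The fragment below is only a sketch under that assumption: the input name spladder_prev_out is hypothetical, and whether spladder then reuses the staged graphs instead of regenerating them depends on spladder itself, which I have not verified.

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [python2.7, /usr/python/spladder.py]

requirements:
  - class: InitialWorkDirRequirement
    listing:
      # Copy the existing directory into the working directory so the tool
      # can read and extend it. "spladder_prev_out" is a hypothetical name.
      - entry: $(inputs.spladder_prev_out)
        writable: true

arguments:
  # Point -o at the staged copy, which keeps the directory's own name.
  - prefix: -o
    valueFrom: $(inputs.spladder_prev_out.basename)
    position: 2

inputs:
  spladder_prev_out:
    type: Directory
  # [...] remaining inputs as in the original tool, without spladder_outDir

outputs:
  spladder_out:
    type: Directory
    outputBinding:
      glob: $(inputs.spladder_prev_out.basename)
```

In the job file, the string entry for spladder_outDir would then be replaced by a Directory object, here using the path from the original YML:

```yaml
spladder_prev_out:
  class: Directory
  path: /Alignment/spladder_out
```

Because the directory is staged as a writable copy, the previous results on disk stay untouched and the merged output is collected from the copy.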

You might instead try passing a subdirectory of runtime.outdir (the temporary output directory CWL uses at runtime) to spladder as its output directory. That way you still know exactly where your files are at runtime, so you can collect the ones you need with glob. This might look like:

# [...]
requirements:
 - class: InlineJavascriptRequirement

arguments:
  - valueFrom: $(runtime.outdir+"/spladder_output")
    prefix: -o
    position: 2

inputs:
# [...]
# (remove spladder_outDir from the inputs)
# [...]
outputs:
 spladder_out:
  type: Directory
  outputBinding:
   glob: $(runtime.outdir+"/spladder_output")
# [...]

I don't know what the output of spladder will look like. Let's say it's a bunch of ".example" files, which spladder puts into a subdirectory called "blurb". Then you might alternatively collect the output as an array of files using:

outputs:
  spladder_out:
    type: File[]
    outputBinding:
      glob: $(runtime.outdir+"/spladder_output/blurb/*.example")
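Since glob patterns are resolved relative to the tool's output directory, the same files can presumably also be collected without the runtime.outdir prefix, which avoids the JavaScript expression. A sketch, reusing the made-up "blurb" and ".example" names from above:

```yaml
outputs:
  spladder_out:
    type: File[]
    outputBinding:
      # Relative to the tool's output directory; no expression needed.
      glob: spladder_output/blurb/*.example
```

The relative form is also the more portable choice, because it does not rely on a runner accepting absolute paths in glob.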

Please write if this still produces problems or if I misunderstood the issue altogether. Regards, Tom


a.james replied, 12 weeks ago:

@Tom Thanks for your time and reply. I will take a look at your solution. I tried this solution, but it does not give me what I need.