Question: Input type array and output type File, possible in CWL?
ttom wrote, 14 months ago:

I am using scatter to run the fastqc_check step in the workflow, where the input is an array, and I am trying to capture all the results in one file. The tool's output is a single File and the workflow output is declared as File[] (below), but only the result from the last file in the input array gets stored.

 fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out

I would like to collect the results from all the files in the input array into one output file. When outputs/fastqc_check_out is declared as a File instead, it throws an error. Is there a way to do this?

cat qc.cwl (the workflow)

cwlVersion: v1.0
class: Workflow
requirements:
 - class: ScatterFeatureRequirement

inputs:
 reads1:
  type: File[]
 reads2:
  type: File[]
 fastqc_check_script:
  type: File
 sample:
  type: string
outputs:
 fastqc_out:
  type: File[]
  outputSource: fastqc/fastqc_zip
 fastqc_html:
  type: File[]
  outputSource: fastqc/fastqc_html
 fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out

steps:
 fastqc:
  run: fastqc.cwl
  in:
   fq1:
    source: reads1
   fq2:
    source: reads2
  out: [fastqc_zip, fastqc_html]
 fastqc_check:
  run: fastqc_check.cwl
  in:
   sample: sample
   fq1_zips: fastqc/fastqc_zip
  scatter: fq1_zips
  out: [fastqc_check_out]

cat fastqc_check.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh, fastqc_check.sh]

inputs:
 sample:
  type: string
  inputBinding:
   position: 1
 fq1_zips:
  type: File
  inputBinding:
   position: 2
outputs:
 fastqc_check_out:
  type: File
  outputBinding:
   glob: $(inputs.sample)_fastqc.summary

cat fastqc.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [fastqc, -o, .]

inputs:
 fq1: 
  type: File[]
  inputBinding:
   position: 1
 fq2:
  type: File[]
  inputBinding:
   position: 2
outputs:
 fastqc_zip:
  type: File[]
  outputBinding:
   glob: '*.zip'
 fastqc_html:
  type: File[]
  outputBinding:
   glob: '*.html'
bogdan.gavrilovic answered, 14 months ago:

Hi, it seems to me that you are doing everything right as far as CWL goes. A possible problem is that when you scatter fastqc_check.cwl, two jobs are created and each job outputs a file with the same name, $(inputs.sample)_fastqc.summary. Try changing the fastqc_check.cwl tool to output a file name based on the input file name, e.g. $(inputs.fq1_zips.nameroot).$(inputs.sample)_fastqc.summary or something like that, just to make sure that the two files have different names.
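With this suggestion applied, the outputs section of the original fastqc_check.cwl might look like the sketch below. Note that glob only collects files, it does not rename them, so this assumes fastqc_check.sh is also changed to write its summary under the new name (how the script picks its output name is not shown in the thread):

```yaml
outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      # nameroot is the file name minus its last extension (A_fastqc.zip -> A_fastqc),
      # so each scattered job looks for a distinct file name.
      # Assumption: fastqc_check.sh writes its summary under exactly this name.
      glob: $(inputs.fq1_zips.nameroot).$(inputs.sample)_fastqc.summary
```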

If you want to merge the summary outputs, you have to either modify fastqc_check.cwl to take a list and merge the outputs itself, or add a third tool at the end that merges the output array.
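For the first option, a rough sketch of a list-taking fastqc_check.cwl follows. It hypothetically assumes that fastqc_check.sh accepts one zip file as its only argument and prints its summary to stdout; the real script's interface is not shown in the thread, so the loop would need adjusting. With a tool like this, the fastqc_check step would drop its scatter line and receive the whole fastqc/fastqc_zip array:

```yaml
cwlVersion: v1.0
class: CommandLineTool
requirements:
  - class: ShellCommandRequirement      # the command line is run through /bin/sh -c
  - class: InlineJavascriptRequirement  # needed for the map()/join() expression below

inputs:
  sample:
    type: string
  fq1_zips:
    type: File[]   # the whole array, so the workflow step no longer scatters

arguments:
  # Assumption: fastqc_check.sh takes one zip path and writes its summary to stdout.
  # The loop runs it once per zip and sends all of the output to a single file.
  # Paths containing spaces would break this simple word-splitting loop.
  - shellQuote: false
    valueFrom: >-
      for z in $(inputs.fq1_zips.map(function (f) { return f.path; }).join(" "));
      do sh fastqc_check.sh "$z"; done > $(inputs.sample)_fastqc.summary

outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      glob: $(inputs.sample)_fastqc.summary
```

The second option, a separate merge step after the scatter, keeps fastqc_check.cwl unchanged per file and avoids shell scripting inside the tool description.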


Yes, true. The scatter creates multiple jobs and each job writes its own output file, but all of those files have the same name, so each newly created file overwrites the previous one.

As a solution, I write the output of each scattered job to a differently named file and then add another step that cats those files into a single output file.

cat fastqc_check.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh, fastqc_check.sh]

inputs:
 fq1_zips:
  type: File
  inputBinding:
   position: 1
outputs:
 fastqc_check_out:
  type: File
  outputBinding:
   glob: $(inputs.fq1_zips.nameroot).summary
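Both versions of fastqc_check.cwl call sh fastqc_check.sh with a relative path, while qc.cwl declares a fastqc_check_script input that is never wired to any step. Runners such as cwltool typically execute each job in its own temporary directory, so the script may not be found there unless it is otherwise available. One possible adjustment, sketched here without testing against the original script, is to pass the script in as a File:

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [sh]

inputs:
  fastqc_check_script:
    type: File
    inputBinding:
      position: 0   # sorted first, so the command becomes: sh <script> <zip>
  fq1_zips:
    type: File
    inputBinding:
      position: 1
outputs:
  fastqc_check_out:
    type: File
    outputBinding:
      glob: $(inputs.fq1_zips.nameroot).summary
```

The fastqc_check step in qc.cwl would then also need fastqc_check_script: fastqc_check_script under in:. Because only fq1_zips is listed under scatter:, the same script file is passed unchanged to every scattered job.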

cat fastqc_summarize.cwl

cwlVersion: v1.0
class: CommandLineTool
baseCommand: [cat]

inputs:
 sample: string
 fq1_summary:
  type: File[]
  inputBinding:
   position: 1
outputs:
 fastqc_summarize_out:
  type: stdout
stdout: $(inputs.sample)_fastqc.summary

cat qc.cwl

cwlVersion: v1.0
class: Workflow


requirements:
 - class: ScatterFeatureRequirement

inputs:
 reads1:
  type: File[]
 reads2:
  type: File[]
 fastqc_check_script:
  type: File
 sample:
  type: string
outputs:
 fastqc_out:
  type: File[]
  outputSource: fastqc/fastqc_zip
 fastqc_html:
  type: File[]
  outputSource: fastqc/fastqc_html
 fastqc_check_out:
  type: File[]
  outputSource: fastqc_check/fastqc_check_out
 fastqc_summary_out:
  type: File
  outputSource: fastqc_summarize/fastqc_summarize_out
steps:
 fastqc:
  run: fastqc.cwl
  in:
   fq1:
    source: reads1
   fq2:
    source: reads2
  out: [fastqc_zip, fastqc_html]
 fastqc_check:
  run: fastqc_check.cwl
  in:
   fq1_zips:
    source: [fastqc/fastqc_zip]
  scatter: fq1_zips
  out: [fastqc_check_out]
 fastqc_summarize:
  run: fastqc_summarize.cwl
  in:
   sample: sample
   fq1_summary:
    source: fastqc_check/fastqc_check_out
  out: [fastqc_summarize_out]
Reply by ttom, 14 months ago.
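To try the final qc.cwl, an input object could look like the sketch below. The job file name, the FASTQ file names, and the sample name are placeholders, not taken from the thread:

```yaml
# qc-job.yml (hypothetical name): input object for qc.cwl
reads1:
  - class: File
    path: sample1_R1.fastq.gz   # placeholder path
reads2:
  - class: File
    path: sample1_R2.fastq.gz   # placeholder path
fastqc_check_script:
  class: File
  path: fastqc_check.sh
sample: sample1
```

It would be run with cwltool qc.cwl qc-job.yml. With one R1 and one R2 file, FastQC should produce two zips, the scattered fastqc_check step two summaries, and fastqc_summary_out a single sample1_fastqc.summary containing both.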