We're writing a workflow in which we process a pair of BAMs separately, then run steps on them together (DNAseq somatic). The workflow scatters the tumor/normal samples for parallel processing, ending up with "tumor.sorted.bam" and "normal.sorted.bam", then gathers them as an array for a realignment step whose output is "tumor.realigned.bam" and "normal.realigned.bam". (BAM index files (*.bai) are created and passed along as secondaryFiles for each of these steps, here and below, as appropriate.)
These BAMs need some post-processing (re-sorting, dealing with duplicate reads), and I'm able to scatter over them and get "tumor.realigned.md.bam" and "normal.realigned.md.bam" back as what I believe to be an array of Files.
Here is a snippet from the higher-level CWL that is calling the above:
    # lots of stuff above here that works
    realign:
      run: commandline/realign.cwl
      in:
        bam_files: stageForRealign/bamFiles
        reference_fasta: referenceFasta
        targets_bed: captureBed
      out: [bam_file]

    post_realign_sort_index_md:
      run: commandline/bamSortMarkDups.cwl
      scatter: input_file
      in:
        input_file: realign/bam_file
      out: [bam_file]

    # Everything works above here: I get the expected BAMs and BAIs
    # nothing below here works: I suspect this is an issue with my ExpressionTool,
    # but I don't know a good way to debug
    collectTN:
      run: expression/collect_tumor_normal_bams.cwl
      in:
        bams: [post_realign_sort_index_md/bam_file]
      out: [tumorBam, normalBam]

    coverage_tumor:
      run: commandline/coverage.cwl
      in:
        bam_file: collectTN/tumorBam
        bed_file: coverageWindows
        genome_file: bedtoolsGenome
      out: [counts_file]

    coverage_normal:
      run: commandline/coverage.cwl
      in:
        bam_file: collectTN/normalBam
        bed_file: coverageWindows
        genome_file: bedtoolsGenome
      out: [counts_file]

    call_somatic_variants:
      run: commandline/somatic-caller.cwl
      in:
        tumor_bam_file: collectTN/tumorBam
        normal_bam_file: collectTN/normalBam
        reference_fasta: referenceFasta
        regions_bed: captureBed
      out: [ somatic_caller_output ]
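To make the intent of the collectTN step concrete, here is a minimal sketch in plain JavaScript of the selection it is expected to perform, which can be run under Node outside the CWL runner. It is not the contents of collect_tumor_normal_bams.cwl: the function name collectTumorNormal is hypothetical, and matching on the "tumor." and "normal." basename prefixes is an assumption based on the file names above. It also tolerates one level of nesting, in case the bams input arrives as an array of arrays rather than a flat array of Files.

```javascript
// Hypothetical helper (not from the actual ExpressionTool): pick the tumor and
// normal BAMs out of an array of CWL File objects by basename prefix.
// Assumption: basenames start with "tumor." or "normal.", as in the post.
function collectTumorNormal(bams) {
  if (!Array.isArray(bams)) {
    throw new Error("expected an array of File objects, got " + typeof bams);
  }
  // Flatten one level, so both File[] and File[][] inputs are handled.
  var flat = bams.reduce(function (acc, item) {
    return acc.concat(item);
  }, []);

  var result = { tumorBam: null, normalBam: null };
  flat.forEach(function (file) {
    if (file.basename.indexOf("tumor.") === 0) {
      result.tumorBam = file;
    } else if (file.basename.indexOf("normal.") === 0) {
      result.normalBam = file;
    }
  });

  // Fail loudly instead of passing null on to the downstream steps.
  if (!result.tumorBam || !result.normalBam) {
    throw new Error("could not find both a tumor and a normal BAM");
  }
  return result;
}
```

Calling it with hand-written objects such as {class: "File", basename: "tumor.realigned.md.bam"} shows whether the selection logic works independently of how the workflow wires the inputs together.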
If it makes a difference, we're using the Arvados CWL runner.