Hi, I'm taking my first steps with CWL and I was wondering whether it is possible to annotate the formats of the files declared in a tool/workflow specification in order to catch subtle errors.
An example:
    inputs:
      normal_bam:
        type: File
        format:
          type: bam
          assert:
            source: normal
            patient_id: $(inputs.patient_id)
        secondaryFiles: .bai
        inputBinding:
          prefix: -I:normal
      tumor_bam:
        type: File
        format:
          type: bam
          assert:
            source: tumor
            patient_id: $(inputs.patient_id)
        secondaryFiles: .bai
        inputBinding:
          prefix: -I:tumor
      reference:
        type: File
        format:
          type: fasta
          assert:
            content: genome
            organism: homo_sapiens
        secondaryFiles: [.fai, ^.dict]
        inputBinding:
          prefix: --reference_sequence
      patient_id:
        type: string
In this case CWL could easily catch errors like passing a BAM file containing a tumor sample instead of one containing a normal sample.
I think this could be implemented partly by extending an ontology, but I can see that becoming tedious if, for example, you have to generate every possible combination of content and organism for the fasta field (though I may be wrong, as I'm no expert in ontologies). Moreover, this could be a dynamic feature where attributes are passed as inputs and/or attached to outputs, and then passed down to the inputs of other workflows/tools as a pipeline runs.
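To make the idea a bit more concrete, here is a rough Python sketch of the kind of check I have in mind. Everything in it is hypothetical: `resolve`, `check_assertions`, the per-file metadata dictionary and the sample patient ID `P001` are made up for illustration and are not part of CWL or any existing runner. It only assumes that each file carries a set of key/value attributes and that values written as `$(inputs.NAME)` are substituted from the job inputs before comparing.

```python
# Hypothetical sketch: none of these names exist in CWL or cwltool.
from typing import Dict, List


def resolve(expected: Dict[str, str], inputs: Dict[str, str]) -> Dict[str, str]:
    """Replace values of the form $(inputs.NAME) with the matching input value."""
    prefix = "$(inputs."
    resolved = {}
    for key, value in expected.items():
        if value.startswith(prefix) and value.endswith(")"):
            # Strip the "$(inputs." prefix and the trailing ")" to get NAME.
            value = inputs[value[len(prefix):-1]]
        resolved[key] = value
    return resolved


def check_assertions(
    expected: Dict[str, str],
    actual: Dict[str, str],
    inputs: Dict[str, str],
) -> List[str]:
    """Return one message per attribute that is missing or does not match."""
    errors = []
    for key, want in resolve(expected, inputs).items():
        got = actual.get(key)
        if got != want:
            errors.append(f"{key}: expected {want!r}, got {got!r}")
    return errors


# The "assert" block of normal_bam from the example above.
inputs = {"patient_id": "P001"}
normal_assert = {"source": "normal", "patient_id": "$(inputs.patient_id)"}

# Attributes attached to the file that was actually passed (a tumor BAM).
tumor_file_metadata = {"source": "tumor", "patient_id": "P001"}

print(check_assertions(normal_assert, tumor_file_metadata, inputs))
```

Passing the tumor file where the normal one is expected reports a mismatch on `source`, while a file whose attributes match produces an empty list. Where the attributes would actually live (in the job file, in an output of an upstream step, or somewhere else) is exactly the part I'm unsure about.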
Thanks!
Edit: revised the title to better explain the concept/idea