Hi, I'm taking my first baby steps with CWL, and I was wondering whether it is possible to annotate the formats of the files specified in a tool/workflow specification in order to catch subtle errors. For example, something like:
```yaml
inputs:
  normal_bam:
    type: File
    format:
      type: bam
      assert:
        source: normal
        patient_id: $(inputs.patient_id)
    secondaryFiles: .bai
    inputBinding:
      prefix: -I:normal
  tumor_bam:
    type: File
    format:
      type: bam
      assert:
        source: tumor
        patient_id: $(inputs.patient_id)
    secondaryFiles: .bai
    inputBinding:
      prefix: -I:tumor
  reference:
    type: File
    format:
      type: fasta
      assert:
        content: genome
        organism: homo_sapiens
    secondaryFiles: [.fai, ^.dict]
    inputBinding:
      prefix: --reference_sequence
  patient_id:
    type: string
```
In this case, CWL could easily catch errors such as passing a BAM file containing a tumor sample instead of one containing a normal sample.
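To make the check concrete, here is a minimal Python sketch of what a runner could do before executing a tool, assuming each file carries a format plus a dictionary of attributes. `AnnotatedFile` and `check_input` are illustrative names I made up; they are not part of CWL or any CWL runner.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedFile:
    """A file path plus a hypothetical format and free-form attributes."""
    path: str
    format: str
    attributes: dict = field(default_factory=dict)


def check_input(name, f, expected_format, assertions):
    """Raise ValueError if the file's format or attributes violate the assertions."""
    if f.format != expected_format:
        raise ValueError(f"{name}: expected format {expected_format!r}, got {f.format!r}")
    for key, expected in assertions.items():
        actual = f.attributes.get(key)
        if actual != expected:
            raise ValueError(f"{name}: expected {key}={expected!r}, got {actual!r}")


patient_id = "P001"
tumor = AnnotatedFile("sample.bam", "bam", {"source": "tumor", "patient_id": "P001"})

try:
    # Mistake: the tumor BAM is bound to the normal_bam input.
    check_input("normal_bam", tumor, "bam", {"source": "normal", "patient_id": patient_id})
except ValueError as err:
    print(err)  # normal_bam: expected source='normal', got 'tumor'
```

The `patient_id` assertion mirrors `$(inputs.patient_id)` in the YAML above: the expected value is resolved from another input before the comparison.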
I think this could be implemented partly by extending an ontology, but I can see this becoming tedious if, for example, you have to generate every possible combination of content and organism for the fasta format (though I may be wrong, as I'm no expert in ontologies). Moreover, this could be a dynamic feature: attributes could be passed as inputs and/or attached to outputs, then propagated to the inputs of downstream tools/workflows as the pipeline runs.
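The dynamic part could look roughly like this minimal Python sketch, assuming each step declares which input its output inherits attributes from, plus any attributes the step adds itself. `run_step`, `inherit_from`, `extra`, and the `deduplicated` attribute are all hypothetical; no real command is executed, only the metadata is tracked.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedFile:
    """A file path plus a hypothetical format and free-form attributes."""
    path: str
    format: str
    attributes: dict = field(default_factory=dict)


def run_step(inputs, output_path, output_format, inherit_from, extra=None):
    """Hypothetical step: the output copies one input's attributes and adds extras."""
    # The real tool would run here; this sketch only propagates metadata.
    attributes = dict(inputs[inherit_from].attributes)  # copy, don't alias
    attributes.update(extra or {})
    return AnnotatedFile(output_path, output_format, attributes)


def satisfies(f, assertions):
    """True if every asserted attribute matches the file's attribute."""
    return all(f.attributes.get(k) == v for k, v in assertions.items())


raw = AnnotatedFile("normal.bam", "bam", {"source": "normal", "patient_id": "P001"})
dedup = run_step({"bam": raw}, "normal.dedup.bam", "bam",
                 inherit_from="bam", extra={"deduplicated": True})

print(dedup.attributes)  # {'source': 'normal', 'patient_id': 'P001', 'deduplicated': True}
print(satisfies(dedup, {"source": "normal", "patient_id": "P001"}))  # True
print(satisfies(dedup, {"source": "tumor"}))  # False
```

Because attributes flow along the workflow graph, a downstream `normal_bam` input can still be checked even though the file it receives was produced mid-pipeline rather than supplied by the user.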
Edit: revised the title to better explain the concept/idea