I'm part of a team involved in a project where we will be running a stable analysis pipeline over a large number of samples.
QC (custom scripts) → Mapping (bwa mem) → Variant Calling (GATK Best Practices).
We would rather not reinvent the wheel, so we want to build the pipeline on an established framework. Ideally the framework would not be too focused on this particular pipeline, in case we need something else in the future.
I got good information from this previous Biostars post. Here is a summary of the options from that post:
- Don't bother, just write a README
- waf (Python)
- SCons (Python)
- Rake (Ruby)
- BioMake, now Skam (Prolog)
- Ruffus (Python)
- Paver (Python)
- Galaxy (Python)
- Snakemake (Python)
Not mentioned in that post, but I'm also looking into:
- gkno (Python)
- Invoke (Python)
New options after this post was initially written:
- NGSANE (bash)
- BigDataScript (bds)
- Nextflow (Groovy)
- Bpipe (Groovy)
- Omics Pipe (Python)
- Cromwell/WDL (Scala)
- Toil (Python)
I would love to get the community's opinion on this subject. Right now I'm particularly fond of Snakemake, gkno, and Invoke. I love Snakemake's simplicity and how close it stays to regular make. Invoke seems to be the current favorite in the Python community at large. gkno looks like exactly what we need, but I'm worried it could become too complex and hard to maintain.
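To illustrate what I mean by Snakemake staying close to regular make, here is a minimal, hypothetical Snakefile sketching the mapping and variant-calling steps. All sample names, paths, and the exact GATK invocation are made up for illustration; the real Best Practices workflow has more steps (deduplication, BQSR, etc.):

```python
# Hypothetical Snakefile sketch -- names and paths are illustrative only.
SAMPLES = ["sampleA", "sampleB"]

# Top-level target: one VCF per sample, like the default goal in a Makefile.
rule all:
    input:
        expand("calls/{sample}.vcf", sample=SAMPLES)

# Map reads with bwa mem and sort the output, driven by the {sample} wildcard.
rule bwa_map:
    input:
        ref="ref/genome.fa",
        reads="fastq/{sample}.fastq.gz"
    output:
        "mapped/{sample}.bam"
    shell:
        "bwa mem {input.ref} {input.reads} | samtools sort -o {output} -"

# Call variants on each sorted BAM (GATK4-style command, as an assumption).
rule call_variants:
    input:
        ref="ref/genome.fa",
        bam="mapped/{sample}.bam"
    output:
        "calls/{sample}.vcf"
    shell:
        "gatk HaplotypeCaller -R {input.ref} -I {input.bam} -O {output}"
```

The rule/input/output structure maps almost one-to-one onto make targets and prerequisites, which is why the learning curve for our team should be small.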
Latest edit: Added Toil.