Big streaming companies such as Spotify and Pandora are pushing toward better, more stable frameworks, and one of the best ways to get there is to go open source: release the code, collect useful feedback, and keep improving the platform.
Spotify developed Luigi, a Python framework for processing user logs and mining them by plugging in machine learning algorithms, with the aim of improving its recommendation system and its suggestions to users.
Luigi works much like other make-like pipeline tools from the Python world, such as Ruffus or Snakemake, but it has an edge over them: it is designed for building Hadoop-friendly pipelines, and it comes with a visual overview of each part of your pipeline while it is running. Another feature I like is that it can notify you by email when a task fails.
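To make the "make-like" idea concrete, here is a toy sketch in plain Python. It is not Luigi's API: the `Task` class, the `build` function, and the file names are all hypothetical, although Luigi's own tasks expose `requires()`, `output()`, and `run()` methods that play similar roles. The point is only that a task counts as done once its output exists, so re-running a pipeline skips finished work.

```python
import os
import tempfile


class Task:
    """Toy make-like task: it is complete once its output file exists."""

    def __init__(self, output_path, requires=()):
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self):
        return os.path.exists(self.output_path)

    def run(self):
        # Stand-in for real work: just create the output file.
        with open(self.output_path, "w") as handle:
            handle.write("done\n")


def build(task, executed):
    """Run missing dependencies first, then the task; skip finished tasks."""
    if task.complete():
        return
    for dependency in task.requires:
        build(dependency, executed)
    task.run()
    executed.append(os.path.basename(task.output_path))


workdir = tempfile.mkdtemp()
aligned = Task(os.path.join(workdir, "sample.sam"))
converted = Task(os.path.join(workdir, "sample.bam"), requires=[aligned])

first_run = []
build(converted, first_run)   # runs both tasks, dependency first
second_run = []
build(converted, second_run)  # nothing to do: the outputs already exist
print(first_run, second_run)
```

On the second call nothing is executed, which is the behaviour that makes it cheap to restart a long pipeline after a failure.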
Here is a simple adaptation of Luigi for bioinformatics. The pipeline:
- takes a list of FASTQ samples,
- aligns them with bwa mem,
- converts the SAM files to BAM files,
- sorts the BAM files,
- indexes them,
- calls variants using samtools mpileup,
- converts the BCF output to VCF.
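The steps above can be sketched as a list of command lines for one sample. This is only an illustration under assumptions, not the code behind the link: the `Step` type, the `pipeline_steps` function, and the file-naming scheme are hypothetical, and the exact flags depend on your bwa, samtools, and bcftools versions (recent samtools releases, for example, moved BCF generation to `bcftools mpileup`). Nothing is executed; the function only builds the commands.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    name: str
    argv: List[str]
    stdout: Optional[str]  # file that captures stdout, or None


def pipeline_steps(fastq: str, reference: str) -> List[Step]:
    """Return the commands for one sample, in execution order (nothing is run)."""
    # Drop the last extension to derive intermediate names (hypothetical scheme).
    stem = fastq.rsplit(".", 1)[0]
    sam, bam = stem + ".sam", stem + ".bam"
    sorted_bam = stem + ".sorted.bam"
    bcf, vcf = stem + ".bcf", stem + ".vcf"
    return [
        # bwa mem writes SAM to stdout, so the output is captured in a file.
        Step("align", ["bwa", "mem", reference, fastq], sam),
        Step("sam_to_bam", ["samtools", "view", "-b", "-o", bam, sam], None),
        Step("sort", ["samtools", "sort", "-o", sorted_bam, bam], None),
        Step("index", ["samtools", "index", sorted_bam], None),
        # Assumes an older samtools where mpileup -g still emits BCF.
        Step("mpileup", ["samtools", "mpileup", "-g", "-f", reference, sorted_bam], bcf),
        Step("bcf_to_vcf", ["bcftools", "view", "-O", "v", "-o", vcf, bcf], None),
    ]


for step in pipeline_steps("sample1.fastq", "ref.fa"):
    redirect = " > " + step.stdout if step.stdout else ""
    print(step.name + ": " + " ".join(step.argv) + redirect)
```

In a Luigi pipeline, each of these steps would typically become its own task whose output is the file the next step consumes, so a failed run can resume from the last file that was written.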
The full pipeline code can be viewed and edited here: http://coderscrowd.com/app/public/codes/view/229