Tutorial: How To Create A Bioinformatics Pipeline Using Spotify's Luigi
Rad wrote 3.9 years ago:

Many big streaming companies such as Spotify and Pandora are pushing towards better, more stable frameworks, and one of the best ways to get there is to go open source, collect useful feedback, and keep improving the platform.

Spotify developed Luigi, a Python framework for processing user logs and mining them by plugging in machine learning algorithms, with the aim of improving its recommendation system and its suggestions to listeners.

Luigi works much like other make-like Python frameworks for pipeline development, such as Ruffus or Snakemake, but it has two advantages over them: it is designed to create Hadoop-friendly pipelines, and it comes with a visual diagnostic of each part of your pipeline while it is running. Another feature I like is that it can notify you by email when a task fails.

Here is a simple adaptation of Luigi for bioinformatics. This pipeline:

  • takes a list of FASTQ samples,
  • aligns them with bwa mem,
  • converts the SAM files to BAM files,
  • sorts the BAM files,
  • indexes them,
  • calls variants using samtools mpileup,
  • converts BCF to VCF.
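
For readers who want to see the steps above spelled out, here is a rough sketch of the shell command each task might run for one sample. It is not the linked code: the function name, the file naming scheme and the reference name ref.fa are assumptions. The flags follow older samtools releases in which mpileup could write BCF directly (newer releases moved that to bcftools mpileup), and the sort syntax is the samtools 1.3+ one, so check them against your installed versions.

```python
def pipeline_commands(fastq, reference="ref.fa"):
    """Return (step, output_file, shell_command) tuples for one sample.

    Only builds strings; nothing is executed here.
    """
    stem = fastq.rsplit(".", 1)[0]  # "sampleA.fastq" -> "sampleA"
    sam = f"{stem}.sam"
    bam = f"{stem}.bam"
    sorted_bam = f"{stem}.sorted.bam"
    bai = f"{sorted_bam}.bai"
    bcf = f"{stem}.bcf"
    vcf = f"{stem}.vcf"

    return [
        ("align", sam, f"bwa mem {reference} {fastq} > {sam}"),
        ("sam_to_bam", bam, f"samtools view -bS {sam} > {bam}"),
        ("sort", sorted_bam, f"samtools sort -o {sorted_bam} {bam}"),
        ("index", bai, f"samtools index {sorted_bam}"),
        # Assumes an older samtools where mpileup -u emits BCF.
        ("mpileup", bcf, f"samtools mpileup -u -f {reference} {sorted_bam} > {bcf}"),
        ("bcf_to_vcf", vcf, f"bcftools view {bcf} > {vcf}"),
    ]


if __name__ == "__main__":
    for step, out_file, command in pipeline_commands("sampleA.fastq"):
        print(f"{step:12s} {out_file:24s} {command}")
```

In a Luigi version, each tuple would roughly map to one task: the output file becomes the task's target, the command goes into run(), and the previous step is what requires() returns.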

The code can be viewed and edited here: http://coderscrowd.com/app/public/codes/view/229

Modified 2.1 years ago by ostrokach.
valentine wrote 3.9 years ago:

Also see Ratatosk, a pipeline framework for bioinformatics tasks built on Luigi: http://ratatosk.readthedocs.org/en/latest/index.html


Great! Didn't know about that, thanks for sharing.

Reply written 3.9 years ago by Rad.
Samuel Lampa wrote 3.0 years ago:

For some bioinformatics tasks, the extra boilerplate that Luigi requires can get in the way. This led me to write a small helper library (~50 lines of code) that lets you define inputs and outputs inline in the command pattern, and connect inputs and outputs between tasks using single-assignment syntax, much like setting variables.

So for anyone interested in checking it out, the library is available on GitHub:

... and a somewhat realistic NGS bioinformatics example can be found in this gist: 

And, it has a page here on BioStars too:


Just for reference: Luigi's Monkey Wrench is now deprecated in favor of SciLuigi.

Reply written 22 months ago by Samuel Lampa.
ostrokach wrote 2.1 years ago:

The best pipelining tool I have used is Snakemake. Make can work OK as well (plus qmake if you are using SGE).

For more info, see here: Workflow management software for pipeline development in NGS
