Tutorial: How To Create A Bioinformatics Pipeline Using Spotify's Luigi!
24
4.0 years ago by
Rad790
Canada
Rad790 wrote:

Many big streaming companies, such as Spotify and Pandora, are pushing towards better and more stable frameworks, and one of the best ways to get there is to go open source, gather useful feedback, and keep improving the platform.

Spotify developed Luigi, a Python framework used to process user logs and mine them by plugging in several machine learning algorithms, in order to improve its recommendation system and its suggestions to users.

Luigi works much like other make-like Python frameworks for pipeline development, such as Ruffus or Snakemake, but it has some advantages over them: it is designed to create Hadoop-friendly pipelines, and it comes with a visual diagnostic of each part of your pipeline while it is running. Another feature I like is that it notifies you by email when a task fails.
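
To make the "make-like" idea concrete, here is a minimal sketch in plain Python. It is not Luigi's API: the Task class, the build function and the file names are hypothetical and only illustrate the principle that a task counts as done when its output file exists, and that upstream tasks run first.

```python
import os
import tempfile


class Task:
    """A tiny make-like task: one output file plus upstream tasks."""

    def __init__(self, output, requires=(), action=None):
        self.output = output            # path that marks the task as done
        self.requires = list(requires)  # tasks that must finish first
        self.action = action            # callable that writes self.output

    def complete(self):
        # As in make, "done" simply means the output file exists.
        return os.path.exists(self.output)


def build(task, log):
    """Run upstream tasks first, then the task itself; skip finished ones."""
    if task.complete():
        return
    for dep in task.requires:
        build(dep, log)
    task.action(task)
    log.append(task.output)


def write_marker(task):
    # Stand-in for real work: just create the output file.
    with open(task.output, "w") as handle:
        handle.write("done\n")


workdir = tempfile.mkdtemp()
sam = Task(os.path.join(workdir, "sample.sam"), action=write_marker)
bam = Task(os.path.join(workdir, "sample.bam"), requires=[sam],
           action=write_marker)

first_run = []
build(bam, first_run)    # runs the sam task, then the bam task
second_run = []
build(bam, second_run)   # nothing to do: both outputs already exist
```

Re-running the same build is a no-op, which is what lets a failed pipeline be restarted from the step that broke rather than from the beginning.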

Here is a simple adaptation of Luigi for bioinformatics. This pipeline:

  • takes a list of FASTQ samples,
  • aligns them with bwa mem,
  • converts the SAM files to BAM files,
  • sorts the BAM files,
  • indexes them,
  • calls variants using samtools mpileup,
  • converts BCF to VCF.

Code is viewable and editable here http://coderscrowd.com/app/public/codes/view/229
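
As a rough illustration of the steps listed above (not the code behind the link), the commands for one sample could be laid out as follows. The step names match the task names that appear in the log further down this thread, and the sam/ path matches its traceback; the reference file name, the other directories and the exact flags are assumptions. In particular, this sketch assumes a samtools 1.x release in which "sort -o" exists and "mpileup -u" still emits BCF; newer releases moved that step to bcftools mpileup.

```python
def pipeline_commands(sample, reference="ref.fa"):
    """Return (step name, argv list, stdout path or None) for each step.

    `reference` and the bam/, bcf/ and vcf/ directories are hypothetical;
    adjust them and the tool flags to your own setup and tool versions.
    """
    sam = "sam/%s.sam" % sample
    bam = "bam/%s.bam" % sample
    sorted_bam = "bam/%s.sorted.bam" % sample
    bcf = "bcf/%s.bcf" % sample
    vcf = "vcf/%s.vcf" % sample
    return [
        # bwa mem writes SAM to stdout, so it is redirected to a file.
        ("Bwa_Mem", ["bwa", "mem", reference, sample], sam),
        ("Convert_Sam_Bam",
         ["samtools", "view", "-b", "-S", "-o", bam, sam], None),
        ("Sort_Bam", ["samtools", "sort", "-o", sorted_bam, bam], None),
        ("Index_Bam", ["samtools", "index", sorted_bam], None),
        # Older samtools only: -u makes mpileup emit uncompressed BCF.
        ("Call_Variant",
         ["samtools", "mpileup", "-u", "-f", reference, sorted_bam], bcf),
        # bcftools view prints VCF to stdout by default.
        ("Convert_Bcf_Vcf", ["bcftools", "view", bcf], vcf),
    ]


commands = pipeline_commands("SRR098401testsickle_1.fastq")
```

Each argv list can be passed to subprocess.Popen with shell=False, and the third element tells the caller where to redirect stdout for tools that do not write their own output file.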

modified 2.2 years ago by ostrokach260 • written 4.0 years ago by Rad790

Why is it showing an error? How can I sort it out?

Command: python Luigipipeline.py --local-scheduler Custom_Genome_Pipeline

====== Running BWA ======
INFO: [pid 18146] Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) done Bwa_Mem(sample=SRR098401testsickle_1.fastq)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task Bwa_Mem_SRR098401testsic_c845cfe3a6 has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 11
INFO: [pid 18146] Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) running Convert_Sam_Bam(sample=SRR098401testsickle_1.fastq)
ERROR: [pid 18146] Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) failed Convert_Sam_Bam(sample=SRR098401testsickle_1.fastq)
Traceback (most recent call last):
  File "/home/likith_reddy/.local/lib/python2.7/site-packages/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/home/likith_reddy/.local/lib/python2.7/site-packages/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "Luigipipeline.py", line 60, in run
    "sam/"+self.sample+".sam"])
  File "Luigipipeline.py", line 15, in run_cmd
    p = subprocess.Popen(cmd, shell=False, universal_newlines=True, stdout=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task Convert_Sam_Bam_SRR098401testsic_c845cfe3a6 has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 11 pending tasks possibly being run by other workers
DEBUG: There are 11 pending tasks unique to this worker
DEBUG: There are 11 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) was stopped. Shutting down Keep-Alive thread
INFO: ===== Luigi Execution Summary =====

Scheduled 13 tasks of which:

2 ran successfully:
    2 Bwa_Mem(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
2 failed:
    2 Convert_Sam_Bam(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
9 were left pending, among these:
    9 had failed dependencies:
        2 Call_Variant(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
        2 Convert_Bcf_Vcf(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
        1 Custom_Genome_Pipeline()
        2 Index_Bam(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
        2 Sort_Bam(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

written 6 weeks ago by pinninti1991reddy10
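
A note on the traceback above: it ends in OSError: [Errno 2] raised inside subprocess.Popen with shell=False. That usually means the program named as the first element of the command could not be found on PATH, rather than that an input file is missing (a missing input would normally make the tool itself exit with an error). A minimal check could look like the sketch below. The tool list is an assumption about what the pipeline calls, and shutil.which requires Python 3.3 or later; on Python 2.7, as used in the log, distutils.spawn.find_executable plays a similar role.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Hypothetical tool list for this pipeline; adjust to what your tasks call.
REQUIRED = ["bwa", "samtools", "bcftools"]

if __name__ == "__main__":
    absent = missing_tools(REQUIRED)
    if absent:
        print("Not found on PATH: " + ", ".join(absent))
    else:
        print("All required tools were found.")
```

If a tool is reported as missing, installing it or adding its directory to PATH before launching the pipeline is the first thing to try.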

Read: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

Your question makes no sense as you provide neither context nor steps to reproduce what you tried. If you do not invest effort into making it easier for people that volunteer their time to help you, it will be immensely difficult for you to get help in a timely manner.

written 6 weeks ago by Ram14k

Hi, can you provide the original Luigi code? Thanks!

written 6 weeks ago by pinninti1991reddy10
3
4.0 years ago by
valentine30
UK/Cambridge/EMBL-EBI
valentine30 wrote:

Also see Ratatosk, a pipeline framework for bioinformatics tasks built on Luigi: http://ratatosk.readthedocs.org/en/latest/index.html

written 4.0 years ago by valentine30

Great! I didn't know about that, thanks for sharing.

written 4.0 years ago by Rad790
2
3.1 years ago by
Samuel Lampa1.1k
Stockholm
Samuel Lampa1.1k wrote:

For some bioinformatics tasks, the extra boilerplate needed with Luigi can get in the way. This led me to write a small helper library (~50 lines of code) that lets you define inputs and outputs inline in the command pattern, and connect inputs and outputs between tasks using single-assignment syntax, similar to setting variables.

So for anyone interested in checking it out, the library is available on GitHub: 

... and a somewhat realistic NGS bioinformatics example can be found in this gist: 

And, it has a page here on BioStars too:

modified 3.1 years ago • written 3.1 years ago by Samuel Lampa1.1k

Just for reference: Luigi's Monkey Wrench is nowadays deprecated in favor of SciLuigi

written 24 months ago by Samuel Lampa1.1k
0
2.2 years ago by
ostrokach260
Canada
ostrokach260 wrote:

The best pipelining tool I have used is Snakemake. Make can work OK as well (plus qmake if you are using SGE).

For more info, see here: Workflow management software for pipeline development in NGS

modified 2.2 years ago • written 2.2 years ago by ostrokach260