Tutorial: How To Create A Bioinformatics Pipeline Using Spotify's Luigi
10.5 years ago
Rad ▴ 810

Big streaming companies like Spotify and Pandora are increasingly pushing towards better and more stable frameworks, and one of the best ways to do that is to go open source, gather useful feedback, and keep improving the platform.

Spotify developed Luigi, a Python framework for handling user logs and mining them by plugging in machine learning algorithms, which improves their recommendation systems and the suggestions they make to clients.

Luigi works much like any make-like Python framework for pipeline development, such as Ruffus or Snakemake, but it has an edge over those solutions: it is designed to create Hadoop-friendly pipelines, and it comes with a visual diagnostic view of each part of your pipeline while it is running. Another feature I like is that it can notify you by email when a task fails.

Here is a simple adaptation of Luigi for bioinformatics. The pipeline:

  • takes a list of FASTQ samples,
  • aligns them with bwa mem,
  • converts the SAM files to BAM,
  • sorts the BAM files,
  • indexes them,
  • calls variants using samtools mpileup,
  • converts the BCF output to VCF.

Code is viewable and editable here http://coderscrowd.com/app/public/codes/view/229
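
For readers who just want to see the shape of such a pipeline, below is a minimal sketch, not the original Luigipipeline.py: the directory layout, the reference genome path and the sample list are made-up placeholders, and only the first two steps plus the top-level wrapper are shown; the remaining tasks follow the same requires()/output()/run() pattern.

import luigi
import subprocess

REF = "ref/genome.fa"  # placeholder reference genome

def run_cmd(cmd):
    # Run an external command (given as a list of arguments) and return its stdout.
    p = subprocess.Popen(cmd, shell=False, universal_newlines=True,
                         stdout=subprocess.PIPE)
    out, _ = p.communicate()
    return out

class Bwa_Mem(luigi.Task):
    sample = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("sam/" + self.sample + ".sam")

    def run(self):
        # bwa mem writes SAM to stdout; capture it and save it as the task output.
        sam = run_cmd(["bwa", "mem", REF, "fastq/" + self.sample])
        with self.output().open("w") as f:
            f.write(sam)

class Convert_Sam_Bam(luigi.Task):
    sample = luigi.Parameter()

    def requires(self):
        return Bwa_Mem(sample=self.sample)

    def output(self):
        return luigi.LocalTarget("bam/" + self.sample + ".bam")

    def run(self):
        run_cmd(["samtools", "view", "-bS",
                 "-o", self.output().path, self.input().path])

# Sort_Bam, Index_Bam, Call_Variant and Convert_Bcf_Vcf would follow the same
# pattern, each requiring the previous step.

class Custom_Genome_Pipeline(luigi.WrapperTask):
    # Top-level task that simply requires the last step for every sample.
    def requires(self):
        samples = ["sample_1.fastq", "sample_2.fastq"]  # placeholder sample list
        return [Convert_Sam_Bam(sample=s) for s in samples]

if __name__ == "__main__":
    luigi.run()

You would then run it with something like "python Luigipipeline.py --local-scheduler Custom_Genome_Pipeline", or start the luigid central scheduler instead of passing --local-scheduler to get the web-based dependency graph.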


Why is it showing this error? How do I sort it out?

CMD

python Luigipipeline.py --local-scheduler Custom_Genome_Pipeline

====== Running BWA ======
INFO: [pid 18146] Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) done Bwa_Mem(sample=SRR098401testsickle_1.fastq)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task Bwa_Mem_SRR098401testsic_c845cfe3a6 has status DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 11
INFO: [pid 18146] Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) running Convert_Sam_Bam(sample=SRR098401testsickle_1.fastq)
ERROR: [pid 18146] Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) failed Convert_Sam_Bam(sample=SRR098401testsickle_1.fastq)
Traceback (most recent call last):
  File "/home/likith_reddy/.local/lib/python2.7/site-packages/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/home/likith_reddy/.local/lib/python2.7/site-packages/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "Luigipipeline.py", line 60, in run
    "sam/"+self.sample+".sam"])
  File "Luigipipeline.py", line 15, in run_cmd
    p = subprocess.Popen(cmd, shell=False, universal_newlines=True, stdout=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task Convert_Sam_Bam_SRR098401testsic_c845cfe3a6 has status FAILED
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 11 pending tasks possibly being run by other workers
DEBUG: There are 11 pending tasks unique to this worker
DEBUG: There are 11 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=235547989, workers=1, host=Curium, username=likith_reddy, pid=18146) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====

Scheduled 13 tasks of which:

    2 ran successfully:
        2 Bwa_Mem(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
    2 failed:
        2 Convert_Sam_Bam(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
    9 were left pending, among these:
        9 had failed dependencies:
            2 Call_Variant(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
            2 Convert_Bcf_Vcf(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
            1 Custom_Genome_Pipeline()
            2 Index_Bam(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)
            2 Sort_Bam(sample=SRR098401testsickle_1.fastq,SRR098401testsickle_2.fastq)

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====
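
For reference, an OSError: [Errno 2] No such file or directory raised from subprocess.Popen with shell=False almost always means that the first element of the command list (here presumably samtools) cannot be found on PATH, or that the whole command was passed as a single string rather than a list of arguments. Below is a hedged sketch of how a run_cmd helper could fail with a clearer message; the actual helper in Luigipipeline.py may differ.

import subprocess

def run_cmd(cmd):
    # With shell=False, cmd must be a list such as ["samtools", "view", ...]
    # and cmd[0] must be an executable found on PATH, otherwise Popen raises
    # OSError: [Errno 2] No such file or directory.
    try:
        p = subprocess.Popen(cmd, shell=False, universal_newlines=True,
                             stdout=subprocess.PIPE)
    except OSError:
        raise RuntimeError("Could not execute %r -- is it installed and on your PATH?" % cmd[0])
    out, _ = p.communicate()
    return out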


Read: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

Your question makes no sense as you provide neither context nor steps to reproduce what you tried. If you do not invest effort into making it easier for people that volunteer their time to help you, it will be immensely difficult for you to get help in a timely manner.


Hi, can you provide the original Luigi code? Thanks!

10.5 years ago
valentine ▴ 30

Also see Ratatosk, a pipeline framework for bioinformatics tasks built on Luigi: http://ratatosk.readthedocs.org/en/latest/index.html


Great! Didn't know about that, thanks for sharing.

9.6 years ago
Samuel Lampa ★ 1.3k

For some bioinformatics tasks, the extra boilerplate needed with Luigi can be a bit hampering. This led me to write a little (~50 lines of code) helper library that lets you define inputs and outputs inline in the command pattern, and connect inputs and outputs between tasks using a single-assignment syntax, similar to how you set variables.

For anyone interested in checking it out, the library is available on GitHub, a somewhat realistic NGS bioinformatics example can be found in a gist, and it also has its own page here on Biostars.
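
To give a feel for the single-assignment connection style described above, here is a rough sketch written against the later SciLuigi API (which, as noted in the comment below, superseded this library); the class and method names are reproduced from memory and should be treated as approximate rather than authoritative.

import sciluigi as sl

class WriteFoo(sl.Task):
    # Outputs are plain methods returning TargetInfo objects.
    def out_foo(self):
        return sl.TargetInfo(self, 'foo.txt')
    def run(self):
        with self.out_foo().open('w') as f:
            f.write('foo\n')

class ReplaceFoo(sl.Task):
    replacement = sl.Parameter()
    in_foo = None  # input slot, wired up in the workflow below
    def out_replaced(self):
        return sl.TargetInfo(self, self.in_foo().path + '.replaced.txt')
    def run(self):
        with self.in_foo().open() as fin, self.out_replaced().open('w') as fout:
            fout.write(fin.read().replace('foo', self.replacement))

class MyWorkflow(sl.WorkflowTask):
    def workflow(self):
        writer = self.new_task('writer', WriteFoo)
        replacer = self.new_task('replacer', ReplaceFoo, replacement='bar')
        # The single-assignment wiring: connect the output directly to the input.
        replacer.in_foo = writer.out_foo
        return replacer

if __name__ == '__main__':
    sl.run_local(main_task_cls=MyWorkflow)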


Just for reference: Luigi's Monkey Wrench is nowadays deprecated in favor of SciLuigi

8.7 years ago
ostrokach ▴ 350

The best pipelining tool I have used is Snakemake. Make can work OK as well (plus qmake if you are using SGE).

For more info, see here: Workflow management software for pipeline development in NGS
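
For comparison with the Luigi tasks above, a single Snakemake rule covering the alignment step might look roughly like this (the reference and directory names are placeholders):

rule bwa_map:
    input:
        ref="ref/genome.fa",
        fq="fastq/{sample}.fastq"
    output:
        "bam/{sample}.bam"
    shell:
        "bwa mem {input.ref} {input.fq} | samtools view -bS - > {output}"

Snakemake then infers the dependency graph from the file names, much as make does, rather than from explicit requires() declarations.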
