Forum: Snakemake vs. Nextflow: strengths and weaknesses
ropolocan (Canada) wrote 14 months ago:

I have seen increasing interest in workflow/pipeline management systems such as snakemake and nextflow. In my opinion, both seem very interesting and very promising. There is a very interesting review from 2016 in which bash, make, snakemake and nextflow were compared: https://www.jmazz.me/blog/NGS-Workflows

The author of that review did a very good job of analyzing the strengths and weaknesses of snakemake and nextflow. I am not sure how much has changed since then, but in your experience, what would be some criteria that bioinformaticians could consider to choose one over the other? Have some of the identified weaknesses of both snakemake and nextflow been addressed since then?

nextflow snakemake forum • 5.7k views
modified 6 months ago • written 14 months ago by ropolocan

I started using snakemake 6 months ago, and now I have shifted all my pipelines to snakemake (ChIP-seq, RNA-seq, ATAC-seq and DNA-seq). I am pretty happy with it. Once you get the idea of how snakemake works (think in a bottom-up fashion), it is easy to build up your own pipelines. BTW, the documentation is awesome.

You can write a custom script for submitting jobs to the cluster for each platform (LSF, moab...) if you want more control of your jobs, e.g. https://bitbucket.org/snakemake/snakemake/issues/28/clustering-jobs-with-snakemake

The only downside for me is that when I have more than 1000 jobs to submit, it takes time for snakemake to process the metadata associated with each job. Even a dry-run takes minutes. I do not know how fast nextflow is.
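
To illustrate the bottom-up idea, here is a minimal Python sketch. It is not Snakemake's actual implementation, and the rule table and file names are made up for illustration. You name the file you want, and the plan is worked out backwards from it until it reaches files that already exist. Printing the plan without running anything is roughly what a dry run reports.

```python
# Hypothetical toy rules: each output file maps to the inputs it needs.
RULES = {
    "calls.vcf": ["aligned.bam"],
    "aligned.bam": ["reads.fastq", "genome.fa"],
}


def plan(target, existing, rules=RULES):
    """Return the outputs that must be built, in execution order, to get `target`."""
    order = []

    def visit(path):
        # Nothing to do for files that exist or are already scheduled.
        if path in existing or path in order:
            return
        if path not in rules:
            raise FileNotFoundError(f"no rule produces {path!r} and it does not exist")
        # Resolve dependencies first, then schedule this output.
        for dep in rules[path]:
            visit(dep)
        order.append(path)

    visit(target)
    return order


# Ask for the final file; the steps are inferred backwards from it.
print(plan("calls.vcf", existing={"reads.fastq", "genome.fa"}))
```

If `aligned.bam` is already present, the same call plans only the last step, which is why asking for the end result is enough.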

written 14 months ago by Ming Tang

And is BioMake out of the game? It uses prolog (which is both its weakness and its strength...).

written 14 months ago by kamiljaron

Hello, @kamiljaron. I was not aware of BioMake; I would have to read up on it. I do not know prolog, nor have I ever used a logic programming language, but I will read more about what BioMake has to offer.

written 14 months ago by ropolocan

No knowledge of prolog required! You can use GNU Make syntax to specify your workflow.

written 4 months ago by cmungall

dariober (Glasgow - UK) wrote 14 months ago:

Besides what a tool can or cannot do, I like to check the quality of the documentation, whether it is actively developed and maintained, how many developers contribute to it, and the size of the user base.

It seems to me that snakemake and nextflow are pretty much on a draw for all these metrics and both are pretty good (although in terms of user base and developers they are far from tools like luigi). So I think it's a difficult choice between these two...

I haven't tried nextflow, but recently I started working with snakemake and I'm very happy with it. Actually I feel dumb that for years I've been hacking together bash scripts to run pipelines. For me one advantage of snakemake is that a snakemake script is effectively python with additional features on top. So if you know python, putting some complex logic and functions in a snakemake script is straightforward. I guess the same applies to nextflow but using groovy, which is not so popular though.

From the review you link it seems nextflow doesn't have a "dry run" option. I find the dry run super useful to see what would be executed, and it is great for developing and debugging.

Just my 2p...

written 14 months ago by dariober

Thank you very much for your answer, @dariober.

"It seems to me that snakemake and nextflow are pretty much on a draw for all these metrics and both are pretty good (although in terms of user base and developers they are far from tools like luigi). So I think it's a difficult choice between these two..."

I am curious about luigi. I have read many good comments about it, and I will be looking into testing it as well. I was testing Snakemake and I can see why it has garnered attention.

"Actually I feel dumb that for years I've been hacking together bash scripts to run pipelines. For me one advantage of snakemake is that a snakemake script is effectively python with additional features on top. So if you know python, putting some complex logic and functions in a snakemake script is straightforward. I guess the same applies to nextflow but using groovy, which is not so popular though."

Using snakemake was kind of a "eureka" moment for me as well. It has so much potential, and I look forward to adapting other pipelines I had written in bash or python to snakemake.

written 14 months ago by ropolocan

About the dry run option: if I am not wrong, nextflow does not have it because it does not know a priori what the exact execution DAG will be. The Nextflow language is more expressive, and the execution DAG may depend on the input data, for example if you have conditional executions in your workflow (which is not possible in Snakemake, I think?).

written 14 months ago by Fred

Snakemake allows for conditional creation of the DAG and conditional execution of different code based on the input.
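
As a rough illustration of input-dependent planning: Snakemake lets a rule's input be computed by a Python function, so the jobs in the DAG can differ from sample to sample. The plain-Python sketch below only mimics that idea outside of any workflow engine; the sample records and file names are hypothetical.

```python
# Hypothetical sample sheet: whether each sample was sequenced paired-end.
SAMPLES = {
    "s1": {"paired": True},
    "s2": {"paired": False},
}


def alignment_inputs(sample):
    """Choose the input files for one sample at planning time."""
    if SAMPLES[sample]["paired"]:
        # Paired-end samples need both mates.
        return [f"{sample}_R1.fastq", f"{sample}_R2.fastq"]
    # Single-end samples need one file.
    return [f"{sample}.fastq"]


for name in SAMPLES:
    print(name, alignment_inputs(name))
```

Because the decision is made from metadata that is known before anything runs, a plan of this kind can still be listed in advance.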

written 6 months ago by endrebak

"For me one advantage of snakemake is that a snakemake script is effectively python with additional features on top."

I would call this a disadvantage. The Python ecosystem is a mess to work with when it comes to 3rd party libraries. I tried to install it for myself on our HPC and immediately hit a million issues with environment management, not all of which are solvable with virtualenvs or conda. On the other hand, Nextflow installs seamlessly on any system that has Java 8, including our HPC. Re-learning the few extra programming bits I needed in Groovy was a very small price to pay in order to have Nextflow's greater ease of portability & execution.

written 3 months ago by steve

Funny, I find java generally more annoying to deal with. To each their own I guess.

written 3 months ago by Devon Ryan

I've never actually had to deal with Java to get Nextflow to work, beyond making sure it was installed and using Java 8. Installing Nextflow has been a one-liner on every system I've tried. On the other hand, every Python-based workflow management system I have tried (along with most other Python packages) has required a lot of hands-on environment configuration and management, which is not only a pain in the butt but also greatly impairs the feasibility of popping up a pipeline instance on new systems on an ad-hoc basis.

modified 3 months ago • written 3 months ago by steve

Sinji (UT Southwestern Medical Center) wrote 14 months ago:

I'm a big fan of Nextflow. I've used Snakemake in the past, and it was originally my go-to workflow language, but the built-in support for Docker, Singularity, and HPC environments that Nextflow provides just can't be beat.

The only downside is you have to use Groovy.

modified 14 months ago • written 14 months ago by Sinji

Thank you very much for your answer, @Sinji. I also look forward to testing Nextflow. Both workflow systems/languages have so much potential. I think they could make a very important impact on bioinformatics.

written 14 months ago by ropolocan

Snakemake has singularity support with the singularity directive. I haven't used nextflow, but I would be amazed if it is as flexible as Snakemake. (Note to past self: there is a flexibility vs rigor tradeoff.)

modified 3 months ago • written 6 months ago by endrebak

Get prepared to be amazed.

written 6 months ago by paolo.ditommaso

shenwei356 (China) wrote 14 months ago:

Table 1: Comparison of Nextflow with other workflow management systems

Workflow                     Nextflow    Galaxy  Toil    Snakemake        Bpipe
Platform                     Groovy/JVM  Python  Python  Python           Groovy/JVM
Native task support          Yes (any)   No      No      Yes (BASH only)  Yes (BASH only)
Common workflow language     No          Yes     Yes     No               No
Streaming processing         Yes         No      No      No               No
Dynamic branch evaluation    Yes         ?       Yes     Yes              Undocumented
Code sharing integration     Yes         No      No      No               No
Workflow modules             No          Yes     Yes     Yes              Yes
Workflow versioning          Yes         Yes     No      No               No
Automatic error failover     Yes         No      Yes     No               No
Graphical user interface     No          Yes     No      No               No
DAG rendering                Yes         Yes     Yes     Yes              Yes
Container management
  Docker support             Yes         Yes     Yes     No               No
  Singularity support        Yes         No      No      No               No
  Multi-scale containers     Yes         Yes     Yes     No               No
Built-in batch schedulers
  Univa Grid Engine          Yes         Yes     Yes     Partial          Yes
  PBS/Torque                 Yes         Yes     No      Partial          Yes
  LSF                        Yes         Yes     No      Partial          Yes
  SLURM                      Yes         Yes     Yes     Partial          No
  HTCondor                   Yes         Yes     No      Partial          No
Built-in distributed cluster
  Apache Ignite              Yes         No      No      No               No
  Apache Spark               No          No      Yes     No               No
  Kubernetes                 Yes         No      No      No               No
  Apache Mesos               No          No      Yes     No               No
Built-in cloud
  AWS (Amazon Web Services)  Yes         Yes     Yes     No               No

modified 14 months ago • written 14 months ago by shenwei356

To be fair it would be nice to see the same table compiled or commented by the authors of snakemake... With respect to slurm, I don't know what is meant by "partial" support in snakemake. I started playing with snakemake and running jobs using slurm is incredibly simple.

written 14 months ago by dariober

Yeah, snakemake has full support for anything that uses drmaa, which I expect is also what Galaxy uses and probably what nextflow uses. Further, the footnote in the table for that section basically amounts to, "Actually, it has full support for these and any future schedulers, you just have to tell it how to execute the commands." I prefer the snakemake way of doing this, since everyone submits jobs through a wrapper I wrote and that way lots of things (temp space, memory usage, queue, etc.) can be conveniently set without including them again and again in snakemake files.
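
A submission wrapper of this kind might look roughly like the sketch below. This is not the wrapper described above, just an assumed shape for one: site-wide defaults live in one place and each job overrides only what it needs. The default values and the partition name are hypothetical; the `sbatch` options used (`--mem`, `--cpus-per-task`, `--partition`, `--time`) are standard SLURM flags. The command is only built and printed, not submitted.

```python
import shlex

# Hypothetical site-wide defaults, kept out of the workflow files.
DEFAULTS = {"mem": "4G", "cpus": 1, "partition": "general", "time": "01:00:00"}


def build_sbatch_command(jobscript, **overrides):
    """Build an sbatch command line from defaults plus per-job overrides."""
    opts = {**DEFAULTS, **overrides}
    return [
        "sbatch",
        f"--mem={opts['mem']}",
        f"--cpus-per-task={opts['cpus']}",
        f"--partition={opts['partition']}",
        f"--time={opts['time']}",
        jobscript,
    ]


# A memory-hungry job overrides two settings and inherits the rest.
cmd = build_sbatch_command("job.sh", mem="16G", cpus=4)
print(shlex.join(cmd))
```

The workflow tool then only needs to be told to call the wrapper instead of the scheduler directly.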

modified 14 months ago • written 14 months ago by Devon Ryan

Nextflow does not use DRMAA. It uses the scheduler's native directives. Here is an example from the source code. Also note that cluster options such as memory and CPUs can all be set for pipeline processes independently of the actual pipeline script, and you can use profiles to have multiple sets of configurations for different systems (e.g. one pipeline script, and different execution configs for HPC, local, AWS, etc.). Docs here
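
The idea of keeping resources out of the pipeline script can be sketched in plain Python. This is not Nextflow's configuration syntax; the profile names, process name, and values are all hypothetical. A profile supplies system-wide settings, and per-process overrides are layered on top.

```python
# Hypothetical execution profiles: one per target system.
PROFILES = {
    "local": {"executor": "local", "cpus": 2, "memory": "4 GB"},
    "hpc": {"executor": "slurm", "cpus": 16, "memory": "64 GB", "queue": "long"},
}

# Hypothetical per-process overrides, independent of the profile.
PROCESS_OVERRIDES = {"align": {"cpus": 8}}


def resolve(process, profile):
    """Merge profile defaults with any overrides for one process."""
    settings = dict(PROFILES[profile])
    settings.update(PROCESS_OVERRIDES.get(process, {}))
    return settings


# The same pipeline step gets different settings on different systems.
print(resolve("align", "local"))
print(resolve("align", "hpc"))
```

The pipeline logic stays identical across systems; only the profile chosen at launch time changes.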

modified 3 months ago • written 3 months ago by steve

All of that largely applies to snakemake too :)

written 3 months ago by Devon Ryan

Thanks for sharing this table, @shenwei356! It is very interesting to see that nextflow has stream processing, workflow versioning, and full support for SLURM, in addition to having native task support for any language. I believe snakemake has native task support for R now as well. Thanks again for your answer.

written 14 months ago by ropolocan

I think this table is a little outdated in relation to bpipe, which I use daily. SLURM support exists (at least we are using it in-house), and I am pretty sure that stages in R can be run natively without wrapping in an Rscript.

written 8 months ago by A. Domingues

The table is outdated by now; Snakemake does support Kubernetes AFAICT: https://snakemake.readthedocs.io/en/stable/executable.html#executing-a-snakemake-workflow-via-kubernetes

written 3 months ago by Roman Valls Guimerà

The table was never particularly accurate to begin with.

written 3 months ago by Devon Ryan

ropolocan (Canada) wrote 8 months ago:

I am revisiting this post to mention that snakemake supports automated deployment of software dependencies with conda as well as the specification of conda environments per rule. This is very exciting!

modified 8 months ago • written 8 months ago by ropolocan

This feature has also been added to Nextflow. Link

written 3 months ago by steve

Excellent! Thanks for bringing attention to this, @steve. I will definitely try it out.

written 3 months ago by ropolocan

ropolocan (Canada) wrote 14 months ago:

I just read this excellent review by @Jeremy Leipzig. This article can be helpful for deciding which workflow management system is most suitable for one's needs: https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbw020

modified 14 months ago • written 14 months ago by ropolocan

ropolocan (Canada) wrote 7 months ago:

I thought I would share this Reddit thread on workflow management systems. There are very interesting posts on snakemake vs. nextflow: https://www.reddit.com/r/bioinformatics/comments/73am0k/ncbi_hackathons_discussions_on_bioinformatics/

modified 7 months ago • written 7 months ago by ropolocan