A pipeline could just be a bunch of commands embedded in a build script.
When I hear 'workflow' I think exclusively of a heavyweight platform like Taverna that is designed to make it easy for end users to construct analyses from modular units. Of course, Pipeline Pilot also falls into this category, so it appears I might be the only one who makes this assumption.
From IT and CS usage:
A pipeline is a series of processes, usually linear, which filter or transform data. The processes are generally assumed to be running concurrently. The data flow diagram of a pipeline does not normally branch or loop. The first process takes raw data as input, does something to it, then sends its results to the second process, and so on, eventually ending with the final result being produced by the last process in the pipeline. Pipelines are normally quick, with a flow taking seconds to hours for end-to-end processing of a single set of data.
Examples of pipelines in the real world include chaining two or more processes together on the command line using the '|' (pipe) symbol, with results in stdout or redirected to a file, or a simple software build process driven by 'make'.
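The shape described above, a linear chain of stages where each one filters or transforms the output of the previous one, can be sketched in a few lines of Python using chained generators. This is just an illustration of the concept, not a real tool; the log data and stage names are made up, and the whole thing is analogous to something like `grep ERROR log | awk '{print $NF}' | sort | uniq -c` on the command line.

```python
from collections import Counter

# Each stage is a generator: it consumes the stream from the
# previous stage and yields a filtered or transformed stream.

def read_lines(text):
    # First stage: take raw data as input.
    for line in text.splitlines():
        yield line

def keep_errors(lines):
    # Filter stage: pass through only the lines we care about.
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_code(lines):
    # Transform stage: reduce each line to its last field.
    for line in lines:
        yield line.split()[-1]

# Made-up input data for the sketch.
log = "INFO ok\nERROR 500\nINFO ok\nERROR 404\nERROR 500\n"

# The pipeline itself: stages composed left to right, with the
# last stage producing the final result.
counts = Counter(extract_code(keep_errors(read_lines(log))))
print(counts)  # Counter({'500': 2, '404': 1})
```

Note that, just as in a shell pipeline, the stages run concurrently in the sense that no stage materialises its full output before the next one starts consuming it.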
A workflow is a set of processes, usually non-linear, often human rather than machine, which filter or transform data, often triggering external events. The processes are not assumed to be running concurrently. The data flow diagram of a workflow can branch or loop. There may be no clearly defined "first" process -- data may enter the workflow from multiple sources. Any process may take raw data as input, do something to it, then send its results to another process. There may be no single "final result" from a single process; rather, multiple processes might deliver results to multiple recipients. Workflows can be complex and long-lived; a single flow may take days, months, or even years to execute.
Examples of workflows in the real world include document, bug, or order processing, or iterative processing of very large data sets, particularly if humans are in the loop.
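The structural contrast with a pipeline can be sketched as a small directed acyclic graph rather than a line: multiple sources feed in, the graph branches, and multiple recipients get results. Everything here is hypothetical, the step names and data are invented for illustration, and real workflow systems add scheduling, persistence, and human steps on top of this skeleton.

```python
from graphlib import TopologicalSorter

results = {}

# Hypothetical steps. Two independent data sources feed a merge
# step, whose output then fans out to two separate "recipients".
def source_a():  return [1, 2, 3]
def source_b():  return [10, 20]
def merge():     return results["source_a"] + results["source_b"]
def report():    return sum(results["merge"])    # one recipient
def archive():   return len(results["merge"])    # another recipient

# Each step is declared with the steps it depends on -- a graph,
# not a chain, so there is no single "first" or "last" process.
steps = {
    "source_a": (source_a, []),
    "source_b": (source_b, []),
    "merge":    (merge,    ["source_a", "source_b"]),
    "report":   (report,   ["merge"]),
    "archive":  (archive,  ["merge"]),
}

# Run the steps in dependency order.
dag = TopologicalSorter({name: deps for name, (_, deps) in steps.items()})
for name in dag.static_order():
    results[name] = steps[name][0]()

print(results["report"], results["archive"])  # 36 5
```

The point of the sketch is only the shape: replace the functions with scripts, cluster jobs, or a person approving a document, and the same dependency graph describes a workflow.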
Mixing of terms
These terms have become mixed in recent years, in part because pipelines can be implemented as a very simple subset of workflows. In previous decades, workflow software was large, complex, commercial, and involved high licensing fees, while pipelines were a thing you did on the fly or in a shell script. The terminology has become more blurred as simpler "workflow" software packages have emerged; some of these are really just complicated versions of distributed 'make', and don't support humans in the loop. They really should have been called "data flow" rather than workflow packages. Likewise, there have been more efforts to support branching, looping, and suspended flows in "pipeline" libraries for various languages, and we've seen more pipelines spread over multiple machines, with data transport via HTTP, other TCP protocols, or shared networked filesystems.
I would tend to think there is little difference but I do use these terms in slightly different ways.
I use 'pipeline' to refer to an established (often large) workflow (e.g. the Ensembl pipeline) that may have flow control built-in.
I use the term 'workflow' for a series of computational steps, usually programmed to run together, though sometimes the mere conception of such a series is enough to refer to it as a workflow.
I suspect that in practice there's not a lot to it, and the difference in usage may have to do with the background of the speaker. For example, in my usage a workflow is a more formal, strict, and computational term than pipeline. If I had to justify that, certain (non-bioinformatic) software systems have 'workflows' in the sense that documents and data move automatically from stage to stage, which is not far from Galaxy's series of analysis steps. But they're foggy terms.