Question: Difference Between "Pipeline" And "Workflow"?
Pascal (Barcelona), 8.4 years ago, wrote:

A quick and basic question today. In the literature (particularly in the context of NGS) I often see the words "pipeline" and "workflow" used interchangeably. Is there a real difference between them?

Tag: next-gen sequencing

1st world problems :D

(comment by Fabian Bull, 8.4 years ago)

Looks like the consensus will be: no consensus

(comment by Alastair Kerr, 8.4 years ago)
Jeremy Leipzig (Philadelphia, PA), 8.4 years ago, wrote:

A pipeline could just be a bunch of commands embedded in a build script.

When I hear "workflow", I think exclusively of a heavyweight platform like Taverna, designed to make it easy for end users to construct analyses from modular units. Of course, Pipeline Pilot also falls into this category despite its name, so it appears I might be the only one who makes this assumption.

http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems


One could also say that a workflow is a high-level concept that can include manual or even wet-lab operations.

(comment by Jeremy Leipzig, 12 days ago)
stevegt, 4.0 years ago, wrote:

From IT and C/S usage:

Pipeline

A pipeline is a series of processes, usually linear, which filter or transform data. The processes are generally assumed to be running concurrently. The data flow diagram of a pipeline does not normally branch or loop. The first process takes raw data as input, does something to it, then sends its results to the second process, and so on, eventually ending with the final result being produced by the last process in the pipeline. Pipelines are normally quick, with a flow taking seconds to hours for end-to-end processing of a single set of data.

Examples of pipelines in the real world include chaining two or more processes together on the command line using the '|' (pipe) symbol, with results in stdout or redirected to a file, or a simple software build process driven by 'make'.
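To make the linear, streaming shape concrete, here is a minimal Python sketch using generators; the stage names and the record format are hypothetical. Each stage pulls records from the previous one on demand, so records flow through all stages one at a time, much as they do in `cmd1 | cmd2 | cmd3`. Unlike shell pipes, though, these generators interleave within a single process rather than running as truly concurrent processes.

```python
from typing import Iterable, Iterator


def strip_lines(lines: Iterable[str]) -> Iterator[str]:
    # Stage 1: take raw input and remove surrounding whitespace.
    for line in lines:
        yield line.strip()


def drop_blank_and_comments(records: Iterable[str]) -> Iterator[str]:
    # Stage 2 (filter): discard empty lines and '#' comment lines.
    for rec in records:
        if rec and not rec.startswith("#"):
            yield rec


def normalise_case(records: Iterable[str]) -> Iterator[str]:
    # Stage 3 (transform): upper-case every remaining record.
    for rec in records:
        yield rec.upper()


def run_pipeline(source: Iterable[str]) -> list[str]:
    # Linear chain, like `strip | filter | upper`: no branches, no loops,
    # and the last stage produces the single final result.
    return list(normalise_case(drop_blank_and_comments(strip_lines(source))))


if __name__ == "__main__":
    raw = ["# header\n", "acgt\n", "\n", "ttag\n"]
    print(run_pipeline(raw))  # ['ACGT', 'TTAG']
```

Because every stage only depends on the one before it, adding a step means inserting one more function into the chain.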

Workflow

A workflow is a set of processes, usually non-linear and often performed by humans rather than machines, which filter or transform data, often triggering external events. The processes are not assumed to be running concurrently. The data flow diagram of a workflow can branch or loop. There may be no clearly defined "first" process: data may enter the workflow from multiple sources. Any process may take raw data as input, do something to it, then send its results to another process. There may be no single "final result" from a single process; rather, multiple processes might deliver results to multiple recipients. Workflows can be complex and long-lived; a single flow may take days, months, or even years to execute.

Examples of workflows in the real world include document, bug, or order processing, or iterative processing of very large data sets, particularly if humans are in the loop.
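By contrast, the branching, multi-source shape of a workflow can be sketched as a dependency graph executed in topological order with Python's standard `graphlib` module (Python 3.9+). This is only an illustration under stated assumptions: the task names are hypothetical, each "task" just returns a string, and the sketch deliberately leaves out loops, human steps, and long-lived suspended flows, which a real workflow system would have to handle.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical tasks, each mapped to the tasks whose outputs it needs.
DEPENDENCIES: dict[str, set[str]] = {
    "samples": set(),                    # entry point 1
    "reference": set(),                  # entry point 2: no single "first" process
    "align": {"samples", "reference"},   # merges two inputs
    "qc_report": {"samples"},            # branch independent of alignment
    "call_variants": {"align"},
    "coverage_report": {"align"},        # second branch off the same step
}

# Each task receives its predecessors' results and returns its own result.
TASKS: dict[str, Callable[[dict[str, str]], str]] = {
    "samples": lambda inputs: "reads",
    "reference": lambda inputs: "genome",
    "align": lambda inputs: f"bam({inputs['samples']},{inputs['reference']})",
    "qc_report": lambda inputs: f"qc({inputs['samples']})",
    "call_variants": lambda inputs: f"vcf({inputs['align']})",
    "coverage_report": lambda inputs: f"cov({inputs['align']})",
}


def run_workflow() -> dict[str, str]:
    # static_order() yields every task after all of its predecessors.
    results: dict[str, str] = {}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        inputs = {dep: results[dep] for dep in DEPENDENCIES[name]}
        results[name] = TASKS[name](inputs)
    return results


def final_outputs() -> list[str]:
    # Sinks: tasks no other task depends on (there can be several).
    used = set().union(*DEPENDENCIES.values())
    return sorted(name for name in DEPENDENCIES if name not in used)


if __name__ == "__main__":
    results = run_workflow()
    for sink in final_outputs():
        print(sink, "->", results[sink])
```

Here there are two entry points (`samples`, `reference`) and three final outputs (`call_variants`, `coverage_report`, `qc_report`), matching the "no single first process, no single final result" description.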

Mixing of terms

These terms have become blurred in recent years, in part because a pipeline can be implemented as a very simple subset of a workflow. In previous decades, workflow software was large, complex, commercial, and carried high licensing fees, while pipelines were something you put together on the fly or in a shell script. The terminology has blurred further as simpler "workflow" software packages have emerged; some of these are really just complicated versions of distributed 'make', and don't support humans in the loop. They really should have been called "data flow" rather than workflow packages. Likewise, there have been more efforts to support branching, looping, and suspended flows in "pipeline" libraries for various languages, and we've seen more pipelines spread over multiple machines, with data transport via HTTP, other TCP protocols, or shared networked filesystems.
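The "complicated version of 'make'" point can be illustrated with a minimal, single-machine sketch of make-style dependency resolution: a target is rebuilt only when it is missing or older than one of its inputs. The `Build` class, the file names, and the integer timestamps are hypothetical stand-ins for real files and modification times; nothing here is distributed, and there is no cycle detection.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    rules: dict[str, list[str]]   # hypothetical: target -> list of its inputs
    mtimes: dict[str, int]        # simulated modification times, not real files
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Logical clock: anything written later gets a larger timestamp.
        self.clock = max(self.mtimes.values(), default=0)

    def touch(self, name: str) -> None:
        # Simulate writing or editing a file.
        self.clock += 1
        self.mtimes[name] = self.clock

    def make(self, target: str) -> int:
        if target not in self.rules:
            # Plain source file: no recipe, so it must already exist.
            if target not in self.mtimes:
                raise FileNotFoundError(f"no rule to make {target!r}")
            return self.mtimes[target]
        # Bring every input up to date first (depth-first recursion;
        # a dependency cycle would recurse forever).
        newest_input = max((self.make(dep) for dep in self.rules[target]), default=0)
        if self.mtimes.get(target, -1) >= newest_input:
            return self.mtimes[target]  # up to date: nothing to do
        # "Run the recipe": here we only record it and stamp the output.
        self.log.append(target)
        self.touch(target)
        return self.mtimes[target]


if __name__ == "__main__":
    build = Build(
        rules={"aligned.bam": ["reads.fq", "ref.fa"],
               "variants.vcf": ["aligned.bam", "ref.fa"]},
        mtimes={"reads.fq": 1, "ref.fa": 2},
    )
    build.make("variants.vcf")  # builds aligned.bam, then variants.vcf
    build.make("variants.vcf")  # up to date: nothing rebuilt
    build.touch("reads.fq")     # an input changes...
    build.make("variants.vcf")  # ...so both targets are rebuilt
    print(build.log)  # ['aligned.bam', 'variants.vcf', 'aligned.bam', 'variants.vcf']
```

Like the workflow sketch, this is driven by dependencies rather than by a fixed linear order, but every step is a machine step with no human in the loop.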


I believe there is a small typo in the second line of the workflow paragraph. It should read: "The data flow diagram of a workflow can branch or loop." Thank you.

(comment by eraneues, 12 months ago)
Alastair Kerr (Manchester/UK/Cancer Biomarker Centre at CRUK-MI), 8.4 years ago, wrote:

I would tend to think there is little difference, but I do use these terms in slightly different ways.

I use 'pipeline' to refer to an established (often large) workflow (e.g. the Ensembl pipeline) that may have flow control built-in.

I use the term 'workflow' for a series of computational steps, usually programmed to run in one go, although sometimes merely conceiving of the steps is enough for me to call it a workflow.

User 1686, 8.4 years ago, wrote:

I suspect that in practice there's not a lot to it, and the difference in usage may come down to the background of the speaker. For example, in my usage "workflow" is a more formal, strict and computational term than "pipeline". If I had to justify that, certain (non-bioinformatic) software systems have workflows in which documents and data move automatically from stage to stage, which is not far from Galaxy's series of analysis steps. But they're foggy terms.

Powered by Biostar version 2.3.0