Question: When creating a container, should it include a workflow manager?
Richard570 wrote, 5 days ago:

Hi folks.

I'm building a simple container that will accept a pair of FASTQ files and run the following tools:

  1. cutadapt
  2. minimap2
  3. Strelka2
  4. SnpEff
  5. Bash/SnpSift to count some of the SnpEff results.

The container is specifically required to process a single pair of FASTQs at a time. In other words, anyone who wants to run more than one sample can run one instance of the container per sample.

My initial plan was simply to put the Bash commands that execute this series of tools into the container recipe, so that running the container just runs the commands I wrote inside the container environment.
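A minimal sketch of that entrypoint contract (exactly one FASTQ pair per container run), written in Python purely for illustration rather than Bash. The argument names `r1`, `r2`, `--outdir` and `--sample`, and the rule for deriving a sample name from the R1 file name, are hypothetical choices, not anything the tools above require.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the container's command line: exactly one FASTQ pair per run.

    Argument names and the sample-name rule are hypothetical illustrations.
    """
    parser = argparse.ArgumentParser(
        description="Run the trim/align/call/annotate chain on one FASTQ pair."
    )
    parser.add_argument("r1", type=Path, help="read 1 FASTQ")
    parser.add_argument("r2", type=Path, help="read 2 FASTQ")
    parser.add_argument("--outdir", type=Path, default=Path("results"),
                        help="directory for all outputs (default: results)")
    parser.add_argument("--sample", default=None,
                        help="sample name (default: derived from the R1 file name)")
    # argparse exits with an error if fewer or more than two paths are given.
    args = parser.parse_args(argv)
    if args.r1 == args.r2:
        parser.error("r1 and r2 must be different files")
    if args.sample is None:
        # Strip common FASTQ suffixes, e.g. "S1_R1.fastq.gz" -> "S1_R1".
        name = args.r1.name
        for suffix in (".gz", ".fastq", ".fq"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
        args.sample = name
    return args
```

With this shape, processing several samples simply means starting one container per pair, which keeps the container itself free of any sample-looping logic.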

What I'm trying to wrap my head around now is whether it is advisable/recommended/preferable to have the container use something like Snakemake or Nextflow to manage the simple linear workflow listed above. There is always some chance that more analyses will be added to the container.

Any thoughts on whether my container should use a workflow manager internally, or is it better to keep things simple with plain Bash commands?

So far, the only drawback I can point to is that managing stdout/stderr from each tool makes my list of Bash commands ugly, with lots of capturing and redirection. I can't say that is much of a problem, though. Am I missing something?
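One way to contain that redirection is a single helper that gives every step its own log files and stops at the first failure (in Bash, roughly `set -euo pipefail` plus `> step.stdout 2> step.stderr` on each command). The Python sketch below assumes hypothetical names (`run_step`, `run_pipeline`) and a hypothetical `<log_dir>/<name>.stdout` / `<name>.stderr` layout; the actual cutadapt/minimap2/Strelka2/SnpEff command lines are deliberately left out. The optional third element of each step sends stdout to a data file instead of a log, for tools that write their results to stdout.

```python
import subprocess
from pathlib import Path


def run_step(name, cmd, log_dir, stdout_to=None):
    """Run one step (an argument list, no shell) with per-step logs.

    stderr always goes to <log_dir>/<name>.stderr. stdout goes to
    `stdout_to` if given (for tools whose result is written to stdout),
    otherwise to <log_dir>/<name>.stdout. A non-zero exit status raises
    subprocess.CalledProcessError.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = Path(stdout_to) if stdout_to else log_dir / f"{name}.stdout"
    with open(out_path, "w") as out, open(log_dir / f"{name}.stderr", "w") as err:
        subprocess.run(cmd, stdout=out, stderr=err, check=True)


def run_pipeline(steps, log_dir):
    """Run (name, cmd, stdout_to) steps strictly in order.

    Returns the names of completed steps; the first failing step raises,
    so later steps never see output from a broken one.
    """
    done = []
    for name, cmd, stdout_to in steps:
        run_step(name, cmd, log_dir, stdout_to)
        done.append(name)
    return done
```

The design choice is simply to centralize the redirection in one place so that each step definition is a plain command list rather than a command wrapped in capture boilerplate.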

Thanks, Richard

(Last modified 5 days ago by Jeremy Leipzig19k.)

"if it is advisable/recommended/preferable to have the container use something like Snakemake or Nextflow"

IMO, yes, 100%, no question. Learning one of them takes about as much time and effort as writing your own implementation, except that yours will be less reproducible, and many of the issues you have already hit (or haven't yet encountered or thought about) have been solved well already.

(Reply written 4 days ago by bruce.moran790, last modified 4 days ago.)
JC10k wrote, 5 days ago:

Both approaches have advantages and disadvantages:

  • A workflow manager is clean, makes the process easy to understand, and is better for complex pipelines; however, you will need to add it to the container (fine if size is not a problem).
  • Bash is more portable and quick to write, but it is better suited to small pipelines.
Jeremy Leipzig19k (Philadelphia, PA) wrote, 5 days ago:

These technologies (Docker, Bash, pipeline frameworks) are compatible, not mutually exclusive.

Pipeline frameworks offer:

  • Parameterization
  • File/task abstraction and dependency graphs
  • Reentrancy (a failed or interrupted run can be restarted without redoing steps that already completed)

You can certainly reinvent the wheel, but, as you have noted, it gets ugly really fast.
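As a toy illustration of the reentrancy and file/task points in the list above, the sketch below uses a Make-style rule: a task is skipped when all of its outputs exist and are at least as new as all of its inputs. `Task`, `is_up_to_date` and `run` are hypothetical names, and tasks must be listed in dependency order by hand, since building and ordering the dependency graph is exactly the part a real framework does for you. Real frameworks may also track more than file timestamps, so this is a sketch of the idea, not of how Snakemake or Nextflow work internally.

```python
import os
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    inputs: List[str]
    outputs: List[str]
    action: Callable[[], object]  # produces the outputs; return value ignored


def is_up_to_date(task):
    """True if every output exists and none is older than the newest input."""
    if not all(os.path.exists(p) for p in task.outputs):
        return False
    if not task.inputs:
        return True
    newest_input = max(os.path.getmtime(p) for p in task.inputs)
    oldest_output = min(os.path.getmtime(p) for p in task.outputs)
    return oldest_output >= newest_input


def run(tasks):
    """Run tasks in the given (already dependency-sorted) order.

    Up-to-date tasks are skipped, so re-running after a failure only
    redoes missing or stale work. Returns the names of tasks that ran.
    """
    ran = []
    for task in tasks:
        if is_up_to_date(task):
            continue
        task.action()
        missing = [p for p in task.outputs if not os.path.exists(p)]
        if missing:
            raise RuntimeError(f"{task.name} did not produce {missing}")
        ran.append(task.name)
    return ran
```

Calling `run(tasks)` a second time with nothing changed runs no tasks; deleting one intermediate output re-runs only the task that produces it.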

(Last modified 5 days ago.)