Question

creating container - should it contain a workflow manager?

3.9 years ago

Richard ▴ 590

Hi folks.

I'm constructing a simple container that will accept a pair of fastq files and run the following tools:

cutadapt
minimap2
Strelka2
SNPEff
BASH / SnpSift to count up some of the SNPEff results.

The container is specifically required to operate on a single pair of fastqs at a time. In other words, if someone wants to run more than one sample they can run two instances of the container.

My initial plan was simply to put the BASH commands that execute the series of tools into the container recipe, so that when someone runs the container they are just running the BASH commands I wrote inside the container environment.

What I'm trying to work out now is whether it is advisable/recommended/preferable to have the container use something like snakemake or nextflow to manage the simple linear workflow I have listed above. There is always some chance that more analyses will be added to the container.

Are there any thoughts on whether my container should employ a workflow manager on the inside, or is it better to keep things simple with plain BASH commands?

So far all I can really say is that managing the stderr/stdout messages from each of the tools is making my list of BASH commands look ugly with lots of capture and redirection. Can't say that is much of a problem though. Am I missing something?
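The capture and redirection described above can be factored into one small helper instead of being repeated for every tool. What follows is only a minimal sketch in Python, not a recommendation over a real workflow manager: the names `run_step` and `run_pipeline` and the one-log-pair-per-step layout are assumptions made for illustration, and the commands are supplied by the caller rather than being actual cutadapt or minimap2 invocations.

```python
import subprocess
from pathlib import Path


def run_step(name, cmd, log_dir):
    """Run one command, writing its stdout and stderr to <name>.out and <name>.err.

    `name` and the log layout are illustrative choices, not a convention
    of any of the tools mentioned in the question.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / f"{name}.out"
    err_path = log_dir / f"{name}.err"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    if result.returncode != 0:
        # Stop the linear pipeline at the first failing step.
        raise RuntimeError(
            f"step {name!r} failed with exit code {result.returncode}; see {err_path}"
        )
    return out_path, err_path


def run_pipeline(steps, log_dir="logs"):
    """Run (name, command) pairs in order and return the names that completed."""
    completed = []
    for name, cmd in steps:
        run_step(name, cmd, log_dir)
        completed.append(name)
    return completed
```

Each step would be passed as a name plus an argument list, for example `run_pipeline([("trim", ["echo", "placeholder"])])`, with the real tool command lines substituted by the pipeline author.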

Thanks,
Richard

singularity snakemake nextflow • 922 views

updated 3.9 years ago by Jeremy Leipzig 22k • written 3.9 years ago by Richard ▴ 590

if it is advisable/recommended/preferable to have the container use something like snakemake or nextflow

IMO yes it is, 100%, no question, definitely. The time/effort to learn one of those will be equal to writing your own implementation, only yours will not be as reproducible, and many of the issues you have (or haven't yet encountered/thought about) have been well solved already.

3.9 years ago by bruce.moran ▴ 960

score 0 · Answer 1 · 2020-05-22

Both have their advantages and disadvantages:

A workflow manager makes the process clean and easy to follow, and it is better suited to complex pipelines; however, you will need to install it in the container, which is fine if image size is not a problem.
Bash is more portable and quicker to write, but it is better suited to small pipelines.

score 0 · Answer 2 · 2020-05-22

3.9 years ago

Jeremy Leipzig 22k

These (Docker/Bash/Pipeline frameworks) are compatible, not mutually exclusive, technologies.

Pipeline frameworks offer:

Parameterization
File/task abstraction and dependency graphs
Reentrancy

You can definitely re-invent the wheel but, as you have noted, it gets ugly real fast
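To make the file/task abstraction and reentrancy points concrete, here is a minimal sketch of what re-inventing that wheel looks like in Python. It is an illustration under stated assumptions, not how snakemake or nextflow are implemented: the names `is_up_to_date` and `run_rules`, the tuple layout of a rule, and the use of file modification times to decide what is stale are all hypothetical choices.

```python
from pathlib import Path


def is_up_to_date(inputs, outputs):
    """Return True if every output exists and none is older than any input."""
    outputs = [Path(p) for p in outputs]
    if not all(p.exists() for p in outputs):
        return False
    oldest_output = min(p.stat().st_mtime for p in outputs)
    newest_input = max((Path(p).stat().st_mtime for p in inputs), default=0.0)
    return newest_input <= oldest_output


def run_rules(rules):
    """Run rules given in dependency order, skipping those already up to date.

    Each rule is a (name, inputs, outputs, action) tuple, where `action` is a
    zero-argument callable that produces the outputs. Returns the names of
    the rules that were actually executed.
    """
    executed = []
    for name, inputs, outputs, action in rules:
        if is_up_to_date(inputs, outputs):
            continue  # reentrancy: finished work is not repeated
        action()
        executed.append(name)
    return executed
```

Re-running `run_rules` after an interruption repeats only the steps whose outputs are missing or stale. Parameterization would amount to building the rule list from a sample name or a pair of fastq paths. Even this small version already has to deal with missing inputs, partially written outputs, and coarse timestamps, which is where hand-rolled solutions tend to get ugly.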
