creating container - should it contain a workflow manager?
11 months ago
Richard ▴ 580

Hi folks.

I'm constructing a simple container that will accept a pair of fastq files and run the following tools:

  1. cutadapt
  2. minimap2
  3. Strelka2
  4. SNPEff
  5. BASH / SnpSift to count up some of the SNPEff results.

The container is specifically required to operate on a single pair of fastqs at a time. In other words, if someone wants to run more than one sample they can run two instances of the container.

My initial plan was just to provide the BASH commands for executing the series of tools into the container recipe so that when someone runs the container they are just running the BASH commands that I wrote inside the container environment.

What I'm wrapping my head around now is whether it is advisable/recommended/preferable to have the container use something like snakemake or nextflow to manage the simple linear workflow I have listed above. There is always some chance that more analyses will be added to the container.

Are there any thoughts about whether my container should employ a workflow manager on the inside, or is it better to keep things simple with plain BASH commands?

So far all I can really say is that managing the stderr/stdout messages from each of the tools is making my list of BASH commands look ugly with lots of capture and redirection. Can't say that is much of a problem though. Am I missing something?
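To make the capture-and-redirection clutter concrete, here is a minimal sketch of one way to tidy it in plain Bash without a workflow manager. The `run_step` helper and the `LOG_DIR` variable are hypothetical names introduced for illustration, not something from the container described above. The helper sends each tool's stderr to its own log file and leaves stdout alone, so tools that write their results to stdout can still be redirected by the caller.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical log location; override by exporting LOG_DIR before running.
LOG_DIR="${LOG_DIR:-logs}"

# run_step NAME CMD [ARGS...]
# Runs CMD, writing its stderr to $LOG_DIR/NAME.err.
# stdout is not touched, so the caller decides where results go.
# Progress messages from the helper itself go to the script's stderr.
run_step() {
    local name="$1"; shift
    mkdir -p "$LOG_DIR"
    echo "[$(date +%T)] start ${name}" >&2
    if "$@" 2> "${LOG_DIR}/${name}.err"; then
        echo "[$(date +%T)] done ${name}" >&2
    else
        local status=$?   # exit code of the failed command
        echo "[$(date +%T)] FAILED ${name} (exit ${status}), see ${LOG_DIR}/${name}.err" >&2
        return "$status"
    fi
}
```

A call would then look something like `run_step minimap2 minimap2 -ax sr ref.fa R1.fastq R2.fastq > aln.sam`, where the file names are placeholders. This only addresses the logging side; it does nothing for restarts or dependency tracking.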

thanks Richard

singularity snakemake nextflow • 292 views

if it is advisable/recommended/preferable to have the container use something like snakemake or nextflow

IMO yes it is, 100%, no question, definitely. The time/effort to learn one of those will be equal to writing your own implementation, only yours will not be as reproducible, and many of the issues you have (or haven't yet encountered/thought about) have been well solved already.

11 months ago
JC 12k

Both have their advantages/disadvantages:

  • A workflow manager keeps the process clean and easy to understand, and it is better for complex pipelines; however, you will need to add it to the container (fine if size is not a problem)
  • Bash is more portable and quicker to write, but it is better suited to small pipelines
11 months ago

These (Docker/Bash/Pipeline frameworks) are compatible, not mutually exclusive, technologies.

Pipeline frameworks offer:

  • Parameterization
  • File/task abstraction and dependency graphs
  • Reentrancy

You can definitely re-invent the wheel but, as you have noted, it gets ugly real fast.
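As a rough illustration of what re-inventing just one of these features looks like, here is a minimal Bash sketch of reentrancy. The `run_once` helper is a hypothetical name, and it assumes each step writes its result to stdout. It skips a step whose output file already exists, and it only moves the output into place when the command succeeds, so a crashed step does not leave a half-written file that would be mistaken for a finished one.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_once OUTPUT CMD [ARGS...]
# Skips CMD if OUTPUT already exists. Otherwise runs CMD with stdout
# captured to a temporary file, which is renamed to OUTPUT on success
# and deleted on failure.
run_once() {
    local output="$1"; shift
    if [[ -e "$output" ]]; then
        echo "skip: ${output} already exists" >&2
        return 0
    fi
    local tmp="${output}.tmp.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$output"
    else
        local status=$?   # exit code of the failed command
        rm -f "$tmp"
        return "$status"
    fi
}
```

This sketch only checks that the output exists. It does not notice when an input or a parameter has changed, which is the kind of bookkeeping that pipeline frameworks handle for you.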

