Question

Snakemake flow variation



4.0 years ago

effidotpy ▴ 20

Hi, I have been able to create my own linear workflows (A -> B -> C -> D) with Snakemake. However, now I would like to include some optional steps (X) that should only be executed if the user asks for them. Briefly, most of the time C takes the output of B as input, but sometimes I need X to take the output of B as input, and then C to take the output of X.


After looking at the documentation, I have not been able to figure out how to do this. I don't even know if this is feasible, or if there is another approach that fits better. I would appreciate some guidance here.

Thanks!

snakemake python automatization workflow



4.0 years ago

gb ★ 2.2k

Snakemake is basically Python code, so you could wrap the rule in an if statement, like:

if sample_config["UseX"]:
    rule x:
        input: ...   # output of B
        output: ...  # file that C should then read

Or maybe this can help: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution
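Because a Snakefile is parsed as Python, such an if is evaluated once, when the workflow is loaded, so it switches X on or off for the whole run rather than per sample. The plain-Python sketch below only illustrates that logic: build_chain is a hypothetical helper, not part of Snakemake, and reading the switch with config.get and a False default is an assumption (not from the answer) that keeps X off unless the user explicitly asks for it.

```python
def build_chain(config):
    """Hypothetical helper: the ordered steps for one whole run.

    The switch is read once, so every sample follows the same chain.
    A missing "UseX" key counts as False, which makes X strictly opt-in.
    """
    steps = ["A", "B"]
    if config.get("UseX", False):
        steps.append("X")  # X sits between B and C when enabled
    steps += ["C", "D"]
    return steps


print(build_chain({}))              # ['A', 'B', 'C', 'D']
print(build_chain({"UseX": True}))  # ['A', 'B', 'X', 'C', 'D']
```

The same config.get expression could guard rule x in a Snakefile, with the value supplied through a config file or Snakemake's --config option.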



Are you sure you can do that (optional rule definition) on a sample-by-sample basis?

russhh 5.7k · 4.0 years ago


Yes

Here are some other sources:

https://stackoverflow.com/questions/56179038/is-it-possible-to-enable-disable-certain-snakemake-rules-using-command-line-flag

https://groups.google.com/forum/#!topic/snakemake/qX7RfXDTDe4

And if you run into problems with rule all, this method may also help:

https://stackoverflow.com/questions/57090794/put-optional-input-files-for-rule-all-in-snakemake
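With the conditional-rule approach, rule all must not ask for files that no defined rule can produce: if rule x is skipped, nothing can create X's outputs. One option is to build the rule all input list in Python first and add the optional files only when the step is enabled. A minimal sketch, assuming the ./data/<step>/<sample> layout used in the accepted answer; all_targets is a hypothetical helper:

```python
def all_targets(samples, config):
    """Hypothetical helper: build the input list for rule all.

    X's outputs are requested only when X is switched on; otherwise no rule
    would exist to create them.
    """
    targets = ["./data/C/{}".format(s) for s in samples]
    if config.get("UseX", False):
        targets += ["./data/X/{}".format(s) for s in samples]
    return targets


print(all_targets(["s1"], {}))
# ['./data/C/s1']
print(all_targets(["s1"], {"UseX": True}))
# ['./data/C/s1', './data/X/s1']
```

In a Snakefile this list could be used directly as the input of rule all, e.g. input: all_targets(SAMPLES, config).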

EDIT:

I understand what you mean now, and I have changed my answer.

gb ★ 2.2k · 4.0 years ago

score 5 · Accepted Answer · 2020-04-16

The input to C can be a function that returns file names based on the wildcards of C's output. You could therefore write a function that decides what the input to C should be.
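To make that mechanism concrete without Snakemake installed, here is a rough plain-Python imitation: a regular expression plays the role of C's output pattern ./data/C/{sample_id}, and a SimpleNamespace stands in for the wildcards object Snakemake passes to input functions. resolve_input_for_c and the simplified matching are assumptions for illustration only; Snakemake's real matching and scheduling are more involved.

```python
import re
from types import SimpleNamespace

# Hypothetical workflow-wide switch (stands in for Snakemake's `config`)
config = {"Use X": True}


def input_for_c(wildcards):
    # Pick X's output when the switch is on, otherwise B's output
    step = "X" if config["Use X"] else "B"
    return "./data/{}/{}".format(step, wildcards.sample_id)


def resolve_input_for_c(requested):
    # Simplified imitation: match the requested file against C's output
    # pattern, then hand the captured wildcards to the input function.
    match = re.fullmatch(r"\./data/C/(?P<sample_id>.+)", requested)
    if match is None:
        raise ValueError("not an output of rule C: " + requested)
    wildcards = SimpleNamespace(**match.groupdict())
    return input_for_c(wildcards)


print(resolve_input_for_c("./data/C/sample1"))  # ./data/X/sample1
```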

When you say that "sometimes I would need X to take as input the output from B and then C to take the output from X", do you mean that the optional use of X is decided for a given sample within your workflow (sample 1 might pass through X, but sample 2 might not need to), or that the whole workflow should optionally use X based on some config/argument (for a given experiment, choose to run all samples through X)?

Something like this:

# Option 1: optionally use X for every sample passing through the workflow
def input_for_c(wildcards):
    # requires a config containing switches for the whole workflow
    if config["Use X"]:
        return "./data/X/{}".format(wildcards.sample_id)
    else:
        return "./data/B/{}".format(wildcards.sample_id)

# Option 2: optionally use X for the current sample
# (define only one of the two functions; a second `def input_for_c`
# silently replaces the first)
def input_for_c(wildcards):
    # requires a `sample_config` dict containing switches for each separate sample
    sample_id = wildcards.sample_id
    if sample_config[sample_id]["Use X"]:
        return "./data/X/{}".format(sample_id)
    else:
        return "./data/B/{}".format(sample_id)

rule all:
    input: expand("./data/C/{sample_id}", sample_id = SAMPLES)

rule B:
    output: "./data/B/{sample_id}"
    ...

rule X:
    input: "./data/B/{sample_id}"
    output: "./data/X/{sample_id}"
    ...

rule C:
    input: input_for_c
    output: "./data/C/{sample_id}"
    ...
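The snippets above assume that SAMPLES and sample_config already exist. One possible way to define them at the top of the Snakefile, sketched under the assumption of a tab-separated sample sheet with hypothetical sample_id and use_x columns, uses only the standard library:

```python
import csv
import io

# Hypothetical sample sheet; in a real Snakefile this would come from a
# file, e.g. open("samples.tsv"). Column names are assumptions.
SHEET = "sample_id\tuse_x\nsample1\tyes\nsample2\tno\n"

TRUE_VALUES = {"yes", "true", "1"}


def load_sample_config(handle):
    """Build the per-sample switch dict read by input_for_c (Option 2)."""
    sample_config = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        use_x = row["use_x"].strip().lower() in TRUE_VALUES
        sample_config[row["sample_id"]] = {"Use X": use_x}
    return sample_config


sample_config = load_sample_config(io.StringIO(SHEET))
SAMPLES = sorted(sample_config)

print(sample_config)
# {'sample1': {'Use X': True}, 'sample2': {'Use X': False}}
print(SAMPLES)
# ['sample1', 'sample2']
```

Keeping the switch in the sample sheet means the per-sample choice is data rather than code, so enabling X for another sample only requires editing one cell.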