Snakemake flow variation
2.9 years ago
effidotpy ▴ 20

Hi, I have been able to create my own linear workflows (A -> B -> C -> D) with Snakemake. However, I would now like to include some optional steps (X) that should only be executed if the user asks for them. Briefly, most of the time C will take the output from B as input, but sometimes I need X to take the output from B, and C to then take the output from X.

After looking at the documentation I have not been able to figure out how to do this. I don't even know if this is feasible, or if there is another approach that fits better. I would appreciate some guidance here.

Thanks!

2.9 years ago
russhh 5.7k

The input to C can be a function that returns file paths based on the wildcards matched by C's output. You could therefore write a function that decides what the input to C should be.

When you say that "sometimes I would need X to take as input the output from B and then C to take the output from X", do you mean that the optional use of X is decided for a given sample within your workflow (sample 1 might pass through X, but sample 2 might not need to), or that the whole workflow should optionally use X based on some config/argument (for a given experiment, choose to run all samples through X)?

Something like this

# optionally use X for every sample passing through the workflow
def input_for_c(wildcards):
    # requires a config containing switches for the whole workflow
    if config["Use X"]:
        return "./data/X/{}".format(wildcards["sample_id"])
    else:
        return "./data/B/{}".format(wildcards["sample_id"])

# optionally use X for the current sample
def input_for_c(wildcards):
    # requires a sample_config containing switches for each separate sample
    sample_id = wildcards["sample_id"]
    if sample_config[sample_id]["Use X"]:
        return "./data/X/{}".format(sample_id)
    else:
        return "./data/B/{}".format(sample_id)

rule all:
    input: expand("./data/C/{sample_id}", sample_id = SAMPLES)

rule B:
    output: "./data/B/{sample_id}"
    ...

rule X:
    output: "./data/X/{sample_id}"
    ...

rule C:
    input: input_for_c
    output: "./data/C/{sample_id}"
    ...
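
If you want to check the switch before wiring it into a rule, the workflow-wide function can be exercised in plain Python. This is only a sketch under some assumptions: `config` here is an ordinary dict standing in for Snakemake's global `config`, and `SimpleNamespace` is a stand-in for the wildcards object (it only mimics attribute access such as `wildcards.sample_id`).

```python
from types import SimpleNamespace

# Stand-in for Snakemake's global `config` dict (assumption: the real one
# would be populated from a config file or the command line).
config = {"Use X": True}


def input_for_c(wildcards):
    """Return the path that rule C should read for this sample."""
    upstream = "X" if config["Use X"] else "B"
    return "./data/{}/{}".format(upstream, wildcards.sample_id)


# SimpleNamespace mimics attribute access on the wildcards object.
wildcards = SimpleNamespace(sample_id="sample1")

with_x = input_for_c(wildcards)      # switch on: C reads from X
config["Use X"] = False
without_x = input_for_c(wildcards)   # switch off: C reads from B

print(with_x)
print(without_x)
```

Because C's input path changes with the switch, Snakemake would only need to schedule X when something actually requests a file under ./data/X/.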

This looks a bit complicated. Off the top of my head, I think you can also do something like:

rule C:
    input:
        lambda wildcards: "./data/X/"+wildcards.sample_id if config["Use X"] else "./data/B/"+wildcards.sample_id
    output:
        "./data/C/{sample_id}"
    ...

You'd only include one of the functions in your actual Snakefile. IMO it's better to have a named function than to stuff the same logic into a lambda.

Your help did the trick. Thanks mates!

Regarding your question russhh, I want to use this "switch" for the whole experiment, i.e. to process all its samples equally. The point is that I also want to use this workflow for other experiments that might require some extra intermediary steps.
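
If more than one optional intermediate step turns up later, the same idea can probably be generalised: keep the full step order in one list and look up whichever enabled step sits directly upstream. The sketch below is plain Python with hypothetical names throughout (`STEP_ORDER`, `enabled_steps`, `upstream_of`, a second optional step `Y`, and the config keys `use_x` / `use_y` are all made up for illustration).

```python
# Full chain in execution order; X and Y are optional (Y is hypothetical).
STEP_ORDER = ["B", "X", "Y", "C"]


def enabled_steps(config):
    """Keep mandatory steps plus any optional step switched on in the config."""
    optional = {
        "X": config.get("use_x", False),
        "Y": config.get("use_y", False),
    }
    # Steps not listed in `optional` are mandatory and always kept.
    return [step for step in STEP_ORDER if optional.get(step, True)]


def upstream_of(step, config):
    """Name of the enabled step that runs immediately before `step`."""
    steps = enabled_steps(config)
    position = steps.index(step)  # raises ValueError if the step is disabled
    if position == 0:
        raise ValueError("{} has no upstream step".format(step))
    return steps[position - 1]


def input_path(step, sample_id, config):
    """Path a rule for `step` would read, following the ./data/<step>/<sample> layout."""
    return "./data/{}/{}".format(upstream_of(step, config), sample_id)


print(input_path("C", "sample1", {}))
print(input_path("C", "sample1", {"use_x": True}))
print(input_path("X", "sample1", {"use_x": True}))
```

Each rule's input function would then just call `input_path` with its own step name and `wildcards.sample_id`, so adding a step means editing the list rather than every function.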

2.9 years ago
gb ★ 2.2k

Snakemake is basically Python code, so you could wrap the rule in an if/else like:

if sample_config["UseX"]:
    rule x:

Are you sure you can do that (optional rule definition) on a sample-by-sample basis?
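
As far as I understand it, an `if` wrapped around a rule is evaluated once, when the Snakefile is parsed and before any wildcards exist, so it can only act as a workflow-wide switch. A per-sample choice would have to go through an input function instead. A minimal sketch in plain Python, assuming a hypothetical `sample_config` dict with one entry per sample and using `SimpleNamespace` as a stand-in for the wildcards object:

```python
from types import SimpleNamespace

# Hypothetical per-sample switches; in a real workflow these might be
# loaded from a sample sheet or config file.
sample_config = {
    "sample1": {"Use X": True},
    "sample2": {"Use X": False},
}


def input_for_c(wildcards):
    """Pick C's input per sample: X's output if the sample is flagged, else B's."""
    sample_id = wildcards.sample_id
    upstream = "X" if sample_config[sample_id]["Use X"] else "B"
    return "./data/{}/{}".format(upstream, sample_id)


# Evaluate the function once per sample, as Snakemake would for each job of C.
inputs = {
    sample_id: input_for_c(SimpleNamespace(sample_id=sample_id))
    for sample_id in sample_config
}
print(inputs)
```

Rule X stays defined for everyone in this setup; it simply never runs for samples whose C job does not ask for a file under ./data/X/.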

