Snakemake restricted wildcard combinations
7 weeks ago
Meryl ▴ 20

Hi, I'm new to snakemake and haven't seen something like this in tutorials I've tried. I have a data frame with unique keys, chromosomes, start positions, and end positions. I essentially want to loop over every group to do an operation. How would I assign the wildcards such that only the combinations I want to run get run? So for example for the data frame:

key  chr  start  end
a    1    50     51
b    1    50     51
c    2    23     25
d    2    30     30
e    3    10     12

I would want the following rule run 5 times, once for each row in the data frame.

rule run_plink:
    input:
        multiext("all_{chr}", ".bed", ".bim", ".fam")
    output:
        multiext("my_{key}", ".bed", ".bim", ".fam")
    params:
        all="all_{chr}",
        out="my_{key}"
    conda: "my_env.yaml"
    shell: "plink --bfile {params.all} --chr {wildcards.chr} --from-bp {wildcards.start} --to-bp {wildcards.end} --make-bed --out {params.out}"


Any ideas? My concern with defining the wildcards beforehand is that Snakemake will not group them together (i.e. instead of 5 runs, it will be 5 × 3 × 4 × 4 = 240 runs for this example set).

wildcards snakemake • 219 views
7 weeks ago

The simplest way is to name your outputs after the tuples in your rows, e.g. output: "{key}_{chr}_{start}_{end}.out"

Then you just need to build the list of filenames in vanilla Python and set that list as the input of a target rule.
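
A minimal sketch of that idea in plain Python, using the five rows from the question. The names `rows`, `pattern`, and `targets` are just illustrative, and the `.out` suffix follows the pattern above rather than the plink outputs. It also counts the full cross product for comparison, to show that building the list row by row avoids the 240 combinations:

```python
from itertools import product

# Rows of the example data frame from the question: (key, chr, start, end).
rows = [
    ("a", 1, 50, 51),
    ("b", 1, 50, 51),
    ("c", 2, 23, 25),
    ("d", 2, 30, 30),
    ("e", 3, 10, 12),
]

# Illustrative filename pattern: every column of a row becomes part of the name.
pattern = "{key}_{chr}_{start}_{end}.out"

# One target per row, so only the combinations present in the table are requested.
targets = [
    pattern.format(key=key, chr=chrom, start=start, end=end)
    for key, chrom, start, end in rows
]

# For comparison: the number of combinations if every unique value of every
# column were crossed with every other one.
keys, chroms, starts, ends = (sorted(set(column)) for column in zip(*rows))
n_cross = len(list(product(keys, chroms, starts, ends)))

print(targets)
print(len(targets), "targets vs", n_cross, "in the full cross product")
```

In a Snakefile, a list like `targets` could then be passed as the input of a target rule (for example a `rule all`), and a rule whose output is "{key}_{chr}_{start}_{end}.out" would get all four values as wildcards. This assumes the keys themselves contain no underscores; otherwise a different separator would be safer.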