Hi, I'm new to Snakemake and haven't seen anything like this in the tutorials I've tried. I have a data frame with unique keys, chromosomes, start positions, and end positions, and I essentially want to loop over every row and run one operation per row. How would I assign the wildcards so that only the combinations I want actually get run? For example, for the data frame:
key  chr  start  end
a    1    50     51
b    1    50     51
c    2    23     25
d    2    30     30
e    3    10     12
I would want the following rule run 5 times, once for each row in the data frame.
rule run_plink:
    input:
        multiext("all_{chr}", ".bed", ".bim", ".fam")
    output:
        multiext("my_{key}", ".bed", ".bim", ".fam")
    params:
        all="all_{chr}",
        out="my_{key}"
    conda: "my_env.yaml"
    shell: "plink --bfile {params.all} --chr {wildcards.chr} --from-bp {wildcards.start} --to-bp {wildcards.end} --make-bed --out {params.out}"
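Any ideas? My concern with defining the wildcards beforehand is that Snakemake will not keep them grouped by row. Instead of 5 runs, it would expand every combination, which for this example set is 5*3*4*4 = 240 runs (5 keys x 3 chromosomes x 4 start positions x 4 end positions).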
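To make the concern concrete, here is a minimal sketch in plain Python (not a Snakefile) of the behaviour I am after. It counts what a full cross product of the four columns would give, then shows the row-wise lookup I actually want, where the key alone determines the chromosome, start, and end. The names `ROWS`, `BY_KEY`, `n_independent_combinations`, `region_for`, `input_prefix`, and `targets` are hypothetical and only for illustration. I am assuming that functions of this shape could be wired into the rule as callables that receive the wildcards, but that wiring is not shown here.

```python
from itertools import product

# The example table from above, one dict per row (hypothetical in-memory form).
ROWS = [
    {"key": "a", "chr": 1, "start": 50, "end": 51},
    {"key": "b", "chr": 1, "start": 50, "end": 51},
    {"key": "c", "chr": 2, "start": 23, "end": 25},
    {"key": "d", "chr": 2, "start": 30, "end": 30},
    {"key": "e", "chr": 3, "start": 10, "end": 12},
]

# Index rows by their unique key so each key resolves to exactly one region.
BY_KEY = {row["key"]: row for row in ROWS}


def n_independent_combinations(rows):
    """Count the jobs if key, chr, start and end were combined independently."""
    columns = ("key", "chr", "start", "end")
    distinct = [sorted({row[c] for row in rows}) for c in columns]
    return len(list(product(*distinct)))


def region_for(key):
    """Return (chr, start, end) for one key, taken from its own row only."""
    row = BY_KEY[key]
    return row["chr"], row["start"], row["end"]


def input_prefix(key):
    """Return the per-chromosome input prefix that belongs to this key."""
    return f"all_{BY_KEY[key]['chr']}"


def targets():
    """List the output files wanted: three files per key, nothing else."""
    return [f"my_{key}{ext}" for key in BY_KEY for ext in (".bed", ".bim", ".fam")]


if __name__ == "__main__":
    print("cross product:", n_independent_combinations(ROWS))
    print("rows wanted:  ", len(BY_KEY))
    for key in BY_KEY:
        print(key, input_prefix(key), region_for(key))
```

With only `key` left as a free value, the requested outputs are the 15 files from `targets()` (5 keys times 3 extensions), and each one maps back to a single chromosome and region instead of to every combination.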