Question

Snakemake restricted wildcard combinations

2.4 years ago

Meryl ▴ 20

Hi, I'm new to Snakemake and haven't seen something like this in the tutorials I've tried. I have a data frame with unique keys, chromosomes, start positions, and end positions. I essentially want to loop over every row and run one operation per row. How would I assign the wildcards such that only the combinations I want get run? For example, given the data frame:

key  chr  start  end
a    1    50     51
b    1    50     51
c    2    23     25
d    2    30     30
e    3    10     12

I would want the following rule run 5 times, once for each row in the data frame.

rule run_plink:
    input:
        multiext("all_{chr}", ".bed", ".bim", ".fam")
    output:
        multiext("my_{key}", ".bed", ".bim", ".fam")
    params:
        all="all_{chr}",
        out="my_{key}"
    conda: "my_env.yaml"
    # Note: chr, start and end are not part of the output pattern, so Snakemake cannot resolve them as written.
    shell: "plink --bfile {params.all} --chr {wildcards.chr} --from-bp {wildcards.start} --to-bp {wildcards.end} --make-bed --out {params.out}"

Any ideas? My concern with defining the wildcards beforehand is that Snakemake will not keep the values grouped by row (i.e. instead of 5 runs, it would be 5*3*4*4 = 240 runs for this example set).

wildcards snakemake • 896 views


score 1 · Answer 1 · 2021-12-01

2.4 years ago

Jeremy Leipzig 22k

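The simplest way is to name your outputs after the tuples in your rows, e.g. output: "{key}_{chr}_{start}_{end}.out", so that every value the rule needs is a wildcard in the filename.

Then you just need to build the list of filenames, one per row, using vanilla Python and point a target rule's input at that list. Only those combinations will be requested, so only those get run.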
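
A minimal sketch of that idea, assuming the table from the question is available as whitespace-separated text. The names TABLE, build_targets and TARGETS are made up for illustration, and the commented Snakefile lines at the end are only an outline of how the list might be wired into a target rule, not a tested workflow:

```python
import csv
import io

# Hypothetical inline copy of the table from the question; in a real
# workflow this would more likely be read from a file.
TABLE = """\
key chr start end
a 1 50 51
b 1 50 51
c 2 23 25
d 2 30 30
e 3 10 12
"""


def build_targets(table_text, pattern="{key}_{chr}_{start}_{end}.out"):
    """Return one output filename per table row (no cross product)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=" ")
    return [pattern.format(**row) for row in reader]


TARGETS = build_targets(TABLE)
print(TARGETS)

# Outline of how the list might be used in a Snakefile (not executed here):
#
# rule all:
#     input: TARGETS
#
# rule run_plink:
#     input: multiext("all_{chr}", ".bed", ".bim", ".fam")
#     output: "{key}_{chr}_{start}_{end}.out"
#     shell: "plink --bfile all_{wildcards.chr} --chr {wildcards.chr} "
#            "--from-bp {wildcards.start} --to-bp {wildcards.end} ..."
```

Because the list is built row by row, it has 5 entries for the example table rather than 240. The ".out" suffix follows the pattern given above; for the plink outputs in the question, the pattern would presumably need adapting to the .bed/.bim/.fam files.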
