Snakemake restricted wildcard combinations
7 weeks ago
Meryl

Hi, I'm new to Snakemake and haven't seen anything like this in the tutorials I've tried. I have a data frame with unique keys, chromosomes, start positions, and end positions, and I want to loop over every row to run an operation. How do I assign the wildcards so that only the combinations I want are run? For example, for the data frame:

key chr start end
a   1   50    51
b   1   50    51
c   2   23    25
d   2   30    30
e   3   10    12

I would want the following rule to run 5 times, once for each row in the data frame:

rule run_plink:
  input:
      multiext("all_{chr}", ".bed", ".bim", ".fam")
  output:
      multiext("my_{key}", ".bed", ".bim", ".fam")
  params:
      all="all_{chr}",
      out="my_{key}"
  conda: "my_env.yaml"
  shell: "plink --bfile {params.all} --chr {wildcards.chr} --from-bp {wildcards.start} --to-bp {wildcards.end} --make-bed --out {params.out}"

Any ideas? My concern with defining the wildcards beforehand is that Snakemake won't keep them grouped by row, i.e. instead of 5 runs it would run every combination: 5 × 3 × 4 × 4 = 240 runs for this example set.
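To see where the 240 comes from: Snakemake's expand() combines wildcard values with itertools.product by default, whereas expand(pattern, zip, ...) pairs the lists position by position. The plain-Python sketch below only mimics those two combinators for the example table (it does not call Snakemake) and counts distinct values per column, as in the estimate above.

```python
from itertools import product

# Example table from the question: one (key, chr, start, end) tuple per row.
rows = [
    ("a", "1", 50, 51),
    ("b", "1", 50, 51),
    ("c", "2", 23, 25),
    ("d", "2", 30, 30),
    ("e", "3", 10, 12),
]

# One list per column, in table order (the form you would hand to expand()).
keys, chrs, starts, ends = (list(col) for col in zip(*rows))

# Default expand() behaviour: Cartesian product, here over the distinct
# values of each column (5 keys, 3 chromosomes, 4 starts, 4 ends).
distinct = [sorted(set(col)) for col in (keys, chrs, starts, ends)]
all_combos = list(product(*distinct))

# expand(..., zip, ...) behaviour: pair the columns position by position,
# which gives back exactly one combination per row.
row_combos = list(zip(keys, chrs, starts, ends))

print(len(all_combos))  # 240
print(len(row_combos))  # 5
```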

7 weeks ago

The simplest way is to name your outputs after the tuples in your rows, e.g. output: "{key}_{chr}_{start}_{end}.out".
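Snakemake fills in wildcards by matching each requested filename against the output pattern, so once all four values are in the name, {wildcards.chr}, {wildcards.start} and {wildcards.end} are available to the shell command. The stdlib sketch below is only a rough analogue of that matching using re, not Snakemake's own code; the helper names are hypothetical, and it assumes keys and chromosome names contain no underscores. In a real Snakefile you would state that assumption with wildcard_constraints, since the default wildcard regex .+ can split an underscore-laden name ambiguously.

```python
import re

# Hypothetical regex equivalent of the output pattern
# "{key}_{chr}_{start}_{end}.out", assuming keys and chromosome names
# contain no underscores and positions are plain integers.
PATTERN = re.compile(
    r"^(?P<key>[^_]+)_(?P<chr>[^_]+)_(?P<start>\d+)_(?P<end>\d+)\.out$"
)


def target_name(key, chrom, start, end):
    """Encode one table row as an output filename."""
    return f"{key}_{chrom}_{start}_{end}.out"


def wildcards_from(filename):
    """Recover the row's values (as strings) from a filename, or None."""
    m = PATTERN.match(filename)
    return m.groupdict() if m else None


name = target_name("c", 2, 23, 25)
print(name)                  # c_2_23_25.out
print(wildcards_from(name))  # {'key': 'c', 'chr': '2', 'start': '23', 'end': '25'}
```

As with real Snakemake wildcards, the recovered values are strings.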

Then you just need to build the list of those filenames in vanilla Python and make it the input of a target rule.
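A minimal sketch of the list-building step, using only the standard library and assuming the table is stored as space-separated text with a header row. The table is inlined here so the example is self-contained; the file name mentioned in the comment is hypothetical.

```python
import csv
import io

# The question's table as space-separated text; in practice you would
# open the real file instead (hypothetical name: "regions.txt").
TABLE = """key chr start end
a 1 50 51
b 1 50 51
c 2 23 25
d 2 30 30
e 3 10 12
"""


def targets_from_table(handle):
    """Return one '{key}_{chr}_{start}_{end}.out' filename per table row."""
    reader = csv.DictReader(handle, delimiter=" ")
    return [f"{r['key']}_{r['chr']}_{r['start']}_{r['end']}.out" for r in reader]


TARGETS = targets_from_table(io.StringIO(TABLE))
print(TARGETS)
# ['a_1_50_51.out', 'b_1_50_51.out', 'c_2_23_25.out', 'd_2_30_30.out', 'e_3_10_12.out']
```

In the Snakefile, a first rule such as rule all with input: TARGETS then requests exactly these five files, so Snakemake schedules one job per row. If you already hold the columns as lists, expand("{key}_{chr}_{start}_{end}.out", zip, key=..., chr=..., start=..., end=...) should produce the same list.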
