Snakemake wildcard usage
6 days ago

Hi, I have a set of files that I’d like to run the same command on, while varying one or more of that command's parameters across several values.

For example, I might have two samples, each with their own fasta file: sample_A and sample_B.

I want to perform a blast search for each input fasta file, but I also want to loop through a range of word sizes for every blast process for each sample. Say, three values: 11, 13, 15.

This would mean that for each sample_*.fasta input, I’d generate three blast output files, one for each of those three word size values.

I am struggling to understand how to structure the snakemake rule's input and output names, because they don’t share the same wildcards: the output name contains an extra element, the blast parameter value, that isn’t part of the input name.

Thanks for any advice on how to include a parameter value from a snakemake rule in the output file name!

6 days ago

Maybe this?

samples = ['sample_a', 'sample_b', 'sample_c']
word_sizes = [11, 13, 15]

rule all:
    input:
        expand('blast/{sample}.{word_size}.out', sample=samples, word_size=word_sizes),

rule blast:
    input:
        fa='{sample}.fa',
    output:
        out='blast/{sample}.{word_size}.out',
    shell:
        r"""
        blastn -word_size {wildcards.word_size} -query {input.fa} -out {output.out} ...
        """


Note that the expand() function creates all combinations of sample and word_size and returns them as a list of strings. If you want more control over which combinations are produced, you can use any Python code to build such a list.
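As a rough illustration of building the target list by hand instead of with expand(), here is a minimal sketch in plain Python. The `excluded` set is a made-up restriction used only for the example; it is not part of the original question.

```python
from itertools import product

samples = ['sample_a', 'sample_b', 'sample_c']
word_sizes = [11, 13, 15]

# Hypothetical restriction, for illustration only:
# skip word size 15 for sample_c.
excluded = {('sample_c', 15)}

# Same naming pattern as the output of rule blast, but keeping
# only the (sample, word_size) pairs that are not excluded.
targets = [
    f'blast/{sample}.{word_size}.out'
    for sample, word_size in product(samples, word_sizes)
    if (sample, word_size) not in excluded
]

print(len(targets))
```

Since a Snakefile accepts ordinary Python, a list like `targets` could then be given as the input of rule all in place of the expand() call.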

Also, this assumes the fasta files are named with the sample prefix. If this is not the case, you can use a dictionary to map samples to fasta files. In that case you may need to use a function as the input to the blast rule.
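A minimal sketch of the dictionary approach, assuming made-up file paths (the `fasta_files` mapping and the name `get_fasta` are hypothetical). Snakemake calls an input function with the rule's wildcards object, so the function only has to look up the sample name. A SimpleNamespace stands in for that object here so the snippet runs outside Snakemake.

```python
from types import SimpleNamespace

# Hypothetical mapping from sample name to fasta path;
# these paths are placeholders, not taken from the question.
fasta_files = {
    'sample_a': 'data/run1/A_contigs.fasta',
    'sample_b': 'data/run2/B_contigs.fasta',
}

def get_fasta(wildcards):
    """Return the fasta path for the sample named in the wildcards."""
    return fasta_files[wildcards.sample]

# In the Snakefile, the rule would take the function itself as input:
#     rule blast:
#         input:
#             fa=get_fasta,
#         output:
#             out='blast/{sample}.{word_size}.out',

# Stand-in for the wildcards object that Snakemake passes in.
wc = SimpleNamespace(sample='sample_a', word_size='11')
print(get_fasta(wc))
```

The function is passed without parentheses (fa=get_fasta), so Snakemake can call it once per job after the wildcards have been resolved from the requested output name.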


Thank you very much @dariober - this is a huge help. I'm curious what role the fasta = ['sample_a.fa'... variable (on the second line) plays in your explanation, as I don't see where it is incorporated into the Snakemake workflow. Appreciate the clarification!


Indeed, the fasta variable is redundant in this example - I'm going to edit my answer to remove it.


Thanks very much. I think the key piece I'd been failing to understand was how to pass a global variable to both rule all and rule {something_else}. You've shown that I need to use the {wildcards...} object in rule {something_else}, which was something I noticed in this snakemake documentation but never put together. Cheers!