I'm including a snipped of my code.
The following block generates two groups, I'm setting the the number of samples to 1 for easier debugging. The samples in groupOne would be [FOO_A] and samples in groupTwo would be [BAR_A]
num_of_samples = 1
groupOne = ["FOO_" + chr(alpha+97).capitalize() for alpha in range(0,num_of_samples)]
groupTwo = ["BAR_" + chr(alpha+97).capitalize() for alpha in range(0,num_of_samples)]
Below is my first rule (rule all) and two rules generating my reads, and two rules generating bowtie2 alignment (the index is prebuilt and not included here)
rule all:
input:
expand("aligned/ref_{chrom}.{gTwoSam}.bam", chrom=chromosomeNames, gTwoSam = groupTwo),
expand("aligned/ref_{chrom}.{gOneSam}.bam", chrom=chromosomeNames, gOneSam = groupOne),
rule artillumina_simulate_ref_group1:
input:
"reference/{chrom}.fasta"
output:
"simulated/{chrom}.{gOneSam}.1.fq",
"simulated/{chrom}.{gOneSam}.2.fq"
params:
name = "simulated/{chrom}.{gOneSam}."
shell:
"""
art_illumina -ss HS20 -f 9 -l 100 -p -m 360 -s 80 -na -i {input} -o {params.name}
"""
rule artillumina_simulate_ref_group2:
input:
"reference/{chrom}.fasta"
output:
"simulated/ref_{chrom}.{gTwoSam}.1.fq",
"simulated/ref_{chrom}.{gTwoSam}.2.fq"
params:
name = "simulated/ref_{chrom}.{gTwoSam}."
shell:
"""
art_illumina -ss HS20 -f 9 -l 100 -p -m 360 -s 80 -na -i {input} -o {params.name}
"""
rule botwie2_align_ref_group1:
input:
"simulated/ref_{chrom}.{gOneSam}.1.fq",
"simulated/ref_{chrom}.{gOneSam}.2.fq",
"reference/{chrom}.rev.1.bt2", "reference/{chrom}.rev.2.bt2",
"reference/{chrom}.1.bt2", "reference/{chrom}.2.bt2",
"reference/{chrom}.3.bt2", "reference/{chrom}.4.bt2"
params:
btindex = "reference/{chrom}"
output:
"aligned/ref_{chrom}.{gOneSam}.bam"
shell:
"""
bowtie2 -x {params.btindex} -1 {input[0]} -2 {input[1]} | samtools view -bS > {output}
"""
rule botwie2_align_ref_group2:
input:
"simulated/ref_{chrom}.{gTwoSam}.1.fq",
"simulated/ref_{chrom}.{gTwoSam}.2.fq",
"reference/{chrom}.rev.1.bt2", "reference/{chrom}.rev.2.bt2",
"reference/{chrom}.1.bt2", "reference/{chrom}.2.bt2",
"reference/{chrom}.3.bt2", "reference/{chrom}.4.bt2"
params:
btindex = "reference/{chrom}"
output:
"aligned/ref_{chrom}.{gTwoSam}.bam"
shell:
"""
bowtie2 -x {params.btindex} -1 {input[0]} -2 {input[1]} | samtools view -bS > {output}
"""
Running the snakemake gives me the following error:
Rules botwie2_align_ref_group1 and botwie2_align_ref_group2 are ambiguous for the file aligned/ref_CM008309.1.BAR_A.bam.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
botwie2_align_ref_group1: chrom=CM008309.1,gOneSam=BAR_A
botwie2_align_ref_group2: chrom=CM008309.1,gTwoSam=BAR_A
So it would seem that my wildcard gTwoSam overwrites the gOneSam one.
However, modifying the rule all to collect output of artillumina_simulate_ref_group like this:
rule all:
input:
expand("simulated/ref_{chrom}.{gOneSam}.1.fq",chrom=chromosomeNames,gOneSam = groupOne),
expand("simulated/ref_{chrom}.{gOneSam}.2.fq",chrom=chromosomeNames,gOneSam = groupOne),
expand("simulated/ref_{chrom}.{gTwoSam}.1.fq",chrom=chromosomeNames,gTwoSam = groupTwo),
expand("simulated/ref_{chrom}.{gTwoSam}.2.fq",chrom=chromosomeNames,gTwoSam = groupTwo),
And running it dry mode everything works.
So why are my wildcards overwriting one another.
All I can say is, you're trying to confuse snakemake as well as humans. Your rules are almost identical; the only difference is, FASTQ-output rules don't have input files mapped to ambiguous output wildcards while the BAM-output rules do. This is why snakemake is able to resolve the former and not the latter.
Do yourself a favor and remove the duplicates. Whatever you're trying to do with this approach can be done using a much better much less ambiguous approach.
Sure enough, when I remove rule
bowtie2_align_ref_group2everything works well enough - rule also generates samples specified ingroupTwo. I've extended the concept to theartillumina_simulate__ref_group. But my question is why does this happen. I was under the impression that specifying wildcards upstream in therule allconstrains samples - e.g.gOneSamcan only haveFOOvalues, andgTwoSamcan only haveBARvalues.The outputs of your simulation rules are much the same regarding snakemake. It can deduce from your rule all that first wildcards is indeed a chromosome but the second one can either be
{gOneSam}or{gTwoSam}. Indeed, a wildcard is not a global value, it is purely relative to your rule. Therefore it is not because 2 wildcards have the same name in two rules that they will share the same value. See them as unknown values that will be deduced from the outputs of a downstream rule (here your rule all).If you really do want to run it this way, I would say replacing these two wildcards by
BAR_{N}andFOO_{N}withN=[chr(alpha+97).capitalize() for alpha in range(0,num_of_samples)]should constrain the value of your wildcards.Just as a general remark, since your rules simulate and align are doing the exact same thing for your 2 groups, it would be easier to have only one rule by step, with a wildcard
{group}withgroup=groupOne+groupTwo.