Snakemake wildcards overwriting one another
0
0
Entering edit mode
21 months ago
Ivan ▴ 60

I'm including a snipped of my code.

The following block generates two groups, I'm setting the the number of samples to 1 for easier debugging. The samples in groupOne would be [FOO_A] and samples in groupTwo would be [BAR_A]

num_of_samples = 1
groupOne = ["FOO_" + chr(alpha+97).capitalize() for alpha in range(0,num_of_samples)]
groupTwo = ["BAR_" + chr(alpha+97).capitalize() for alpha in range(0,num_of_samples)]

Below is my first rule (rule all) and two rules generating my reads, and two rules generating bowtie2 alignment (the index is prebuilt and not included here)

rule all:
    input:
        expand("aligned/ref_{chrom}.{gTwoSam}.bam", chrom=chromosomeNames, gTwoSam = groupTwo),
        expand("aligned/ref_{chrom}.{gOneSam}.bam", chrom=chromosomeNames, gOneSam = groupOne),

rule artillumina_simulate_ref_group1:
    input:
        "reference/{chrom}.fasta"
    output:
        "simulated/{chrom}.{gOneSam}.1.fq",
        "simulated/{chrom}.{gOneSam}.2.fq"
    params:
        name = "simulated/{chrom}.{gOneSam}."
    shell:
        """
        art_illumina -ss HS20 -f 9 -l 100 -p -m 360 -s 80 -na -i {input} -o {params.name}
        """

rule artillumina_simulate_ref_group2:
    input:
        "reference/{chrom}.fasta"
    output:
        "simulated/ref_{chrom}.{gTwoSam}.1.fq",
        "simulated/ref_{chrom}.{gTwoSam}.2.fq"
    params:
        name = "simulated/ref_{chrom}.{gTwoSam}."
    shell:
        """
        art_illumina -ss HS20 -f 9 -l 100 -p -m 360 -s 80 -na -i {input} -o {params.name}
        """

rule botwie2_align_ref_group1:
    input:
        "simulated/ref_{chrom}.{gOneSam}.1.fq",
        "simulated/ref_{chrom}.{gOneSam}.2.fq",
        "reference/{chrom}.rev.1.bt2", "reference/{chrom}.rev.2.bt2",
        "reference/{chrom}.1.bt2", "reference/{chrom}.2.bt2",
        "reference/{chrom}.3.bt2", "reference/{chrom}.4.bt2"
    params:
        btindex = "reference/{chrom}"
    output:
        "aligned/ref_{chrom}.{gOneSam}.bam"
    shell:
        """
        bowtie2 -x {params.btindex} -1 {input[0]} -2 {input[1]} | samtools view -bS > {output}
        """

rule botwie2_align_ref_group2:
    input:
        "simulated/ref_{chrom}.{gTwoSam}.1.fq",
        "simulated/ref_{chrom}.{gTwoSam}.2.fq",
        "reference/{chrom}.rev.1.bt2", "reference/{chrom}.rev.2.bt2",
        "reference/{chrom}.1.bt2", "reference/{chrom}.2.bt2",
        "reference/{chrom}.3.bt2", "reference/{chrom}.4.bt2"
    params:
        btindex = "reference/{chrom}"
    output:
        "aligned/ref_{chrom}.{gTwoSam}.bam"
    shell:
        """
        bowtie2 -x {params.btindex} -1 {input[0]} -2 {input[1]} | samtools view -bS > {output}
        """

Running the snakemake gives me the following error:

Rules botwie2_align_ref_group1 and botwie2_align_ref_group2 are ambiguous for the file aligned/ref_CM008309.1.BAR_A.bam.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
        botwie2_align_ref_group1: chrom=CM008309.1,gOneSam=BAR_A
        botwie2_align_ref_group2: chrom=CM008309.1,gTwoSam=BAR_A

So it would seem that my wildcard gTwoSam overwrites the gOneSam one.

However, modifying the rule all to collect output of artillumina_simulate_ref_group like this:

rule all:
input:
    expand("simulated/ref_{chrom}.{gOneSam}.1.fq",chrom=chromosomeNames,gOneSam = groupOne),
    expand("simulated/ref_{chrom}.{gOneSam}.2.fq",chrom=chromosomeNames,gOneSam = groupOne),
    expand("simulated/ref_{chrom}.{gTwoSam}.1.fq",chrom=chromosomeNames,gTwoSam = groupTwo),
    expand("simulated/ref_{chrom}.{gTwoSam}.2.fq",chrom=chromosomeNames,gTwoSam = groupTwo),

And running it dry mode everything works.

So why are my wildcards overwriting one another.

snakemake • 838 views
ADD COMMENT
0
Entering edit mode

All I can say is, you're trying to confuse snakemake as well as humans. Your rules are almost identical; the only difference is, FASTQ-output rules don't have input files mapped to ambiguous output wildcards while the BAM-output rules do. This is why snakemake is able to resolve the former and not the latter.

Do yourself a favor and remove the duplicates. Whatever you're trying to do with this approach can be done using a much better much less ambiguous approach.

ADD REPLY
0
Entering edit mode

Sure enough, when I remove rule bowtie2_align_ref_group2 everything works well enough - rule also generates samples specified in groupTwo. I've extended the concept to the artillumina_simulate__ref_group. But my question is why does this happen. I was under the impression that specifying wildcards upstream in the rule all constrains samples - e.g. gOneSam can only have FOO values, and gTwoSam can only have BAR values.

ADD REPLY
1
Entering edit mode

The outputs of your simulation rules are much the same regarding snakemake. It can deduce from your rule all that first wildcards is indeed a chromosome but the second one can either be {gOneSam} or {gTwoSam}. Indeed, a wildcard is not a global value, it is purely relative to your rule. Therefore it is not because 2 wildcards have the same name in two rules that they will share the same value. See them as unknown values that will be deduced from the outputs of a downstream rule (here your rule all).

If you really do want to run it this way, I would say replacing these two wildcards by BAR_{N} and FOO_{N} with N=[chr(alpha+97).capitalize() for alpha in range(0,num_of_samples)] should constrain the value of your wildcards.

Just as a general remark, since your rules simulate and align are doing the exact same thing for your 2 groups, it would be easier to have only one rule by step, with a wildcard {group} with group=groupOne+groupTwo.

ADD REPLY

Login before adding your answer.

Traffic: 2658 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6