Question

Snakemake executing branching jobs

9 months ago

yxwucq • 0

I want to apply different methods to the same raw input to generate downstream files, like:

test.fq -> test_method_1.bam -> test_method_1.vcf -> downstream
test.fq -> test_method_2.bam -> test_method_2.vcf -> downstream

I could define two separate sets of Snakemake rules. However, the downstream analysis steps are the same, so I would have to duplicate those commands.

Alternatively, I could copy the raw input to test_method_1.fq and test_method_2.fq, but that wastes disk space and is not elegant.

So is there another way to solve this problem?

Pipeline Python Snakemake • 441 views

updated 9 months ago by Jesse ▴ 740 • written 9 months ago by yxwucq • 0

score 2 · Accepted Answer · 2023-07-25

Think about it from the final output backwards; how can you indicate which method should be used to produce any particular downstream output? I'd suggest putting the method itself in the output filename, and then structuring your rules to work with that.

For example:

rule downstream:
    output: "downstream.{sample}_{method}.txt"
    input: "{sample}_{method}.vcf"

rule vcf:
    output: "{sample}_{method}.vcf"
    input: "{sample}_{method}.bam"

rule method_2:
    output: "{sample}_method_2.bam"
    input: "{sample}.fq"

rule method_1:
    output: "{sample}_method_1.bam"
    input: "{sample}.fq"

With that setup you could, for example, ask Snakemake for the output downstream.samp1_method_2.txt, and it would go looking for samp1.fq in order to make samp1_method_2.bam, and so on.

If the methods are substantially different, having separate rules makes sense. But if the difference just comes down to passing different arguments, you could instead have a single rule and make it behave differently depending on that wildcard.
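
As a rough sketch of the single-rule idea: the part that changes per method can live in a small lookup function, which Snakemake accepts as a params value and calls with the job's wildcards. Everything named here is an assumption for illustration: METHOD_ARGS, method_args, the "--preset" flags, and some_aligner are all hypothetical. One thing to watch is that wildcards match greedily by default, so {sample}_{method} could split samp1_method_2 into samp1_method and 2; constraining {method} to the known method names avoids that.

```python
import re
from types import SimpleNamespace

# Hypothetical per-method arguments; the real flags depend on your tools.
METHOD_ARGS = {
    "method_1": "--preset fast",
    "method_2": "--preset sensitive",
}

# Regex of allowed method names. In a Snakefile this would be used as
#   wildcard_constraints:
#       method="method_1|method_2"
METHOD_PATTERN = "|".join(METHOD_ARGS)


def method_args(wildcards):
    """Params function: return the arguments for this job's method."""
    return METHOD_ARGS[wildcards.method]


# The single rule could then look roughly like this (kept as a comment so
# this block stays plain Python; some_aligner is a placeholder command):
#
# rule bam:
#     output: "{sample}_{method}.bam"
#     input: "{sample}.fq"
#     params: extra=method_args
#     shell: "some_aligner {params.extra} {input} > {output}"

# Plain-regex illustration of how the constrained pattern splits a file name.
match = re.fullmatch(
    rf"(?P<sample>.+)_(?P<method>{METHOD_PATTERN})\.bam",
    "samp1_method_2.bam",
)
sample, method = match.group("sample"), match.group("method")

# SimpleNamespace stands in for Snakemake's wildcards object here.
chosen_args = method_args(SimpleNamespace(method=method))
```

Adding a third method would then only need a new entry in the lookup table, with the vcf and downstream rules left untouched.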