Snakemake: executing branching jobs
9 months ago
yxwucq • 0

I want to apply different methods to generate downstream files from the same raw input, for example:

  1. test.fq -> test_method_1.bam -> test_method_1.vcf -> downstream

  2. test.fq -> test_method_2.bam -> test_method_2.vcf -> downstream

I could define two separate sets of Snakemake rules. However, the downstream analysis steps are the same for both branches, so I would have to duplicate those commands.

Alternatively, I could copy the raw input to test_method_1.fq and test_method_2.fq, but that wastes disk space and is not elegant.

So is there another way to solve this problem?

9 months ago
Jesse ▴ 740

Think about it from the final output backwards: how can you indicate which method should be used to produce a particular downstream output? I'd suggest putting the method name in the output filename and structuring your rules around that.
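Once the method is part of the filename, the set of final outputs you want is simply every sample/method combination. In a Snakefile this list is usually built with Snakemake's expand() helper inside a top-level target rule (often named all). The plain-Python sketch below only mimics that cross-product behaviour so the naming scheme is easy to see; the sample and method names are hypothetical.

```python
from itertools import product

# Hypothetical sample and method names; substitute your own.
SAMPLES = ["samp1", "samp2"]
METHODS = ["method_1", "method_2"]


def expand_targets(pattern, **values):
    """Fill `pattern` with every combination of the given wildcard values,
    similar in spirit to Snakemake's expand()."""
    keys = list(values)
    combos = product(*(values[k] for k in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


targets = expand_targets("downstream.{sample}_{method}.txt",
                         sample=SAMPLES, method=METHODS)
for target in targets:
    print(target)
```

Each printed name carries the method, so the workflow can later tell from the filename alone which branch produced it.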

For example:

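# Constrain the method wildcard so "samp1_method_2" splits into
# sample="samp1" and method="method_2" (the default wildcard regex is greedy).
wildcard_constraints:
    method="method_[0-9]+"

rule downstream:
    output: "downstream.{sample}_{method}.txt"
    input: "{sample}_{method}.vcf"
    # shared downstream command goes here

rule vcf:
    output: "{sample}_{method}.vcf"
    input: "{sample}_{method}.bam"
    # shared variant-calling command goes here

rule method_2:
    output: "{sample}_method_2.bam"
    input: "{sample}.fq"
    # method 2 alignment command goes here

rule method_1:
    output: "{sample}_method_1.bam"
    input: "{sample}.fq"
    # method 1 alignment command goes here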
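The backward lookup these rules rely on can be sketched in plain Python. This is a toy model, assuming each rule has one output and one input pattern and that the first matching rule wins (real Snakemake instead reports ambiguous matches); it is not Snakemake's actual scheduler. The CONSTRAINTS dictionary plays the role of a wildcard constraint on method.

```python
import re

# Toy model of the four rules above: (rule name, output pattern, input pattern).
RULES = [
    ("downstream", "downstream.{sample}_{method}.txt", "{sample}_{method}.vcf"),
    ("vcf", "{sample}_{method}.vcf", "{sample}_{method}.bam"),
    ("method_2", "{sample}_method_2.bam", "{sample}.fq"),
    ("method_1", "{sample}_method_1.bam", "{sample}.fq"),
]

# Per-wildcard regexes; unconstrained wildcards fall back to greedy ".+".
CONSTRAINTS = {"method": r"method_[0-9]+"}


def to_regex(pattern):
    """Turn a pattern like '{sample}_{method}.vcf' into a compiled regex."""
    parts = re.split(r"\{(\w+)\}", pattern)  # literal, name, literal, name, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<{part}>{CONSTRAINTS.get(part, '.+')})"
    return re.compile(regex)


def resolve(target):
    """Walk backwards from `target`; return (rule, file) steps in run order."""
    for name, out_pat, in_pat in RULES:
        match = to_regex(out_pat).fullmatch(target)
        if match:
            needed = in_pat.format(**match.groupdict())
            return resolve(needed) + [(name, target)]
    # No rule produces this file, so it must already exist (a raw input).
    return [("<raw input>", target)]


for rule, path in resolve("downstream.samp1_method_2.txt"):
    print(f"{rule:12} -> {path}")
```

The constraint matters: without the method entry, the greedy sample wildcard captures samp1_method and method captures only 2. The filenames still line up, but any command that reads the method value would see 2. In a real Snakefile the equivalent safeguard is a wildcard_constraints directive.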

With that setup you could, for example, ask Snakemake for downstream.samp1_method_2.txt, and it would work backwards: that file needs samp1_method_2.vcf, which needs samp1_method_2.bam, which the method_2 rule makes from samp1.fq.

If the methods are substantially different, separate rules make sense. If they differ only in the arguments passed to the same tool, you could instead use a single rule and make it behave differently depending on the method wildcard.
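For the single-rule variant, the idea is to look up per-method settings from the method wildcard. Snakemake lets params be a function of the wildcards (for example params: extra=lambda wildcards: METHOD_ARGS[wildcards.method]). The sketch below shows that lookup in plain Python; the tool name aligner and its --mode flags are hypothetical placeholders, not a real program.

```python
from types import SimpleNamespace

# Hypothetical per-method arguments; "aligner" and "--mode" are placeholders.
METHOD_ARGS = {
    "method_1": "--mode fast",
    "method_2": "--mode sensitive",
}


def extra_args(wildcards):
    """Return method-specific arguments, as a Snakemake params function would."""
    return METHOD_ARGS[wildcards.method]


def align_command(sample, method):
    """Build the command one shared alignment rule would run for this sample/method."""
    wildcards = SimpleNamespace(sample=sample, method=method)
    return (f"aligner {extra_args(wildcards)} "
            f"{wildcards.sample}.fq > {wildcards.sample}_{wildcards.method}.bam")


print(align_command("samp1", "method_1"))
print(align_command("samp1", "method_2"))
```

In the Snakefile, this would replace the method_1 and method_2 rules with one rule whose output is "{sample}_{method}.bam"; adding a method then only means adding an entry to the lookup table.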
