How to submit 60 bed files for one rule in Snakemake as an input file
2
1
Entering edit mode
3.0 years ago
kamanovae ▴ 100

I am using snakelike to describe the GATK pipeline. Right now my code looks like this:


SAMPLES = glob_wildcards("input_bam/{sample}.bam")


rule all:
    input: expand("out/{sample}.AddOrReplaceReadGroups.bam", sample = SAMPLES.sample), expand("out/{sample}.MarkDuplicates.txt", sample = SAMPLES.sample), expand("out/{sample}.MarkDuplicates.bam", sample = SAMPLES.sample), expand("out/{sample}.BaseRecalibrator.txt", sample = SAMPLES.sample)

rule gatk_AddOrReplaceReadGroups:
    input: "input_bam/{sample}.bam"
    output: "out/{sample}.AddOrReplaceReadGroups.bam"
    shell: "./gatk-4.1.9.0/gatk --java-options ""-Xmx30g""  AddOrReplaceReadGroups -I {input}  -O {output} -ID group1 -SM NORMAL -PL illumina -LB lib1 -PU unit1"

rule gatk_MarkDuplicates:
    input: rules.gatk_AddOrReplaceReadGroups.output
    output: output1="out/{sample}.MarkDuplicates.bam", output2="out/{sample}.MarkDuplicates.txt"
    shell: "./gatk-4.1.9.0/gatk --java-options  ""-Xmx4g""  MarkDuplicates  -I {input}  -O {output.output1} -M {output.output2} --CREATE_INDEX true"

rule gatk_BaseRecalibrator:
    input:  input1="intervals/0000-scattered.bed", input2=rules.gatk_MarkDuplicates.output.output1, input3='ref/ref.fa', input4='dbsnp/dbsnp_150.hg38.vcf.gz'
    output: output1="out/{sample}.BaseRecalibrator.txt"
    shell: "./gatk --java-options ""-Xmx7680m""  BaseRecalibrator -L {input1} -I {input2} -O {output.output1} -R {input3} --known-sites {input4}"

instead of one file in the gatk_BaseRecalibrator.output1(intervals/0000-scattered.bed) at the entrance, I want to use all the files that I mess up in the intervals folder. How can I do this correctly?

gatk snakemake • 921 views
ADD COMMENT
1
Entering edit mode
3.0 years ago

Maybe what you want to collect all the *scattered.bed files in directory intervals and pass them to gatk -L. This could do it:

import glob

...

rule gatk_BaseRecalibrator:
    input:  
        input1= glob.glob("intervals/*-scattered.bed"),
        input2= rules.gatk_MarkDuplicates.output.output1, 
        input3= 'ref/ref.fa', input4='dbsnp/dbsnp_150.hg38.vcf.gz'
    output: 
        output1= "out/{sample}.BaseRecalibrator.txt"
    shell: 
        "bgatk --java-options ""-Xmx7680m""  BaseRecalibrator -L {input1} -I {input2} -O {input3} -R {input3} --known-sites {input4}"

It may be that gatk needs a -L for each file, like -L file1.bed -L file2.bed .... If so, try:


intervals = glob.glob("intervals/*-scattered.bed")

rule gatk_BaseRecalibrator:
    input:  
        input1= intervals,
        input2= rules.gatk_MarkDuplicates.output.output1, 
        input3= 'ref/ref.fa', input4='dbsnp/dbsnp_150.hg38.vcf.gz'
    output: 
        output1= "out/{sample}.BaseRecalibrator.txt",
    params:
        intervals= ' -L '.join(intervals),
    shell: 
        "bgatk --java-options ""-Xmx7680m""  BaseRecalibrator -L {params.intervals} -I {input2} -O {input3} -R {input3} --known-sites {input4}"
ADD COMMENT

Login before adding your answer.

Traffic: 2445 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6