Snakemake error
2
1
Entering edit mode
2.2 years ago
kamanovae ▴ 100

Hi!

I have a big pipeline based on Snakemake. Now I am getting the following error

NameError in line 120 of /storage1/GatkBwaTest/SnakemakeDir/snakefile3:
name 'data1' is not defined
  File "/storage1/GatkBwaTest/SnakemakeDir/snakefile3", line 120, in <module>

How can I make the date in the param understand that this is the date from the input

rule gatk_GatherBQSRReports:
        input:
                data1=expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt", interval=INTERVALS.interval, allow_missing=True)
        params:
                data1= '-I ' + ' -I '.join(data1), gatk_memory=config["gatk_memory"]
        log:
                gatherBQSRReports=config["out_logs"]+"/{sample}/{sample}.gatherBQSRReports.log"
        output:
                output1=config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt"
        shell:
                "{gatk} --java-options {params.gatk_memory}  GatherBQSRReports {params.data1} -O {output.output1} 2> {log.gatherBQSRReports}"
Snakemake gatk • 896 views
ADD COMMENT
2
Entering edit mode
2.2 years ago

I can think of a couple of workarounds, not tested but hopefully you get the idea.

One option is to build the data1 list outside the rule and use it in both the input and params directive:

mydata = expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt", interval=INTERVALS.interval, allow_missing=True)

rule gatk_GatherBQSRReports:
    input:
        data1= mydata
    params:
        data1= '-I ' + ' -I '.join(mydata), 
        gatk_memory=config["gatk_memory"],
    log:
        gatherBQSRReports=config["out_logs"]+"/{sample}/{sample}.gatherBQSRReports.log"
    output:
        output1=config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt"
    shell:
        "{gatk} --java-options {params.gatk_memory}  GatherBQSRReports {params.data1} -O {output.output1} 2> {log.gatherBQSRReports}"

The second option builds the params.data1 string in the run directive, so using usual python code, and pass it to the shell function:

rule gatk_GatherBQSRReports:
    input:
        data1= expand(...),
    params:
        gatk_memory=config["gatk_memory"],
    log:
        gatherBQSRReports=config["out_logs"]+"/{sample}/{sample}.gatherBQSRReports.log"
    output:
        output1=config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt"
    run:
        params_data = '-I ' + ' -I '.join(input.data1)

        cmd = "{gatk} --java-options {params.gatk_memory}  GatherBQSRReports %s -O {output.output1} 2> {log.gatherBQSRReports}" % params_data
        shell(cmd)

I think I prefer the second option since it keeps the rule more self-contained.

ADD COMMENT
0
Entering edit mode

Thank you!!! It really works! But now I have another error. I suspect I'm doing something wrong with allow_missing=True

Building DAG of jobs...
WildcardError in line 14 of /storage1/GatkBwaTest/SnakemakeDir/snakefile3:
Wildcards in input files cannot be determined from output files:
'sample'

I show the last two rules and the rule all

import os

(SAMPLES,) = glob_wildcards("../exom/NIST7035_TAAGGCGA/{sample}_L001_R1_001.fastq.gz")
INTERVALS = glob_wildcards("../SnakemakeInput/intervals/{interval}.bed")

SAMPLE_PATH=os.path.dirname(config["sample"])

gatk="/storage1/GatkBwaTest/bin/gatk-4.1.3.0/gatk"
bwa_mem="/storage1/GatkBwaTest/bin/bwa-mem2-2.2.1_x64-linux/bwa-mem2"
repair="/storage1/GatkBwaTest/bin/repair.sh"

rule all:
        input:
                expand(config["out"] + "/{sample}/{sample}_R1.fastq", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}_R2.fastq", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}_fastrepair_R1.fastq", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}_fastrepair_R2.fastq",  sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}_fastrepair_outs.fastq", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}.sam", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}.bam", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}.AddOrReplaceReadGroups.bam", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}.MarkDuplicates.bam", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}.MarkDuplicates.txt", sample=SAMPLES),
                expand(config["out"] + "/{sample}/{sample}.bedtools_genomecov.genome.covered.bed", sample=SAMPLES),
                expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt", interval=INTERVALS.interval, allow_missing=True),
                expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt", allow_missing=True)



rule gatk_BaseRecalibrator:
        input:
                data1=config["intervals"]+"{interval}.bed",
                data2=config["out"]+"/{sample}/{sample}.MarkDuplicates.bam", data3=config["ref"], data4=config["dbsnp"]
        output:
                output1=config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt"
        params:
                gatk_memory=config["gatk_memory"]
        shell:
                "{gatk} --java-options {params.gatk_memory}  BaseRecalibrator -L {input.data1} -I {input.data2} -O {output.output1} -R {input.data3} "
                " --known-sites {input.data4} "

rule gatk_GatherBQSRReports:
        input:
                data1=expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt", interval=INTERVALS.interval, allow_missing=True)
        params:
                gatk_memory=config["gatk_memory"]
        log:
                gatherBQSRReports=config["out_logs"]+"/{sample}/{sample}.gatherBQSRReports.log"
        output:
                output1=config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt"

        run:
                params_data = '-I ' + ' -I '.join(input.data1)

                cmd = "{gatk} --java-options {params.gatk_memory}  GatherBQSRReports %s -O {output.output1} 2> {log.gatherBQSRReports}" % params_data
                shell(cmd)
ADD REPLY
0
Entering edit mode

I think the problem is with rule all where you have:

expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.BaseRecalibrator.{interval}.txt", interval=INTERVALS.interval, allow_missing=True),
expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt", allow_missing=True)

Snakemake doesn't know what {sample} should be replaced with. I suspect that the first one can be removed altogether from rule all since these are intermediate files. The second should probably be:

expand(config["out_BaseRecalibrator"]+"/{sample}/{sample}.GatherBQSRReports.txt",  sample=SAMPLES)
ADD REPLY
0
Entering edit mode
2.2 years ago
raphael.B ▴ 520

This looks more like a python error than a snakemake error. I think you did not defined the data1 list from your join(data1).

ADD COMMENT

Login before adding your answer.

Traffic: 2517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6