How to reference input variable in output section of rule of Snakemake rule
11 weeks ago

I have a function get_ref_name and a rule called bioawk in my Snakemake script.

import os

def get_ref_name(ref_genome_path):
    file_name = os.path.basename(ref_genome_path)
    ref_name = file_name.split('.')[0]
    return ref_name

rule all:
    input: 
        bioawk_ref_size_file = expand("results/{prefix}/bioawk/{ref_name}.size", prefix=PREFIX, ref_name=get_ref_name(config["reference_genome"]))            

rule bioawk:
    input:
        ref_name=get_ref_name(config["reference_genome"]),
        ref_genome=config["reference_genome"]
    output:
        reference_size_file=f"results/{{prefix}}/bioawk/{{ref_name}}.size"
    conda:
        "envs/bioawk.yaml"
    shell:
        "bioawk -c fastx '{{ print $name, length($seq) }}' < {input.ref_genome} > {output.reference_size_file}"

I am trying to get the name of the reference genome from a path specified in my config file. For example, if the path points to a KPNIH1.fasta file, the function should return KPNIH1. I call get_ref_name in the input section as ref_name and then refer to it again in the output section as {{ref_name}}. However, I am getting this error:

MissingInputException in line 320 of /.../snpkit-snakemake-test/snpkit_v3.smk:
Missing input files for rule bioawk:
KPNIH1

I assume there is an issue with the way I am calling the function in the input section, but I'm not sure how to fix it. Thanks in advance for your help!
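The error is consistent with how Snakemake treats the input directive: every value listed there is taken as a path to a file that must already exist or be producible by some rule. The sketch below is a toy model of that check, not Snakemake's actual code; the function find_missing_inputs and the temporary FASTA file are hypothetical. It shows why the plain string KPNIH1 returned by get_ref_name is reported as a missing file while the FASTA path is not.

```python
import os
import tempfile


def find_missing_inputs(inputs, producible=frozenset()):
    """Toy model: return input values that are neither existing files
    nor outputs that another rule could create."""
    return [
        value
        for value in inputs.values()
        if not os.path.exists(value) and value not in producible
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for config["reference_genome"].
    ref_genome = os.path.join(tmp, "KPNIH1.fasta")
    with open(ref_genome, "w") as handle:
        handle.write(">chr1\nACGT\n")

    inputs = {
        "ref_name": "KPNIH1",      # a plain string, not a file on disk
        "ref_genome": ref_genome,  # an existing file
    }
    missing = find_missing_inputs(inputs)
    print(missing)  # ['KPNIH1']
```

In this model, taking ref_name out of input (into params, or dropping it because the {ref_name} output wildcard already receives its value from the expand call in rule all) removes the string from the file check.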


If both of your "inputs" come from the config file, are they really inputs? You could put them in params instead, because Snakemake expects inputs to be files.

11 weeks ago

A lookup function helps Snakemake work out which input it needs to generate an output from a wildcard value, when that mapping is complex. In your case it is not complex: the only difference is that the output has a .fasta extension and the input doesn't.
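Roughly, Snakemake works backwards from a requested file: it matches the path against a rule's output pattern to obtain wildcard values, then fills those values into the input patterns. The sketch below imitates only the matching step with Python's re module. Treating every wildcard as .+ is a simplifying assumption (I believe it mirrors Snakemake's default constraint), and match_output and the sampleA prefix are hypothetical.

```python
import re


def match_output(pattern, path):
    """Toy wildcard matcher: turn '{name}' placeholders into named regex
    groups and return the captured values, or None if the path doesn't match."""
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text between wildcards
        regex += f"(?P<{m.group(1)}>.+)"            # the wildcard itself
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


wildcards = match_output(
    "results/{prefix}/bioawk/{ref_name}.size",
    "results/sampleA/bioawk/KPNIH1.size",
)
print(wildcards)  # {'prefix': 'sampleA', 'ref_name': 'KPNIH1'}
```

Because the wildcard values come straight from the requested file name, a rule only needs an input function when the input path cannot be written as a pattern over those same values.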

Also, you are confusing Python f-strings with Snakemake wildcards.
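The difference is when the braces are processed. An f-string is evaluated by Python as soon as that line of the Snakefile runs, and {{ and }} there simply produce literal braces, so the f-string in the question's output section ends up as the plain wildcard pattern and adds nothing. Snakemake substitutes wildcards and {input}/{output} later, using str.format-like rules (as I understand it), which is why the awk block in the shell command has doubled braces. A minimal Python illustration; the file paths and the sampleA prefix are made up, and SimpleNamespace only stands in for Snakemake's input/output objects:

```python
from types import SimpleNamespace

# An f-string is evaluated immediately; doubled braces become single
# literal braces, so this is just the wildcard pattern as a plain string.
output_pattern = f"results/{{prefix}}/bioawk/{{ref_name}}.size"
print(output_pattern)  # results/{prefix}/bioawk/{ref_name}.size

# A single-brace field in an f-string substitutes a Python variable
# right away, leaving no wildcard behind.
ref_name = "KPNIH1"
eager = f"results/{{prefix}}/bioawk/{ref_name}.size"
print(eager)  # results/{prefix}/bioawk/KPNIH1.size

# str.format-style substitution: {input.x} fields are filled in and
# {{ }} collapse to the literal braces that awk needs.
shell_template = (
    "bioawk -c fastx '{{ print $name, length($seq) }}' "
    "< {input.ref_genome} > {output.reference_size_file}"
)
command = shell_template.format(
    input=SimpleNamespace(ref_genome="ref/KPNIH1.fasta"),
    output=SimpleNamespace(
        reference_size_file="results/sampleA/bioawk/KPNIH1.size"
    ),
)
print(command)
```

So in the output section a plain string such as "results/{prefix}/bioawk/{ref_name}.size" behaves the same as the f-string with doubled braces.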

input:
   os.path.join(config['reference_genome'],"{ref_name_stem}")
output:
   "results/{prefix}/bioawk/{ref_name_stem}.fasta"

The output is a .size file, and I would like the reference genome name (ref_name) to be part of the output file name (ref_name.size). config['reference_genome'] holds a file path as a string (/.../variant_calling_bin/reference/KPNIH1/KPNIH1.fasta), so I would like to extract just KPNIH1 from KPNIH1.fasta and have the output be:

output:
   "results/{prefix}/bioawk/{ref_name}.size" # KPNIH1.size
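Because the name depends only on the config path, it can be computed once in Python and passed to expand in rule all, as the question already does; the bioawk rule then receives it through the {ref_name} output wildcard. The sketch below checks the extraction and builds the target list in plain Python. expand_like is a hypothetical, rough stand-in for Snakemake's expand built on itertools.product, and the path and PREFIX values are invented for illustration.

```python
import itertools
import os


def get_ref_name(ref_genome_path):
    """Same logic as the question: file name up to the first dot."""
    file_name = os.path.basename(ref_genome_path)
    return file_name.split(".")[0]


def expand_like(pattern, **values):
    """Rough stand-in for expand(): format the pattern with every
    combination of the given values (lists or single strings)."""
    keys = list(values)
    lists = [v if isinstance(v, list) else [v] for v in values.values()]
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in itertools.product(*lists)
    ]


# Hypothetical config value and prefixes, for illustration only.
reference_genome = "reference/KPNIH1/KPNIH1.fasta"
PREFIX = ["sampleA", "sampleB"]

targets = expand_like(
    "results/{prefix}/bioawk/{ref_name}.size",
    prefix=PREFIX,
    ref_name=get_ref_name(reference_genome),
)
print(targets)
# ['results/sampleA/bioawk/KPNIH1.size', 'results/sampleB/bioawk/KPNIH1.size']
```

Note that split('.')[0] keeps only the text before the first dot, so a file named GCF_000001405.40.fasta would become GCF_000001405; os.path.splitext strips only the last extension if that matters for your references.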

You don't need a lookup function if there is a simple relationship between your input and output filenames.
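For example, if the FASTA files follow the reference/{ref_name}/{ref_name}.fasta layout seen in the path above, the input pattern can reuse the output's {ref_name} wildcard and be filled in by plain substitution, with no function involved. The toy below imitates that substitution with str.format; the directory layout and the fill_input name are assumptions for illustration. With a single reference, input can also simply be config["reference_genome"], since a fixed path needs no wildcard at all.

```python
def fill_input(input_pattern, wildcards):
    """Toy version of wildcard substitution: values matched from the
    output path are formatted into the input pattern."""
    return input_pattern.format(**wildcards)


# Wildcards as they would be matched from results/sampleA/bioawk/KPNIH1.size
wildcards = {"prefix": "sampleA", "ref_name": "KPNIH1"}

ref_genome = fill_input("reference/{ref_name}/{ref_name}.fasta", wildcards)
print(ref_genome)  # reference/KPNIH1/KPNIH1.fasta
```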
