Question: MissingInputException in Snakemake workflow
jmat wrote (10 months ago):

Hello all,
I'm testing this simple Snakemake pipeline:

import pandas as pd
configfile: "config.yaml"

units = pd.read_csv(
    config["units"], dtype=str, sep="\t").set_index(["sample", "unit"], drop=False)

units.index = units.index.set_levels(
    [i.astype(str) for i in units.index.levels])

def get_fastqs(wildcards):
    """Get raw FASTQ files from unit sheet."""

    return units.loc[
        (wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()

rule all:
    input:
        expand('test/{unit.sample}-paired_reads', unit=units.itertuples())

rule getfastas:
    input:
        get_fastqs
    output:
        "test/{unit.sample}-paired_reads.txt"
    shell:
        "echo {input} > {output}"

It fails with the following error:

MissingInputException in line 16 of path/test.smk:
Missing input files for rule all:
test/SRR3396382-paired_reads
test/SRR3396381-paired_reads

The pandas DataFrame units looks like this:

[image: pandas-df (screenshot of the units DataFrame, not preserved)]

In rule all, unit=units.itertuples() yields Python named tuples, and {unit.sample} takes the string value of the sample column from each one. That is where the SRR339638x names in the missing files come from.
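
To see what the expand() call in rule all produces from these rows, here is a minimal sketch. It assumes expand() fills its pattern with Python's str.format semantics (consistent with {unit.sample} working here), and it uses a stdlib namedtuple as a stand-in for the rows pandas' itertuples() yields, so it runs without pandas. The unit values and FASTQ names are hypothetical.

```python
from collections import namedtuple

# Stand-in for the rows pandas' DataFrame.itertuples() yields (namedtuples).
# Sample names come from the error message; unit values and FASTQ names are
# hypothetical placeholders.
Row = namedtuple("Row", ["sample", "unit", "fq1", "fq2"])
rows = [
    Row("SRR3396382", "1", "x_1.fastq", "x_2.fastq"),
    Row("SRR3396381", "1", "y_1.fastq", "y_2.fastq"),
]

pattern = "test/{unit.sample}-paired_reads"

# Python format strings resolve {unit.sample} as getattr(unit, "sample"):
# the only placeholder *name* is "unit"; ".sample" is attribute access.
targets = [pattern.format(unit=row) for row in rows]
print(targets)
# ['test/SRR3396382-paired_reads', 'test/SRR3396381-paired_reads']
```

expand() returns plain strings, so this step does not create any wildcards; those have to come from the output pattern of the rule that produces the requested files.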

How can I use get_fastqs to produce the missing files? I think this comes from a misunderstanding of some, maybe basic, Snakemake functionality.

Tags: snakemake

In the function that you pass wildcards to, the variable is named unit.sample, not unit. So wildcards.sample and wildcards.unit do not exist.

(Comment by Medhat, 10 months ago)
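
Wildcards come from the output pattern of the rule that has to produce a requested file: the engine matches the file name against that pattern, and the wildcards object passed to an input function carries the matched values. The sketch below is a simplified model of that idea, not Snakemake's implementation (it ignores wildcard constraints and ambiguity handling). It assumes wildcard names are word characters only, which I believe matches Snakemake's rule, so {unit.sample} is treated as literal text. The helper infer_wildcards and the unit value "1" are hypothetical.

```python
import re

# Simplified model: each {name} in an output pattern becomes a named regex
# group, and the requested file name must match the whole pattern.
# Assumption: names are word characters only and each name appears once.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def infer_wildcards(output_pattern, target):
    """Return {wildcard: value} if target matches output_pattern, else None."""
    parts, last = [], 0
    for m in PLACEHOLDER.finditer(output_pattern):
        parts.append(re.escape(output_pattern[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        last = m.end()
    parts.append(re.escape(output_pattern[last:]))
    match = re.fullmatch("".join(parts), target)
    return match.groupdict() if match else None

# The question's output: "{unit.sample}" is not a \w+ name, so in this model it
# is literal text, and the rule-all target (no ".txt") cannot match anyway.
print(infer_wildcards("test/{unit.sample}-paired_reads.txt",
                      "test/SRR3396381-paired_reads"))           # None

# An output that declares both names the input function reads (hypothetical).
fixed = "test/{sample}-{unit}-paired_reads.txt"
print(infer_wildcards(fixed, "test/SRR3396381-1-paired_reads.txt"))
# {'sample': 'SRR3396381', 'unit': '1'}
print(infer_wildcards(fixed, "test/SRR3396381-1-paired_reads"))  # None
```

In this model, neither the question's output pattern nor a target missing the .txt suffix yields a match, while a pattern declaring {sample} and {unit} gives get_fastqs exactly the wildcards.sample and wildcards.unit it reads.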

Hi Medhat, thanks for your time.

I don't understand what you are referring to. Can you elaborate a bit more?
As I understand it, unit is a named tuple (valid Python): the variable is named unit and has an attribute named sample, which is why {unit.sample} in rule all produces the SRR339638x strings. To explain it better, it's a wildcard with two named attributes, "unit" and "sample".

I edited my original post to show a simpler example, but it is exactly the same scenario.

(Reply by jmat, 10 months ago)

My bad, it was not clear previously.

(Reply by Medhat, 10 months ago)
Answer by Medhat (Texas), 10 months ago:

Please change 'test/{unit.sample}-paired_reads', unit=units.itertuples().

The variable for expand is named unit.sample, but you are calling it unit. It should be: 'test/{unit.sample}-paired_reads', unit.sample=units.itertuples().

Also, the wildcard has a variable called unit.sample, not just sample, so to use it in the function it is now called wildcard.unit.sample.sample.


When I do this:

expand('test/{unit.sample}-paired_reads', unit.sample=units.itertuples())

I got this error:

SyntaxError in line 18 of path/test.smk:
keyword can't be an expression
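
The SyntaxError above is raised by Python's parser before Snakemake runs anything: a keyword argument name must be a plain identifier, so unit.sample=... cannot be written. This small check compiles both calls without executing them, so names such as expand and units are never looked up. The wording of the message depends on the Python version; older Python 3 releases print "keyword can't be an expression".

```python
# Compile (but do not run) the two expand() calls; compiling only parses the
# source, so undefined names like expand and units are never looked up.
bad = "expand('test/{unit.sample}-paired_reads', unit.sample=units.itertuples())"
good = "expand('test/{unit.sample}-paired_reads', unit=units.itertuples())"

try:
    compile(bad, "<snakefile>", "eval")
    bad_rejected = False
except SyntaxError as err:
    bad_rejected = True
    print("rejected:", err.msg)  # exact wording varies by Python version

compile(good, "<snakefile>", "eval")  # parses fine: 'unit' is a plain identifier
print("accepted:", good)
```

The attribute access belongs inside the pattern string, not in the keyword name.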

The expansion in rule all is fine and works as expected: as the Snakemake error shows, it produces the expected file names (strings). itertuples() returns named tuples with an attribute called sample, which is why {unit.sample} works; the .sample part of unit.sample is that attribute. My problem is passing the proper wildcards to my input function in rule getfastas.

(Reply by jmat, 10 months ago)
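
One way the requested names and the input function could line up is sketched below in pure Python. It assumes the output of getfastas is renamed to declare both {sample} and {unit} (a hypothetical naming) and that rule all requests exactly the same names, .txt suffix included. A plain dict replaces the pandas unit sheet, and types.SimpleNamespace stands in for the wildcards object; the unit values and FASTQ names are made up.

```python
from types import SimpleNamespace

# Hypothetical unit sheet: (sample, unit) -> FASTQ columns. Sample names are from
# the question; unit values and file names are made up. None stands in for an
# empty fq2 cell (what .dropna() removes in the question's get_fastqs).
UNITS = {
    ("SRR3396381", "1"): {"fq1": "SRR3396381_1.fastq", "fq2": "SRR3396381_2.fastq"},
    ("SRR3396382", "1"): {"fq1": "SRR3396382_1.fastq", "fq2": None},
}

# Hypothetical output pattern for rule getfastas declaring both names get_fastqs reads.
OUTPUT = "test/{sample}-{unit}-paired_reads.txt"

def get_fastqs(wildcards):
    """Same lookup as the question's input function, on a plain dict."""
    row = UNITS[(wildcards.sample, wildcards.unit)]
    return [row[col] for col in ("fq1", "fq2") if row[col] is not None]

# What rule all would request: the SAME pattern, suffix included.
targets = [OUTPUT.format(sample=s, unit=u) for s, u in UNITS]

# For each target, the engine would pass a wildcards object with .sample and
# .unit attributes; SimpleNamespace plays that role here.
plan = {
    OUTPUT.format(sample=s, unit=u): get_fastqs(SimpleNamespace(sample=s, unit=u))
    for s, u in UNITS
}
for target, fastqs in plan.items():
    print(target, fastqs)
```

In Snakefile terms (untested here), that would mean output: "test/{sample}-{unit}-paired_reads.txt" in rule getfastas and expand("test/{unit.sample}-{unit.unit}-paired_reads.txt", unit=units.itertuples()) in rule all; get_fastqs itself could stay as it is.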