Question: Loading a YAML config file into Snakemake
skbrimer (United States) wrote, 6 months ago:

Hello hive mind,

I'm having issues moving my Snakemake workflow from one sample to multiple samples. The single-sample version below works great for processing amplicon targets.

rule all:
    input:
        "data/samples/LS19-3433-21_s1_de_novo"

rule seqtk_quality_filter:
    input:
        "data/samples/LS19-3433-21_s1.IonXpress_035.2019-09-04T20_14_16Z.fastq"
    output:
        temp("data/samples/LS19-3433-21_s1.qtrim.fastq")
    shell:
        "seqtk trimfq -b 0.01 {input} > {output}"

rule seqtk_clip:
    input:
        "data/samples/LS19-3433-21_s1.qtrim.fastq"
    output:
        "data/samples/LS19-3433-21_s1.clean.fastq"
    shell:
        "seqtk trimfq -b20 -L 350 {input} > {output}"

rule bbnorm:
    input:
        "data/samples/LS19-3433-21_s1.clean.fastq"
    output:
        "data/samples/LS19-3433-21_s1.norm.fastq"
    shell:
        "bbnorm.sh -Xmx10g in={input} out={output} target=50"

rule spades:
    input:
        "data/samples/LS19-3433-21_s1.norm.fastq"
    output:
        "data/samples/LS19-3433-21_s1_de_novo"
    shell:
        "spades.py --iontorrent --only-assembler -s {input} -k 21,33,55,77,99,127 -o {output}"

So now I'm trying to generalize it, and the documentation says I can use a config.yaml to list my samples, which I have done:

samples:
    LS19-3512-1: data/LS19-3512-1.IonXpress_060.2019-11-01T13_49_02Z.fastq
    LS19-3512-2: data/LS19-3512-2.IonXpress_061.2019-11-01T13_49_02Z.fastq
    LS19-3512-3: data/LS19-3512-3.IonXpress_062.2019-11-01T13_49_02Z.fastq
    LS19-3512-5: data/LS19-3512-5.IonXpress_063.2019-11-01T13_49_02Z.fastq
    LS19-3512-6: data/LS19-3512-6.IonXpress_064.2019-11-01T13_49_02Z.fastq
    LS19-3512-8: data/LS19-3512-8.IonXpress_085.2019-11-01T13_49_02Z.fastq
    LS19-3512-9: data/LS19-3512-9.IonXpress_086.2019-11-01T13_49_02Z.fastq
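As far as I can tell, the configfile: directive just parses this file into an ordinary Python dictionary named config. A hand-written sketch of the resulting structure (first two samples only, copied from the yaml above):

```python
# Sketch of the dict Snakemake builds from the config.yaml above
# (hand-copied, first two samples only).
config = {
    "samples": {
        "LS19-3512-1": "data/LS19-3512-1.IonXpress_060.2019-11-01T13_49_02Z.fastq",
        "LS19-3512-2": "data/LS19-3512-2.IonXpress_061.2019-11-01T13_49_02Z.fastq",
    }
}

# Iterating over config["samples"] yields the short sample names (the keys);
# indexing with a name yields the raw fastq path (the value).
names = list(config["samples"])
path = config["samples"]["LS19-3512-1"]
```

So config["samples"] iterates over the short names, while the raw fastq paths have to be looked up by name.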

Now I have tried to load the samples from the yaml file the way the documentation shows, like so:

configfile: "config.yaml"

print("Starting amplicon analysis workflow")

rule seqtk_quality_filter:
    input:
        expand("{sample}", sample=config["samples"])
    output:
        temp("data/{sample}.qtrim.fq")
    shell:
        "seqtk trimfq -b 0.01 {input} > {output}"

and I get the following error:

sean@LEN943:~/Desktop/cobb_reo_3512$ snakemake -s amplicon_snakefile -np
Starting amplicon analysis workflow
Building DAG of jobs...
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.

I have also tried lambda wildcards: config["samples"][wildcards.sample] as the input and get the same error.

I'm sure this is an easy fix, but I'm just not understanding how to do it, since using the yaml/json config files will always require a wildcard/variable.

Thank you in advance for pointing me in the right direction.

Sean

WouterDeCoster commented, 6 months ago:

For me, this question is fine here, but the developer of snakemake prefers stackoverflow.

skbrimer replied:

Thank you for the feedback, Wouter! I will give this a try when I escape the lab today.
skbrimer answered, 6 months ago:

Here is my solution to this and I hope it helps other people who are just starting with Snakemake. Thank you again Wouter for the input and examples!

configfile: "config.yaml"

print("Starting amplicon analysis workflow")

rule all:
    input:
        expand("data/{sample}.clean.fastq", sample = config["samples"])

rule seqtk_qtrim:
    input:
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        temp("data/{sample}.qtrim.fastq")
    shell:
        "seqtk trimfq -q 0.01 {input} > {output}"

rule seqtk_clip:
    input:
        "data/{sample}.qtrim.fastq"
    output:
        temp("data/{sample}.clip.fastq")
    shell:
        "seqtk trimfq -b 20 -L 350 {input} > {output}"

rule seqtk_lengthfilter:
    input:
        "data/{sample}.clip.fastq"
    output:
        "data/{sample}.clean.fastq"
    shell:
        "seqtk seq -L 50 {input} > {output}"
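To spell out what the lambda in seqtk_qtrim does, here is a hand-run sketch outside Snakemake, with types.SimpleNamespace standing in for the wildcards object that Snakemake normally passes in:

```python
from types import SimpleNamespace

# One entry of the samples mapping, copied from config.yaml.
config = {
    "samples": {
        "LS19-3512-1": "data/LS19-3512-1.IonXpress_060.2019-11-01T13_49_02Z.fastq",
    }
}

# The input lambda from rule seqtk_qtrim: it maps a wildcard value
# (the short sample name) to the raw fastq path in the config.
input_fn = lambda wildcards: config["samples"][wildcards.sample]

# Snakemake would call it with the job's resolved wildcards;
# SimpleNamespace mimics that object here.
fastq = input_fn(SimpleNamespace(sample="LS19-3512-1"))
```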
WouterDeCoster answered, 6 months ago:

I'm not sure if it's the best way, but the following works for me. I use a function to get the samples:

def get_samples(wildcards):
    return config["samples"][wildcards.sample]

My first rule takes that function as its input: just the function name, not a function call. To get the genome, it seems to work fine when I just use the config dictionary.

rule minimap2_align:
    input:
        fq = get_samples,
        genome = config["genome"]
    output:
        "minimap2/alignment/{sample}.bam"
    params:
        sample = "{sample}"
    threads:
        8
    log:
        "logs/minimap2/{sample}.log"
    shell:
        """
        minimap2 --MD -ax map-ont -t {threads} \
         -R "@RG\\tID:{params.sample}\\tSM:{params.sample}" \
         {input.genome} {input.fq}/*.fastq.gz | \
         samtools sort -@ {threads} -o {output} - 2> {log}
        """

That's a workflow I have hacked together, so I don't know how far it is from best practices.
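If it helps, the function-name-versus-function-call point can be sketched in plain Python (the sample name and path here are made up for illustration; SimpleNamespace stands in for Snakemake's wildcards object):

```python
from types import SimpleNamespace

# Made-up config entry for illustration only.
config = {"samples": {"s1": "fastq_pass/s1"}}

def get_samples(wildcards):
    return config["samples"][wildcards.sample]

# In the Snakefile, input gets the function object itself...
rule_input = get_samples
# ...and Snakemake calls it later, once per job, with that job's wildcards.
resolved = rule_input(SimpleNamespace(sample="s1"))
```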

skbrimer commented:

So it has taken a day, but I did get it to work, and I think I understand why it works now as well. The config.yaml file gets imported as a Python dictionary, and the name-to-path format of the samples section makes it a nested dictionary. So you use the expand("{sample}", sample = config["samples"]) notation as the input of your target rule all, which then makes it possible to use lambda wildcards: config["samples"][wildcards.sample] in the first pre-processing step to access the nested dictionary value.
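A plain-Python sketch of what that expand() call produces for rule all (three of the sample names from my config):

```python
# The sample names are the keys of config["samples"]; expand() substitutes
# each one into the pattern, yielding concrete, wildcard-free filenames,
# which is why rule all no longer triggers the wildcard error.
samples = ["LS19-3512-1", "LS19-3512-2", "LS19-3512-3"]
targets = ["data/{sample}.clean.fastq".format(sample=s) for s in samples]
```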

WouterDeCoster replied:

That seems like a good solution! Thanks for sharing.