Snakemake aggregate input from rule with wildcards
3.2 years ago

I have a series of rules, as follows:

checkpoint barcode:
    input: get_basecall_input
    output:
        data = directory(config["results"] + "barcode"),
        complete = touch(config["results"] + ".temp/complete/barcode.complete")
    params:
        guppy_container=config["guppy_container"],
        barcode_kit=config["barcode"]["kit"]
    shell:
        r"""
        guppy_barcoder \
        --input_path {input} \
        --save_path {output.data} \
        --barcode_kits {params.barcode_kit} \
        --recursive
        """

def get_barcode_input(wildcards):
    return glob.glob(config["results"] + f"barcode/{wildcards.barcode}/*.fastq")

rule merge_barcodes:
    input: get_barcode_input
    output: config["results"] + "barcode/{barcode}.merged.fastq"
    params: barcode_folder = config["results"] + "barcode/{barcode}"
    shell:
        r"""
        cat {input} > {output}
        """

def get_merged_barcodes(wildcards):
    barcode_output = checkpoints.barcode.get(**wildcards).output[0]
    return expand(config["results"] + "barcode/{barcode}.merged.fastq",
        barcode=glob_wildcards(os.path.join(barcode_output, "/{barcode}/*.fastq")).barcode)

rule create_classified_unclassified_barcode:
    input: get_merged_barcodes
    output:
        classified = config["results"] + ".temp/barcode.classified.merged.fastq",
        unclassified = config["results"] + ".temp/barcode.unclassified.merged.fastq"
    shell:
        r"""
        for file in {input}; do
            if [[ "$file" =~ barcode[0-9]{{2}} ]]; then
                cat "$file" >> {output.classified}
            elif [[ "$file" =~ unclassified ]]; then
                cat "$file" >> {output.unclassified}
            fi
        done
        """

However, I can't get the final rule, create_classified_unclassified_barcode, to work properly. When merge_barcodes is a regular rule, create_classified_unclassified_barcode runs immediately after the barcode checkpoint: the output of merge_barcodes is not picked up as input, and nothing is done.

I have also tried making merge_barcodes a checkpoint, but then I get the error "Missing wildcard values for barcode", which makes sense because create_classified_unclassified_barcode has no wildcards.

I have found this biostars link and this website that show something similar, but they are just different enough that I can't get my own workflow to work. The second link seems to do exactly what I am trying to do. When I implement it (as I have done above), Snakemake reports "Updating job 3 (create_classified_unclassified_barcode)", and then no input is listed for the job once the workflow starts.

I appreciate any help I can get on this problem.


I don't see any wildcards used in create_classified_unclassified_barcode, so how would get_merged_barcodes get a hold of one?


This is part of my issue. I'm trying to merge the output of merge_barcodes into two separate files: one "classified" output and one "unclassified" output. I have updated create_classified_unclassified_barcode to show more clearly what I am trying to do.


So you can't have a wildcard in the input without one to match it in the output. The input of create_classified_unclassified_barcode must be a fixed target - a list of all the merged barcode files.
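
To make the "fixed list" concrete: the input function has to return plain file paths, one per barcode folder that the checkpoint produced. Here is a minimal plain-Python sketch of that lookup, runnable outside Snakemake. The helper names list_barcodes and merged_targets are hypothetical, and the folder layout (one subfolder per barcode holding .fastq files) is assumed from the question. It also shows one likely reason the pattern in get_merged_barcodes finds nothing: os.path.join discards everything before a component that starts with "/".

```python
import os
import tempfile

def list_barcodes(barcode_dir):
    """Return sorted names of subfolders of barcode_dir that hold .fastq files."""
    names = []
    for entry in sorted(os.listdir(barcode_dir)):
        folder = os.path.join(barcode_dir, entry)
        if os.path.isdir(folder) and any(f.endswith(".fastq") for f in os.listdir(folder)):
            names.append(entry)
    return names

def merged_targets(results, barcodes):
    """Build the fixed list of merged files, following the question's naming."""
    return [f"{results}barcode/{b}.merged.fastq" for b in barcodes]

# A component with a leading "/" makes os.path.join drop what came before it,
# so the checkpoint directory is lost from the pattern.
bad_pattern = os.path.join("results/barcode", "/{barcode}/*.fastq")
# Without the leading "/", the directory is kept. {fname} is an assumed extra
# wildcard name standing in for the "*" glob.
good_pattern = os.path.join("results/barcode", "{barcode}/{fname}.fastq")

# Try the helpers on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("barcode01", "barcode02", "unclassified"):
        os.makedirs(os.path.join(tmp, name))
        open(os.path.join(tmp, name, "reads_0.fastq"), "w").close()
    os.makedirs(os.path.join(tmp, "empty_folder"))  # no reads, so it is skipped
    demo_barcodes = list_barcodes(tmp)
    demo_targets = merged_targets("results/", demo_barcodes)
```

Here bad_pattern no longer contains "results/barcode", so a search based on it would start from the filesystem root rather than from the checkpoint output.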


OK, I think I've got something figured out for that, at least for now.

Is there a way for me to make merge_barcodes a checkpoint, and then use checkpoints.merge_barcodes.get(**wildcards).output[0] inside get_merged_barcodes? Or will this not work because create_classified_unclassified_barcode will then have wildcards in the input, as it does now?
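
As far as I can tell, merge_barcodes does not need to be a checkpoint. A sketch of the usual aggregation pattern, assuming the same rules and config keys as in the question: the input function waits on the barcode checkpoint, then expands over the barcodes it finds, and Snakemake schedules merge_barcodes once per requested file. The {fname} wildcard and the [^/]+ constraint are my own additions, and this is untested against real guppy output.

```python
import os

def get_merged_barcodes(wildcards):
    # Asking for the checkpoint output makes Snakemake re-evaluate this
    # function after the barcode checkpoint has finished.
    barcode_dir = checkpoints.barcode.get(**wildcards).output.data
    # glob_wildcards takes named wildcards rather than shell globs, and the
    # pattern must not start with "/", or os.path.join drops barcode_dir.
    found = glob_wildcards(
        os.path.join(barcode_dir, "{barcode,[^/]+}/{fname}.fastq")
    )
    # One merged file per barcode folder; set() removes the repeats that
    # arise when a folder holds several .fastq files.
    return expand(
        config["results"] + "barcode/{barcode}.merged.fastq",
        barcode=sorted(set(found.barcode)),
    )
```

The list returned here contains no wildcards, so create_classified_unclassified_barcode can use it as input unchanged. If merge_barcodes were a checkpoint, checkpoints.merge_barcodes.get(**wildcards) would still fail in this function, because the aggregate rule has no barcode wildcard to pass along.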
