Snakemake aggregate input from rule with wildcards
12 weeks ago

I have a series of rules, as follows:

checkpoint barcode:
    input: get_basecall_input
    output:
        data = directory(config["results"] + "barcode"),
        complete = touch(config["results"] + ".temp/complete/barcode.complete")
    params:
        guppy_container=config["guppy_container"],
        barcode_kit=config["barcode"]["kit"]
    shell:
        r"""
        guppy_barcoder \
            --input_path {input} \
            --save_path {output.data} \
            --barcode_kits {params.barcode_kit} \
            --recursive
        """

def get_barcode_input(wildcards):
    return glob.glob(config["results"] + f"barcode/{wildcards.barcode}/*.fastq")

rule merge_barcodes:
    input: get_barcode_input
    output: config["results"] + "barcode/{barcode}.merged.fastq"
    params: barcode_folder = config["results"] + "barcode/{barcode}"
    shell:
        r"""
        cat {input} > {output}
        """

def get_merged_barcodes(wildcards):
    barcode_output = checkpoints.barcode.get(**wildcards).output[0]
    return expand(config["results"] + "barcode/{barcode}.merged.fastq",
                  barcode=glob_wildcards(os.path.join(barcode_output, "/{barcode}/*.fastq")).barcode)

rule create_classified_unclassified_barcode:
    input: get_merged_barcodes
    output:
        classified = config["results"] + ".temp/barcode.classified.merged.fastq",
        unclassified = config["results"] + ".temp/barcode.unclassified.merged.fastq"
    shell:
        r"""
        for file in {input}; do
            if [[ "$file" =~ barcode[0-9]{{2}} ]]; then cat "$file" >> {output.classified}
            elif [[ "$file" =~ unclassified ]]; then cat "$file" >> {output.unclassified}
            fi
        done
        """


However, I cannot get the final rule, create_classified_unclassified_barcode, to work properly. When merge_barcodes is a normal rule, create_classified_unclassified_barcode runs immediately after the barcode checkpoint, the output of merge_barcodes is not picked up as its input, and nothing is done.

I have also tried using a checkpoint on rule merge_barcodes, but then I get errors that say Missing wildcard values for barcode, which makes sense because I am not using wildcards in create_classified_unclassified_barcode.

I have found this biostars link and this website that show something similar, but they're just different enough that I can't seem to get my own workflow to work. I feel the second link is basically the same thing as what I am trying to do. When I implement this (as I have done above), Snakemake reports "Updating job 3 (create_classified_unclassified_barcode)", and then no input is listed for the job once the workflow starts.

I appreciate any help I can get on this problem.


I don't see any wildcards used in create_classified_unclassified_barcode, so how would get_merged_barcodes get a hold of one?


This is part of my issue. I'm trying to merge the output of merge_barcodes into two separate files: one "classified" output and one "unclassified" output. I have updated create_classified_unclassified_barcode to show more clearly what I am trying to do.
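
To make the intended split concrete, here is a minimal plain-Python sketch of the same classified/unclassified decision that the shell loop in create_classified_unclassified_barcode makes. The function name split_merged and the example paths are hypothetical; only the regular expression and the "unclassified" check come from the rule above.

```python
import re

# Same test as the shell loop: "barcode" followed by two digits.
BARCODE_RE = re.compile(r"barcode[0-9]{2}")


def split_merged(paths):
    """Partition merged FASTQ paths into (classified, unclassified).

    Mirrors the if/elif in the shell block: a path matching the
    barcode pattern is classified; otherwise it is unclassified only
    if it contains the word "unclassified". Anything else is skipped.
    """
    classified, unclassified = [], []
    for path in paths:
        if BARCODE_RE.search(path):
            classified.append(path)
        elif "unclassified" in path:
            unclassified.append(path)
    return classified, unclassified
```

Note that the directory name "barcode/" on its own does not match the pattern, because it is not followed by two digits.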


So you can't have a wildcard in the input without one to match it in the output. The input of create_classified_unclassified_barcode must be a fixed target - a list of all the merged barcode files.
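
A rough sketch of how that fixed list could be built, written in plain Python so it can be tried outside Snakemake. Two things in the question's get_merged_barcodes look suspicious to me: os.path.join(barcode_output, "/{barcode}/*.fastq") throws away barcode_output because the second argument starts with a slash, and, as far as I know, glob_wildcards treats "*" as a literal character rather than a glob (a second wildcard such as {fname} would be needed instead). Either one would leave the job with no input. The helper names list_barcodes and merged_targets are made up for this example.

```python
import os
from pathlib import Path


def list_barcodes(barcode_dir):
    """Return the sorted, de-duplicated names of the subdirectories of
    barcode_dir that contain at least one .fastq file."""
    return sorted({p.parent.name for p in Path(barcode_dir).glob("*/*.fastq")})


def merged_targets(results, barcode_dir):
    """Build one merged FASTQ path per barcode directory.

    `results` is assumed to end with a slash, like config["results"]
    in the question.
    """
    return [f"{results}barcode/{b}.merged.fastq" for b in list_barcodes(barcode_dir)]


# The pitfall: a component starting with "/" resets the joined path,
# so the checkpoint's output directory is silently dropped.
joined = os.path.join("results/barcode", "/{barcode}/*.fastq")

# Hypothetical use inside the Snakefile (not runnable outside Snakemake):
#
# def get_merged_barcodes(wildcards):
#     barcode_dir = checkpoints.barcode.get(**wildcards).output[0]
#     return merged_targets(config["results"], barcode_dir)
```

Because the returned list contains no wildcards, create_classified_unclassified_barcode does not need any of its own, and each entry is still produced by merge_barcodes through its {barcode} wildcard.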


OK, I think I've got something figured out for that, at least for now.

Is there a way for me to use merge_barcodes as a checkpoint, and then use checkpoints.merge_barcodes.get(**wildcards).output[0] in def get_merged_barcodes? Or will this not work because create_classified_unclassified_barcode will then have wildcards in the input, as it does now?