Using 'expand' in snakemake target rule with constraints
7 days ago
cfos4698 ▴ 270

Dear all,

I'm working on a snakemake workflow where I need to use glob_wildcards (or something similar) to work on all samples in a directory.

SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample}_L001_R1_001.fastq.gz")).sample
wildcard_constraints:
    sample="(?!Undet).*"

rule all:
    input:
        expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R1.fq.gz"), sample = SAMPLES),
        expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R2.fq.gz"), sample = SAMPLES)


The input/output files for normal/work rules (sorry, don't know the proper name for them) populate correctly based on the {sample} wildcard. All rules finish as expected. However, I expect there to be some samples in the directory that I don't want. I can get around this by adding a global wildcard constraint at the beginning like so:

SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample}_L001_R1_001.fastq.gz")).sample
wildcard_constraints:
    sample="(?!Undet).*"


However, the issue then is a 'MissingInputException':

Missing input files for rule all:
results2/fastp/Undetermined_S0/trimmed/Undetermined_S0_trimmed_R1.fq.gz
results2/fastp/Undetermined_S0/trimmed/Undetermined_S0_trimmed_R2.fq.gz


How can I change the expand function for the target rule input so that it behaves in the same way as wildcard_constraints (i.e., ignoring files beginning with 'Undet')?

Thanks!

snakemake glob_wildcards expand • 212 views
7 days ago
yztxwd ▴ 490
SAMPLES = glob_wildcards(os.path.join(READSDIR,"{sample, (?!Undet).*}_L001_R1_001.fastq.gz")).sample
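
For anyone wondering why this helps: as far as I understand it, the constraint written inside the braces is applied while glob_wildcards matches file names, so the unwanted files never make it into SAMPLES, whereas expand simply fills in every value it is given. The snippet below is only a rough plain-Python sketch of that matching using the re module. It is not Snakemake's actual implementation, and the file names are made up for illustration.

```python
import re

# Hypothetical file names standing in for the contents of READSDIR.
filenames = [
    "A1_S1_L001_R1_001.fastq.gz",
    "B2_S2_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
]

# Rough stand-in for "{sample, (?!Undet).*}_L001_R1_001.fastq.gz":
# the wildcard becomes a named group that carries its own constraint.
pattern = re.compile(r"(?P<sample>(?!Undet).*)_L001_R1_001\.fastq\.gz")

SAMPLES = []
for name in filenames:
    match = pattern.fullmatch(name)
    if match:  # names starting with "Undet" fail the negative lookahead
        SAMPLES.append(match.group("sample"))

print(SAMPLES)  # ['A1_S1', 'B2_S2']
```

Because the filtered list is what gets passed to expand, rule all never asks for the Undetermined outputs in the first place.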


Thanks, works a charm. I'd gone down the rabbit hole of:

expand(os.path.join(RESULT_DIR, "fastp/{sample}/trimmed/{sample}_trimmed_R1.fq.gz"), sample = [x for x in SAMPLES if 'Undet' not in x])


And it seemed to work, but I prefer the neatness of yours.
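
One small difference worth noting between the two approaches: the list comprehension with 'Undet' not in x drops any sample that contains 'Undet' anywhere in its name, while the (?!Undet).* constraint only rejects names that start with it. A minimal sketch with made-up sample names (the name NotUndet_S3 is purely hypothetical) shows the difference, with str.startswith as the closer match to the regex:

```python
# Hypothetical sample names, for illustration only.
SAMPLES = ["A1_S1", "Undetermined_S0", "NotUndet_S3"]

# Substring test: removes every name containing "Undet" anywhere.
by_substring = [s for s in SAMPLES if "Undet" not in s]

# Prefix test: removes only names beginning with "Undet",
# which mirrors the negative lookahead (?!Undet).*
by_prefix = [s for s in SAMPLES if not s.startswith("Undet")]

print(by_substring)  # ['A1_S1']
print(by_prefix)     # ['A1_S1', 'NotUndet_S3']
```

For typical Illumina output, where only the Undetermined files carry that string, both filters should give the same list.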