2
2
Entering edit mode
16 months ago
kat ▴ 40

Hi, I am creating a snakemake pipeline and am having trouble adding an additional wildcard for using different filters. My first rule filters variants and produces two output VCF files, with different filters applied (qfilt or qfiltreg). I would like the second rule to consider {filter} as a wildcard and produce FASTA files for each of these filtered VCFs in parallel. I am a bit confused how to add this wildcard because I run Filter Variants only once to produce two VCFs. Thank you in advance! Best,

# Filter variants.
rule filter_vars:
input:
ref_path='{ref}.fa',
vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_deep.g.vcf.gz',
log:
'results/{samp}/logs/{samp}_{mapper}_{ref}_filt_vcf_stats.txt'
output:
filt_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_qfilt.vcf.gz',
noppe_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_qfiltreg.vcf.gz'
shell:
"scripts/filter_vars.sh {input.ref_path} {input.vcf} {output.filt_vcf} > {log}"

# Convert single_sample VCF to fasta.
rule vcf_to_fasta:
input:
ref_path='{ref}.fa',
filt_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_{filter}.vcf.gz'
output:
fasta='results/{samp}/fasta/{samp}_{mapper}_{ref}_{filter}.fa'
shell:
"scripts/vcf2fasta.sh {input.ref_path} {input.filt_vcf} {output.fasta}"

snakemake wildcards • 1.5k views
1
Entering edit mode
16 months ago
kat ▴ 40

Thank you for the quick reply! Yes, I have tried that, unsuccessfully. Is there something obvious I'm missing?

# List of parameters - move to config file
SAMPLES=['T1-XX-2017-1068_S51','L-14-54_S56']
REFS = ['H37Rv']
MAPPERS= ['bowtie2']
FILTERS = ['qfilt','qnoppe']

# Define target files.
rule all:
input:
fastas=expand('results/{samp}/fasta/{samp}_{mapper}_{ref}_{filter}.fa',samp=SAMPLES, ref=REFS, mapper=MAPPERS, filter=FILTERS)

# Filter variants.
rule filter_vars:
input:
ref_path='{ref}.fa',
vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_deep.g.vcf.gz',
log:
'results/{samp}/logs/{samp}_{mapper}_{ref}_filt_vcf_stats.txt'
output:
filt_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_qfilt.vcf.gz',
noppe_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_qnoppe.vcf.gz'
shell:
"scripts/filter_vars.sh {input.ref_path} {input.vcf} {output.filt_vcf} > {log}"

# Convert single_sample VCF to fasta.
rule vcf_to_fasta:
input:
ref_path='{ref}.fa',
filt_vcf='results/{samp}/vars/{samp}_{mapper}_{ref}_{filter}.vcf.gz'
output:
fasta='results/{samp}/fasta/{samp}_{mapper}_{ref}_{filter}.fa'
shell:
"scripts/vcf2fasta.sh {input.ref_path} {input.filt_vcf} {output.fasta}"


I get the same error that I'm missing an input file:

Building DAG of jobs...
MissingInputException in line 44 of /oak/stanford/scg/lab_jandr/walter/tb/test/Snakefile:
Missing input files for rule all:
results/T1-XX-2017-1068_S51/fasta/T1-XX-2017-1068_S51_bowtie2_H37Rv_qfilt.fa
results/T1-XX-2017-1068_S51/fasta/T1-XX-2017-1068_S51_bowtie2_H37Rv_qnoppe.fa
results/L-14-54_S56/fasta/L-14-54_S56_bowtie2_H37Rv_qfilt.fa
results/L-14-54_S56/fasta/L-14-54_S56_bowtie2_H37Rv_qnoppe.fa

0
Entering edit mode

well I'm not crazy about you using underscores in both the sample names and as delimiters. try using a delimiter that won't confuse the regex. also just update your original question, you don't need to submit an answer.

0
Entering edit mode

Okay, thank you for the feedback about the sample names, good point! And re: the original question: I had an issue with incorrect wildcard constraints on the filter argument. Issue solved - thank you again!

0
Entering edit mode
16 months ago

OK the wildcard rules look fine, now create a target rule to produce the files you want

rule mytargetrule:
input: "results/mysample/fasta/mysample_bwa_hg37_qfilt.fa",
"results/mysample/fasta/mysample_bwa_hg37_qfiltreg.fa"