Error:
Building DAG of jobs...
WildcardError in line 502 of /path/to/pipeline/workflow/Snakefile.py:
Wildcards in input files cannot be determined from output files:
'anc_r'
Code:
import os
import json
from datetime import datetime
from glob import iglob
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Define Constants ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# discover input files using path from run config
SAMPLES = list(set(glob_wildcards(f"{config['fastq_dir']}/{{sample}}_R1_001.fastq.gz").sample))
# read output dir path from run config
OUTPUT_DIR = f"{config['output_dir']}"
# Project name and date for bam header
SEQID='pipeline_align'
config['anc_r'] = list(set(glob_wildcards(f"{config['anc_dir']}/{{anc_r}}_R1_001.fastq.gz").anc_r))
# https://snakemake.readthedocs.io/en/v7.14.0/tutorial/basics.html#step-7-adding-a-target-rule
rule all:
input:
f'{OUTPUT_DIR}/DONE.txt'
rule list_samples:
output:
f"{OUTPUT_DIR}/00_logs/00_sample_list.txt"
shell:
"echo -e '{}' > {{output}}".format('\n'.join(SAMPLES))
rule align_reads_anc:
input:
rules.create_bwa_index.output,
R1=lambda wildcards: f"{config['anc_dir']}/{wildcards.anc_r}_R1_001.fastq.gz",
R2=lambda wildcards: f"{config['anc_dir']}/{wildcards.anc_r}_R2_001.fastq.gz",
output:
bam=f"{OUTPUT_DIR}/03_init_alignment/{{anc_r}}/{{anc_r}}_R1R2_sort.bam",
conda:
'envs/main.yml'
shell:
r"""bwa mem -R '@RG\tID:""" + SEQID + r"""\tSM:""" + '{wildcards.anc_r}' + r"""\tLB:1'""" + ' {rules.copy_fasta.output} {input.R1} {input.R2} | samtools sort -o {output.bam} - && samtools index {output.bam}'
rule samtools_index_one_anc:
input:
rules.align_reads_anc.output
output:
f"{OUTPUT_DIR}/03_init_alignment/{{anc_r}}/{{anc_r}}_R1R2_sort.bam.bai"
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule samtools_flagstat_anc:
input:
bam=rules.align_reads_anc.output,
idx=rules.samtools_index_one_anc.output
output:
f"{OUTPUT_DIR}/03_init_alignment/{{anc_r}}/{{anc_r}}_R1R2_sort_flagstat.txt"
conda:
'envs/main.yml'
shell:
"samtools flagstat {input.bam} > {output}"
rule picard_mark_dupes_anc:
input:
bam=rules.align_reads_anc.output,
idx=rules.samtools_index_one_anc.output,
dct=rules.create_ref_dict.output
output:
bam=f"{OUTPUT_DIR}/04_picard/{{anc_r}}/{{anc_r}}_comb_R1R2.MD.bam",
metrics=f"{OUTPUT_DIR}/04_picard/{{anc_r}}/{{anc_r}}_comb_R1R2.sort_dup_metrix"
conda:
'envs/main.yml'
shell:
"picard MarkDuplicates --INPUT {input.bam} --OUTPUT {output.bam} --METRICS_FILE {output.metrics} --REMOVE_DUPLICATES true --VALIDATION_STRINGENCY LENIENT"
rule picard_read_groups_anc:
input:
rules.picard_mark_dupes_anc.output.bam
output:
f"{OUTPUT_DIR}/04_picard/{{anc_r}}/{{anc_r}}_comb_R1R2.RG.MD.sort.bam",
conda:
'envs/main.yml'
shell:
f"picard AddOrReplaceReadGroups --INPUT {{input}} --OUTPUT /dev/stdout --RGID {SEQID} --RGLB 1 --RGPU 1 --RGPL illumina --RGSM {{wildcards.anc_r}} --VALIDATION_STRINGENCY LENIENT | samtools sort -o {{output}} -"
rule samtools_index_two_anc:
input:
rules.picard_read_groups_anc.output
output:
f"{OUTPUT_DIR}/04_picard/{{anc_r}}/{{anc_r}}_comb_R1R2.RG.MD.sort.bam.bai",
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule gatk_register:
input:
f'workflow/envs/src/GenomeAnalysisTK-3.7-0-gcfedb67.tar.bz2'
output:
f'{OUTPUT_DIR}/05_gatk/gatk_3.7_registered.txt'
conda:
'envs/main.yml'
shell:
"gatk-register {input} > {output}"
rule gatk_realign_targets_anc:
input:
fa=rules.copy_fasta.output,
bam=rules.picard_read_groups_anc.output,
idx=rules.samtools_index_two_anc.output,
gatk=rules.gatk_register.output,
faidx=rules.index_fasta.output
output:
f'{OUTPUT_DIR}/05_gatk/{{anc_r}}/{{anc_r}}_comb_R1R2.bam.intervals'
conda:
'envs/main.yml'
shell:
"GenomeAnalysisTK -T RealignerTargetCreator -R {input.fa} -I {input.bam} -o {output}"
rule gatk_realign_indels_anc:
input:
fa=rules.copy_fasta.output,
intervals=rules.gatk_realign_targets_anc.output,
bam=rules.picard_read_groups_anc.output,
idx=rules.samtools_index_two_anc.output
output:
bam=f'{OUTPUT_DIR}/05_gatk/{{anc_r}}/{{anc_r}}_comb_R1R2.RG.MD.realign.bam',
bai=f'{OUTPUT_DIR}/05_gatk/{{anc_r}}/{{anc_r}}_comb_R1R2.RG.MD.realign.bai'
conda:
'envs/main.yml'
shell:
"GenomeAnalysisTK -T IndelRealigner -R {input.fa} -I {input.bam} -targetIntervals {input.intervals} -o {output.bam}"
rule samtools_sort_three_anc:
input:
rules.gatk_realign_indels_anc.output.bam
output:
f'{OUTPUT_DIR}/05_gatk/{{anc_r}}/{{anc_r}}_comb_R1R2.RG.MD.realign.sort.bam',
conda:
'envs/main.yml'
shell:
"samtools sort {input} -o {output}"
rule samtools_index_three_anc:
input:
rules.samtools_sort_three_anc.output
output:
f'{OUTPUT_DIR}/05_gatk/{{anc_r}}/{{anc_r}}_comb_R1R2.RG.MD.realign.sort.bam.bai',
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule index_ancestor_bam:
input:
rules.samtools_sort_three_anc.output
output:
f"{rules.samtools_sort_three_anc.output}.bai"
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule bcftools_pileup_anc:
input:
bam=rules.samtools_sort_three_anc.output,
idx=rules.samtools_index_three_anc.output
output:
f'{OUTPUT_DIR}/06_variant_calling/{{anc_r}}/{{anc_r}}_samtools_AB.vcf',
conda:
'envs/main.yml'
shell:
"bcftools mpileup --ignore-RG -Ou -ABf {rules.copy_fasta.output} {input.bam} | bcftools call -vmO v -o {output}"
rule lofreq_anc:
input:
bam=rules.samtools_sort_three_anc.output,
idx=rules.samtools_index_three_anc.output,
ancidx=rules.index_ancestor_bam.output,
output:
normal=f'{OUTPUT_DIR}/06_variant_calling/{{anc_r}}/{{anc_r}}_lofreq_normal_relaxed.vcf.gz',
tumor=f'{OUTPUT_DIR}/06_variant_calling/{{anc_r}}/{{anc_r}}_lofreq_tumor_relaxed.vcf.gz',
somatic=f'{OUTPUT_DIR}/06_variant_calling/{{anc_r}}/{{anc_r}}_lofreq_somatic_final.snvs.vcf.gz',
conda:
'envs/main.yml'
shell:
f"lofreq somatic -n {{input.bam}} -t {{input.bam}} -f {{rules.copy_fasta.output}} -o {OUTPUT_DIR}/06_variant_calling/{{wildcards.anc_r}}/{{wildcards.anc_r}}_lofreq_"
rule unzip_lofreq_anc:
input:
normal=rules.lofreq_anc.output.normal,
tumor=rules.lofreq_anc.output.tumor,
somatic=rules.lofreq_anc.output.somatic,
output:
normal=rules.lofreq_anc.output.normal.replace('.vcf.gz', '.vcf'),
tumor=rules.lofreq_anc.output.tumor.replace('.vcf.gz', '.vcf'),
somatic=rules.lofreq_anc.output.somatic.replace('.vcf.gz', '.vcf'),
conda:
'envs/main.yml'
shell:
"bgzip -d {input.normal} && bgzip -d {input.tumor} && bgzip -d {input.somatic}"
rule align_reads:
input:
rules.create_bwa_index.output,
r1=f"{config['fastq_dir']}/{{sample}}_R1_001.fastq.gz",
r2=f"{config['fastq_dir']}/{{sample}}_R2_001.fastq.gz",
output:
f"{OUTPUT_DIR}/03_init_alignment/{{sample}}/{{sample}}_R1R2_sort.bam"
conda:
'envs/main.yml'
shell:
r"""bwa mem -R '@RG\tID:""" + SEQID + r"""\tSM:""" + '{wildcards.sample}' + r"""\tLB:1'""" + ' {rules.copy_fasta.output} {input.r1} {input.r2} | samtools sort -o {output} -'
rule samtools_index_one:
input:
rules.align_reads.output
output:
f"{OUTPUT_DIR}/03_init_alignment/{{sample}}/{{sample}}_R1R2_sort.bam.bai"
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule samtools_flagstat:
input:
bam=rules.align_reads.output,
idx=rules.samtools_index_one.output
output:
f"{OUTPUT_DIR}/03_init_alignment/{{sample}}/{{sample}}_R1R2_sort_flagstat.txt"
conda:
'envs/main.yml'
shell:
"samtools flagstat {input.bam} > {output}"
rule picard_mark_dupes:
input:
bam=rules.align_reads.output,
idx=rules.samtools_index_one.output,
dct=rules.create_ref_dict.output
output:
bam=f"{OUTPUT_DIR}/04_picard/{{sample}}/{{sample}}_comb_R1R2.MD.bam",
metrics=f"{OUTPUT_DIR}/04_picard/{{sample}}/{{sample}}_comb_R1R2.sort_dup_metrix"
conda:
'envs/main.yml'
shell:
"picard MarkDuplicates --INPUT {input.bam} --OUTPUT {output.bam} --METRICS_FILE {output.metrics} --REMOVE_DUPLICATES true --VALIDATION_STRINGENCY LENIENT"
rule picard_read_groups:
input:
rules.picard_mark_dupes.output.bam
output:
f"{OUTPUT_DIR}/04_picard/{{sample}}/{{sample}}_comb_R1R2.RG.MD.sort.bam",
conda:
'envs/main.yml'
shell:
f"picard AddOrReplaceReadGroups --INPUT {{input}} --OUTPUT /dev/stdout --RGID {SEQID} --RGLB 1 --RGPU 1 --RGPL illumina --RGSM {{wildcards.sample}} --VALIDATION_STRINGENCY LENIENT | samtools sort -o {{output}} -"
rule samtools_index_two:
input:
rules.picard_read_groups.output
output:
f"{OUTPUT_DIR}/04_picard/{{sample}}/{{sample}}_comb_R1R2.RG.MD.sort.bam.bai",
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule gatk_realign_targets:
input:
fa=rules.copy_fasta.output,
bam=rules.picard_read_groups.output,
idx=rules.samtools_index_two.output,
gatk=rules.gatk_register.output,
faidx=rules.index_fasta.output
output:
f'{OUTPUT_DIR}/05_gatk/{{sample}}/{{sample}}_comb_R1R2.bam.intervals'
conda:
'envs/main.yml'
shell:
"GenomeAnalysisTK -T RealignerTargetCreator -R {input.fa} -I {input.bam} -o {output}"
rule gatk_realign_indels:
input:
fa=rules.copy_fasta.output,
intervals=rules.gatk_realign_targets.output,
bam=rules.picard_read_groups.output,
idx=rules.samtools_index_two.output
output:
bam=f'{OUTPUT_DIR}/05_gatk/{{sample}}/{{sample}}_comb_R1R2.RG.MD.realign.bam',
bai=f'{OUTPUT_DIR}/05_gatk/{{sample}}/{{sample}}_comb_R1R2.RG.MD.realign.bai'
conda:
'envs/main.yml'
shell:
"GenomeAnalysisTK -T IndelRealigner -R {input.fa} -I {input.bam} -targetIntervals {input.intervals} -o {output.bam}"
rule samtools_sort_three:
input:
rules.gatk_realign_indels.output.bam
output:
f'{OUTPUT_DIR}/05_gatk/{{sample}}/{{sample}}_comb_R1R2.RG.MD.realign.sort.bam',
conda:
'envs/main.yml'
shell:
"samtools sort {input} -o {output}"
rule samtools_index_three:
input:
rules.samtools_sort_three.output
output:
f'{OUTPUT_DIR}/05_gatk/{{sample}}/{{sample}}_comb_R1R2.RG.MD.realign.sort.bam.bai',
conda:
'envs/main.yml'
shell:
"samtools index {input}"
rule bcftools_pileup:
input:
bam=rules.samtools_sort_three.output,
idx=rules.samtools_index_three.output
output:
f'{OUTPUT_DIR}/06_variant_calling/{{sample}}/{{sample}}_samtools_AB.vcf',
conda:
'envs/main.yml'
shell:
"bcftools mpileup --ignore-RG -Ou -ABf {rules.copy_fasta.output} {input.bam} | bcftools call -vmO v -o {output}"
rule lofreq:
input:
bam=rules.samtools_sort_three.output,
idx=rules.samtools_index_three.output,
ancidx=rules.index_ancestor_bam.output,
output:
normal=f'{OUTPUT_DIR}/06_variant_calling/{{sample}}/{{sample}}_lofreq_normal_relaxed.vcf.gz',
tumor=f'{OUTPUT_DIR}/06_variant_calling/{{sample}}/{{sample}}_lofreq_tumor_relaxed.vcf.gz',
somatic=f'{OUTPUT_DIR}/06_variant_calling/{{sample}}/{{sample}}_lofreq_somatic_final.vcf.gz',
conda:
'envs/main.yml'
shell:
f"lofreq somatic -n {{rules.copy_ancestor_bam.output}} -t {{input.bam}} -f {{rules.copy_fasta.output}} -o {{output.somatic}} && "
f"bgzip {{output.normal}} && bgzip {{output.tumor}} && bgzip {{output.somatic}}"
rule unzip_lofreq:
input:
normal=rules.lofreq.output.normal,
tumor=rules.lofreq.output.tumor,
somatic=rules.lofreq.output.somatic,
output:
normal=rules.lofreq.output.normal.replace('.vcf.gz', '.vcf'),
tumor=rules.lofreq.output.tumor.replace('.vcf.gz', '.vcf'),
somatic=rules.lofreq.output.somatic.replace('.vcf.gz', '.vcf'),
conda:
'envs/main.yml'
shell:
"bgzip -d {input.normal} && bgzip -d {input.tumor} && bgzip -d {input.somatic}"
rule anc_filter_samtools:
input:
sample=rules.bcftools_pileup.output,
anc=rules.unzip_lofreq_anc.output.normal
output:
f'{OUTPUT_DIR}/07_filtered/{{sample}}/{{sample}}_samtools_AB_AncFiltered.vcf',
conda:
'envs/main.yml'
shell:
f"bedtools intersect -v -header -a {{input.sample}} -b {{input.anc}} > {{output}}"
I tried fixing the wildcards in the previous rules: same error. I tried altering the config dict, altering the previous rules, and altering the config file: same error every time. I looked at the documentation, but I don't understand what I'm doing wrong. I thought the lofreq rule was the problem, but changing it gave the same error. I'm stuck on how to resolve this and confused about how to handle the wildcards. Line 502 refers to the last rule.
To clarify, what I am doing:
I am trying to modify a variant calling pipeline so that it takes in the evolved strains and ancestor strains. In the config file, I have specified where the ancestor files are located and where the evolved strains are located. I left out the steps where I got the reference genome.
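For reference, the run config that the Snakefile reads would look roughly like this (a sketch with placeholder paths; only the three keys referenced in the Snakefile are shown, since `anc_r` is filled in at runtime by the `glob_wildcards()` call near the top):

```yaml
# Sketch of the run config (placeholder paths, not my real ones).
fastq_dir: /path/to/evolved_fastqs   # evolved-strain reads, {sample}_R1_001.fastq.gz
anc_dir: /path/to/ancestor_fastqs    # ancestor reads, {anc_r}_R1_001.fastq.gz
output_dir: /path/to/output
```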
The reference call is here:
It is located before align_reads_anc and after the list_samples rule.
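From re-reading the docs, my understanding of the constraint is: Snakemake can only infer a rule's wildcard values from its *output* paths, and the last rule's input `anc=rules.unzip_lofreq_anc.output.normal` still contains `{anc_r}` while its outputs only contain `{sample}`. A sketch of the kind of fix I think is needed, assuming every sample is compared against the same set of ancestors (this rewritten rule is my guess, not working code):

```python
# Sketch only: every wildcard used in input: must either appear in output:
# or be pinned to concrete values. Here expand() substitutes the ancestor
# names discovered by glob_wildcards(), so no free {anc_r} wildcard remains.
rule anc_filter_samtools:
    input:
        sample=rules.bcftools_pileup.output,
        anc=expand(
            f"{OUTPUT_DIR}/06_variant_calling/{{anc_r}}/{{anc_r}}_lofreq_normal_relaxed.vcf",
            anc_r=config['anc_r'],
        ),
    output:
        f'{OUTPUT_DIR}/07_filtered/{{sample}}/{{sample}}_samtools_AB_AncFiltered.vcf'
    conda:
        'envs/main.yml'
    shell:
        "bedtools intersect -v -header -a {input.sample} -b {input.anc} > {output}"
```

Note the f-string doubles the braces, so `{{anc_r}}` reaches `expand()` as the literal placeholder `{anc_r}` that it then fills in.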
Did you post this question on bioinformatics Stack Exchange a while ago?
I posted about a different problem about the pipeline. Solved it. This is a different problem.
Did you delete that post? Also, is there a reason you switched forums?
I deleted because I figured it out. Biostars seems to have more people.
That's not how forums work. When you create a post about a problem, you've added knowledge to the community: the problem you faced exists. Someone else might well run into the same problem some day and seek help online. At that point, your post would show them that someone else faced and solved the problem before them. The right thing to do would have been to add an answer with the solution you figured out and accept that answer, so your post becomes part of the knowledge base. Please take that approach in the future. Deleting questions once they are solved would mean no answered questions ever exist.
Sorry. Thank you for letting me know.