Part of a pipeline I am running results in output files, but the files are not recognized by Snakemake.
Code:
import os
import json
from datetime import datetime
from glob import iglob, glob
from snakemake.io import glob_wildcards
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Define Constants ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# discover input files using path from run config
SAMPLES = list(set(glob_wildcards(f"{config['fastq_dir']}/{{sample}}_R1_001.fastq.gz").sample))
# read output dir path from run config
OUTPUT_DIR = f"{config['output_dir']}"
# Project name and date for bam header
SEQID='yevo_pipeline_align'
#config['anc_r'] = list(set(glob_wildcards(f"{config['anc_dir']}/{{anc_r}}_R1_001.fastq.gz").anc_r))
#Error may have been in index ancestor_bam rule
anc_r = "/net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineanc"
anc_t = "anc"
ref_dir = "/net/dunham/vol1/home/dennig2/yevo_pipeline/data/genome/sacCer3.fasta"
#TO DO change anc variables to anc_t
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Begin Pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# https://snakemake.readthedocs.io/en/v7.14.0/tutorial/basics.html#step-7-adding-a-target-rule
rule all:
input:
f'{OUTPUT_DIR}/DONE.txt'
# ~~~~~~~~~~~~~~~~~~~~~~~~~~ Set Up Reference Files ~~~~~~~~~~~~~~~~~~~~~~~~~ #
#
# export the current run configuration in JSON format
#
rule export_run_config:
output:
path=f"{OUTPUT_DIR}/00_logs/00_run_config.json"
run:
with open(output.path, 'w') as outfile:
json.dump(dict(config), outfile, indent=4)
#
# make a list of discovered samples
#
rule list_samples:
output:
f"{OUTPUT_DIR}/00_logs/00_sample_list.txt"
shell:
"echo -e '{}' > {{output}}".format('\n'.join(SAMPLES))
#
# copy the supplied reference genome fasta to the pipeline output directory for reference
#
rule copy_fasta:
input:
ref=f"{ref_dir}"
output:
f"{OUTPUT_DIR}/01_ref_files/reference"
shell:
"cp {input.ref} {output}"
rule index_fasta:
input:
rules.copy_fasta.output
output:
f"{rules.copy_fasta.output}.fai"
conda:
'envs/main.yml'
shell:
"samtools faidx {input}"
rule create_ref_dict:
input:
rules.copy_fasta.output
output:
f"{rules.copy_fasta.output}".rstrip('fasta') + 'dict'
conda:
'envs/main.yml'
shell:
"picard CreateSequenceDictionary -R {input}"
#
# create a BWA index from the copied fasta reference genome
#
rule create_bwa_index:
input:
rules.copy_fasta.output
output:
f"{rules.copy_fasta.output}.amb",
f"{rules.copy_fasta.output}.ann",
f"{rules.copy_fasta.output}.bwt",
f"{rules.copy_fasta.output}.pac",
f"{rules.copy_fasta.output}.sa",
conda:
'envs/main.yml'
shell:
"bwa index {input}"
#
# Get the ancestor BAM
#
rule align_reads_anc:
input:
rules.create_bwa_index.output,
R1= f"{anc_r}/YMD4612_pink_S1_R1_001.fastq.gz",
R2= f"{anc_r}/YMD4612_pink_S1_R2_001.fastq.gz",
output:
bam=f"{OUTPUT_DIR}/03_init_alignment/{anc_t}/{anc_t}_R1R2_sort.bam",
conda:
'envs/main.yml'
shell:
r"""bwa mem -R '@RG\tID:""" + SEQID + r"""\tSM:""" + '{anc_t}' + r"""\tLB:1'""" + ' {rules.copy_fasta.output} {input.R1} {input.R2} | samtools sort -o {output.bam} - && samtools index {output.bam}'
Error:
rule list_samples:
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_sample_list.txt
jobid: 3
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_sample_list.txt
resources: tmpdir=/tmp/293419245.1.sage-long.q
[Tue Mar 28 21:47:11 2023]
rule copy_fasta:
input: /net/dunham/vol1/home/dennig2/yevo_pipeline/data/genome/sacCer3.fasta
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/01_ref_files/reference
jobid: 7
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/01_ref_files/reference
resources: tmpdir=/tmp/293419245.1.sage-long.q
[Tue Mar 28 21:47:11 2023]
rule gatk_register:
input: workflow/envs/src/GenomeAnalysisTK-3.7-0-gcfedb67.tar.bz2
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/05_gatk/gatk_3.7_registered.txt
jobid: 24
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/05_gatk/gatk_3.7_registered.txt
resources: tmpdir=/tmp/293419245.1.sage-long.q
Activating conda environment: ../.snakemake/conda/82f61b1b43a83fa5851b5687321ca2bf_
[Tue Mar 28 21:47:11 2023]
rule export_run_config:
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_run_config.json
jobid: 2
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_run_config.json
resources: tmpdir=/tmp/293419245.1.sage-long.q
However, checking the output directory does see output from the export_run_config, list_samples, copy_fasta, and create_bwa_index rules. Why is Snakemake not recognizing the output files here?
This is the reason why the rule is executed.
Ok, the rule is executed, but then it says the output files are missing.
I simply don't understand why there is a disconnect there, even after changing file permissions. I can go back and check if the file paths are correct again, but is there anything else that could be done?
Sorry but I'm not able to properly understand your scenario. Did your pipeline complete the dry run?
Yes, it completed the dry run. What other information do you need?