Entering edit mode
2.0 years ago
DdogBoss
▴
20
Part of a pipeline I am running results in output files, but the files are not recognized by Snakemake.
Code:
import os
import json
from datetime import datetime
from glob import iglob, glob
from snakemake.io import glob_wildcards
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Define Constants ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# discover input files using path from run config
SAMPLES = list(set(glob_wildcards(f"{config['fastq_dir']}/{{sample}}_R1_001.fastq.gz").sample))
# read output dir path from run config
OUTPUT_DIR = f"{config['output_dir']}"
# Project name and date for bam header
SEQID='yevo_pipeline_align'
#config['anc_r'] = list(set(glob_wildcards(f"{config['anc_dir']}/{{anc_r}}_R1_001.fastq.gz").anc_r))
#Error may have been in index ancestor_bam rule
anc_r = "/net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineanc"
anc_t = "anc"
ref_dir = "/net/dunham/vol1/home/dennig2/yevo_pipeline/data/genome/sacCer3.fasta"
#TO DO change anc variables to anc_t
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Begin Pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# https://snakemake.readthedocs.io/en/v7.14.0/tutorial/basics.html#step-7-adding-a-target-rule
rule all:
input:
f'{OUTPUT_DIR}/DONE.txt'
# ~~~~~~~~~~~~~~~~~~~~~~~~~~ Set Up Reference Files ~~~~~~~~~~~~~~~~~~~~~~~~~ #
#
# export the current run configuration in JSON format
#
rule export_run_config:
output:
path=f"{OUTPUT_DIR}/00_logs/00_run_config.json"
run:
with open(output.path, 'w') as outfile:
json.dump(dict(config), outfile, indent=4)
#
# make a list of discovered samples
#
rule list_samples:
output:
f"{OUTPUT_DIR}/00_logs/00_sample_list.txt"
shell:
"echo -e '{}' > {{output}}".format('\n'.join(SAMPLES))
#
# copy the supplied reference genome fasta to the pipeline output directory for reference
#
rule copy_fasta:
input:
ref=f"{ref_dir}"
output:
f"{OUTPUT_DIR}/01_ref_files/reference"
shell:
"cp {input.ref} {output}"
rule index_fasta:
input:
rules.copy_fasta.output
output:
f"{rules.copy_fasta.output}.fai"
conda:
'envs/main.yml'
shell:
"samtools faidx {input}"
rule create_ref_dict:
input:
rules.copy_fasta.output
output:
f"{rules.copy_fasta.output}".rstrip('fasta') + 'dict'
conda:
'envs/main.yml'
shell:
"picard CreateSequenceDictionary -R {input}"
#
# create a BWA index from the copied fasta reference genome
#
rule create_bwa_index:
input:
rules.copy_fasta.output
output:
f"{rules.copy_fasta.output}.amb",
f"{rules.copy_fasta.output}.ann",
f"{rules.copy_fasta.output}.bwt",
f"{rules.copy_fasta.output}.pac",
f"{rules.copy_fasta.output}.sa",
conda:
'envs/main.yml'
shell:
"bwa index {input}"
#
# Get the ancestor BAM
#
rule align_reads_anc:
input:
rules.create_bwa_index.output,
R1= f"{anc_r}/YMD4612_pink_S1_R1_001.fastq.gz",
R2= f"{anc_r}/YMD4612_pink_S1_R2_001.fastq.gz",
output:
bam=f"{OUTPUT_DIR}/03_init_alignment/{anc_t}/{anc_t}_R1R2_sort.bam",
conda:
'envs/main.yml'
shell:
r"""bwa mem -R '@RG\tID:""" + SEQID + r"""\tSM:""" + '{anc_t}' + r"""\tLB:1'""" + ' {rules.copy_fasta.output} {input.R1} {input.R2} | samtools sort -o {output.bam} - && samtools index {output.bam}'
Error:
rule list_samples:
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_sample_list.txt
jobid: 3
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_sample_list.txt
resources: tmpdir=/tmp/293419245.1.sage-long.q
[Tue Mar 28 21:47:11 2023]
rule copy_fasta:
input: /net/dunham/vol1/home/dennig2/yevo_pipeline/data/genome/sacCer3.fasta
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/01_ref_files/reference
jobid: 7
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/01_ref_files/reference
resources: tmpdir=/tmp/293419245.1.sage-long.q
[Tue Mar 28 21:47:11 2023]
rule gatk_register:
input: workflow/envs/src/GenomeAnalysisTK-3.7-0-gcfedb67.tar.bz2
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/05_gatk/gatk_3.7_registered.txt
jobid: 24
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/05_gatk/gatk_3.7_registered.txt
resources: tmpdir=/tmp/293419245.1.sage-long.q
Activating conda environment: ../.snakemake/conda/82f61b1b43a83fa5851b5687321ca2bf_
[Tue Mar 28 21:47:11 2023]
rule export_run_config:
output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_run_config.json
jobid: 2
reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_run_config.json
resources: tmpdir=/tmp/293419245.1.sage-long.q
However, checking the output directory does see output from the export_run_config, list_samples, copy_fasta, and create_bwa_index rules. Why is Snakemake not recognizing the output files here?