Deleted:Snakemake doesn't recognize output files even though they are created
1
0
Entering edit mode
2.0 years ago
DdogBoss ▴ 20

Part of a pipeline I am running results in output files, but the files are not recognized by Snakemake.

Code:

import os
import json
from datetime import datetime
from glob import iglob, glob
from snakemake.io import glob_wildcards

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Define Constants ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #

# discover input files using path from run config
SAMPLES = list(set(glob_wildcards(f"{config['fastq_dir']}/{{sample}}_R1_001.fastq.gz").sample))

# read output dir path from run config
OUTPUT_DIR = f"{config['output_dir']}"

# Project name and date for bam header
SEQID='yevo_pipeline_align'

#config['anc_r'] = list(set(glob_wildcards(f"{config['anc_dir']}/{{anc_r}}_R1_001.fastq.gz").anc_r))
#Error may have been in index ancestor_bam rule

anc_r = "/net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineanc"
anc_t = "anc"
ref_dir = "/net/dunham/vol1/home/dennig2/yevo_pipeline/data/genome/sacCer3.fasta"

#TO DO change anc variables to anc_t
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Begin Pipeline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #

# https://snakemake.readthedocs.io/en/v7.14.0/tutorial/basics.html#step-7-adding-a-target-rule 
rule all:
    input:
        f'{OUTPUT_DIR}/DONE.txt'


# ~~~~~~~~~~~~~~~~~~~~~~~~~~ Set Up Reference Files ~~~~~~~~~~~~~~~~~~~~~~~~~ #

#
# export the current run configuration in JSON format
#
rule export_run_config:
    output:
        path=f"{OUTPUT_DIR}/00_logs/00_run_config.json"
    run:
        with open(output.path, 'w') as outfile:
            json.dump(dict(config), outfile, indent=4)


#
# make a list of discovered samples
#
rule list_samples:
    output:
        f"{OUTPUT_DIR}/00_logs/00_sample_list.txt"
    shell:
        "echo -e '{}' > {{output}}".format('\n'.join(SAMPLES))


#
# copy the supplied reference genome fasta to the pipeline output directory for reference
#
rule copy_fasta:
    input:
        ref=f"{ref_dir}"
    output:
        f"{OUTPUT_DIR}/01_ref_files/reference"
    shell:
        "cp {input.ref} {output}"


rule index_fasta:
    input:
        rules.copy_fasta.output
    output:
        f"{rules.copy_fasta.output}.fai"
    conda:
        'envs/main.yml'
    shell:
        "samtools faidx {input}"


rule create_ref_dict:
    input:
        rules.copy_fasta.output
    output:
        f"{rules.copy_fasta.output}".rstrip('fasta') + 'dict'
    conda:
        'envs/main.yml'
    shell:
        "picard CreateSequenceDictionary -R {input}"

#
# create a BWA index from the copied fasta reference genome
#
rule create_bwa_index:
    input:
        rules.copy_fasta.output
    output:
        f"{rules.copy_fasta.output}.amb",
        f"{rules.copy_fasta.output}.ann",
        f"{rules.copy_fasta.output}.bwt",
        f"{rules.copy_fasta.output}.pac",
        f"{rules.copy_fasta.output}.sa",
    conda:
        'envs/main.yml'
    shell:
        "bwa index {input}"        

#
# Get the ancestor BAM
#



rule align_reads_anc:
    input:
        rules.create_bwa_index.output,
        R1= f"{anc_r}/YMD4612_pink_S1_R1_001.fastq.gz",
        R2= f"{anc_r}/YMD4612_pink_S1_R2_001.fastq.gz",
    output:
        bam=f"{OUTPUT_DIR}/03_init_alignment/{anc_t}/{anc_t}_R1R2_sort.bam",

    conda:
        'envs/main.yml'
    shell:
        r"""bwa mem -R '@RG\tID:""" + SEQID + r"""\tSM:""" + '{anc_t}' + r"""\tLB:1'""" + ' {rules.copy_fasta.output} {input.R1} {input.R2} | samtools sort -o {output.bam} - && samtools index {output.bam}'

Error:

rule list_samples:
    output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_sample_list.txt
    jobid: 3
    reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_sample_list.txt
    resources: tmpdir=/tmp/293419245.1.sage-long.q


[Tue Mar 28 21:47:11 2023]
rule copy_fasta:
    input: /net/dunham/vol1/home/dennig2/yevo_pipeline/data/genome/sacCer3.fasta
    output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/01_ref_files/reference
    jobid: 7
    reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/01_ref_files/reference
    resources: tmpdir=/tmp/293419245.1.sage-long.q


[Tue Mar 28 21:47:11 2023]
rule gatk_register:
    input: workflow/envs/src/GenomeAnalysisTK-3.7-0-gcfedb67.tar.bz2
    output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/05_gatk/gatk_3.7_registered.txt
    jobid: 24
    reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/05_gatk/gatk_3.7_registered.txt
    resources: tmpdir=/tmp/293419245.1.sage-long.q

Activating conda environment: ../.snakemake/conda/82f61b1b43a83fa5851b5687321ca2bf_

[Tue Mar 28 21:47:11 2023]
rule export_run_config:
    output: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_run_config.json
    jobid: 2
    reason: Missing output files: /net/dunham/vol1/home/dennig2/yevo_pipeline/pipelineoutput/00_logs/00_run_config.json
    resources: tmpdir=/tmp/293419245.1.sage-long.q

However, checking the output directory does see output from the export_run_config, list_samples, copy_fasta, and create_bwa_index rules. Why is Snakemake not recognizing the output files here?

python Snakemake • 2.9k views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2360 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6