Hello all,
I'm rather new to Snakemake (using Python). I'm trying to build a pipeline, but rule all in the main script seems to run things in the wrong order, and I can't change it no matter what I do. Can anybody show me what I'm doing wrong?
Here is my first script:
configfile: "/home/PycharmProjects/Pipeline/config.yaml"

rule first:
    input:
        expand("Trimmed_reads/{srr}_trimmed.fastq",srr=config['srr'])

rule prefetch:
    output:
        "prefetch_files/sra/{srr}.sra"
    params:
        "{srr} --max-size 250GB -O sra_files"
    log:
        "prefetch_files/sra/{srr}.log"
    message:
        "Downloading files"
    shell:
        """
        /Tools/sra_toolkit/sratoolkit.3.0.0-ubuntu64/bin/prefetch {params} > {log} 2>&1 && touch {output}
        """
And within the same file is:
rule fastqdump:
    input:
        "prefetch_files/sra/{srr}.sra"
    output:
        touch("prefetch_files/done__{srr}_dump")
    params:
        args = "-S -O fastq_files/ -t fastq_files/ ",
        id_srr = "{srr}"
    log:
        "prefetch_files/{srr}.log"
    shell:
        """
        /Tools/sra_toolkit/sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump {params.args} {params.id_srr} > {log} 2>&1
        """
If I run this first script manually, nothing goes wrong and it gets me all the files I need for the following script (I think; I could be wrong here, but it does produce files).
Then in a second script I try Trimmomatic:
configfile: "/PycharmProjects/Pipeline/config.yaml"

rule now:
    input:
        expand("Trimmed_reads/{srr}_trimmed.fastq", srr=config['srr'])

rule trimmomatic:
    input:
        unused = "prefetch_files/done__{srr}_dump",
        raw=config['FileDir']+"/{srr}.fastq",
        anno=config["trimmomatic"]["adapter"]
    output:
        touch("Trimmed_reads/{srr}_trimmed.fastq")
    threads: config["trimmomatic"]["treads"]
    params:
        jar=config["trimmomatic"]["jar"],
        phred=config["trimmomatic"]["phred"],
        minlen=config["trimmomatic"]["minlen"],
        trailing=config["trimmomatic"]["trailing"],
        leading=config["trimmomatic"]["leading"],
        slidwindow=config["trimmomatic"]["slidwindow"]
    message: "Started read trimming!"
    log:
        "logs/trimmomatic/{srr}_trimmed.log"
    shell:
        "(java -jar {params.jar} SE {params.phred} {input.raw} {output} ILLUMINACLIP: {input.anno}:2:30:10{params.leading}{params.trailing}{params.slidwindow} {params.minlen}) > {log} 2>&1"
And my main.smk is this:
configfile: "/PycharmProjects/Pipeline/config.yaml"

include: "download_sample.smk"
include: "trimming.smk"
include: "dagfile.smk"

rule all:
    input:
        expand("Trimmed_reads/{srr}_trimmed.fastq",srr=config['srr']),
        expand("prefetch_files/done__{srr}_dump", srr=config['srr'])
And in case it's important, my config.yaml:
FileDir: "/PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing/fastq_files"
srr:
  - SRR5327856
  - SRR5327984
  - SRR5327985
trimmomatic:
  adapter: /PycharmProjects/Pipeline/all_adapters.fa
  jar: /Documents/Lisan/Tools/Trimmomatic-0.39/trimmomatic-0.39.jar
  phred: -phred33
  minlen: 45
  trailing: 3
  leading: 3
  slidwindow: 4:15
  treads: 35
I tried adding the output files from prefetch to the trimmomatic input, but this doesn't seem to help. Any time I run the main script, it starts with the trimmomatic rule and errors out because the files don't exist.
(base) Workstation:~/PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing$ snakemake --snakefile main.smk -c4
Building DAG of jobs...
MissingInputException in rule trimmomatic in line 9 of /PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing/trimming.smk:
Missing input files for rule trimmomatic:
    output: Trimmed_reads/SRR5327856_trimmed.fastq
    wildcards: srr=SRR5327856
    affected files:
        /PycharmProjects/Pipeline/Pipeline/workflow/Pre-processing/fastq_files/SRR5327856.fastq
I tried googling the error once I got stuck, but I didn't really find anything, which led me to conclude that it's probably something very simple that I'm not seeing. I don't have anybody around me who can help with Python or Snakemake, so I hope somebody here can. Thanks.
The error says "Missing input files for rule trimmomatic", so we start debugging from there. Your trimmomatic rule asks for input files matching config['FileDir']+"/{srr}.fastq", but none of your rules declares these files as outputs; the fastqdump rule only declares the flag file prefetch_files/done__{srr}_dump. Even though we know these FASTQ files are generated by fasterq-dump, Snakemake doesn't. Snakemake decides the order of jobs by matching each rule's inputs against the outputs of other rules, not by the order of the rules in the file or in rule all. So you need to declare those files explicitly as outputs of your fastqdump rule. Otherwise, Snakemake doesn't know how to build the DAG. Hope this helps.
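As a rough sketch of that change (not a tested drop-in), the fastqdump rule could declare the FASTQ file itself as its output, using the same path expression that trimmomatic uses for input.raw. It assumes that fasterq-dump writes a single file named {srr}.fastq into the output directory for these runs, as the config in the question implies, and it introduces a params.outdir entry that is not in the original so that the declared output and the actual output directory agree. The tool path, flags and log path are copied from the question.

```snakemake
rule fastqdump:
    input:
        "prefetch_files/sra/{srr}.sra"
    output:
        # Same expression as input.raw in rule trimmomatic, so Snakemake
        # can see that this rule produces the file trimmomatic asks for.
        # Assumption: fasterq-dump names the file {srr}.fastq.
        config["FileDir"] + "/{srr}.fastq"
    params:
        # outdir is a name introduced for this sketch; it points
        # fasterq-dump at the directory used in the declared output.
        outdir=config["FileDir"],
        id_srr="{srr}"
    log:
        "prefetch_files/{srr}.log"
    shell:
        """
        /Tools/sra_toolkit/sratoolkit.3.0.0-ubuntu64/bin/fasterq-dump -S -O {params.outdir} -t {params.outdir} {params.id_srr} > {log} 2>&1
        """
```

With an output declared this way, the unused flag input in trimmomatic and the second expand(...) in rule all should no longer be needed, since requesting the trimmed files is enough for Snakemake to schedule prefetch, then fastqdump, then trimmomatic. A dry run with snakemake --snakefile main.smk -n prints the planned jobs without executing them, which is a quick way to check the order.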