Snakemake wildcards in the path input/output
8 weeks ago
wanaga3166 ▴ 10

Hi everyone,

I wrote a Snakefile to check the quality of several FASTQ files and put the reports in a results directory. I tried to use wildcards as explained in the Snakemake documentation on Read the Docs, but it returns an "invalid syntax" error at the first line of the code below. Where did I make a mistake? Maybe I should use the aggregation function.

TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)

rule all:
    input:
        ["../Data/{tumor}/{sample}.fastq.gz".format(tumor=tumor) for tumor in TUMORS]

rule fastqc_before_trim:
    input:
        "../Data/{tumor}/{sample}.fastq.gz"
    output:
        "../Results/QC/Before_trimming/{tumor}/{sample}_fastqc.html"
    threads: 4
    shell:
        "fastqc -t {threads} {input} -o {output}"


My working directory is organized as below.

|------Script
|           |
|           |------Script_01.smk
|           |------Script_02.smk
|
|------Data
|           |
|           |------Tumor_01
|           |           |
|           |           |------Lane01.fastq.gz
|           |           |------Lane02.fastq.gz
|           |------Tumor_02
|                       |
|                       |------Lane01.fastq.gz
|                       |------Lane02.fastq.gz
|
|------Results
|           |
|           |--------QC
|           |        |
|           |        |------Before_Trimming
|           |        |             |
|           |        |             |------Tumor_01
|           |        |             |             |
|           |        |             |             |------Tumor_01_Lane01_fastqc.html
|           |        |             |             |------Tumor_01_Lane02_fastqc.html
|           |        |             |
|           |        |             |------Tumor_02
|           |        |                           |
|           |        |                           |------Tumor_02_Lane01_fastqc.html
|           |        |                           |------Tumor_02_Lane02_fastqc.html
|           |        |
|           |        |------After_Trimming
|           |                      |
|           |                      |------Tumor_01
|           |                      |             |
|           |                      |             |------Tumor_01_Lane01_cleaned_fastqc.html
|           |                      |             |------Tumor_01_Lane02_cleaned_fastqc.html
|           |                      |
|           |                      |------Tumor_02
|           |                                    |
|           |                                    |------Tumor_02_Lane01_cleaned_fastqc.html
|           |                                    |------Tumor_02_Lane02_cleaned_fastqc.html
|           |
|           |------Mapping
|
|


Thank you for your help.

Snakemake • 350 views

The input of rule all does not match the output of fastqc_before_trim, assuming fastqc_before_trim is the only other rule. If there are more rules, please post the entire Snakefile. Even if the paths were correct, you have not expanded the sample wildcard in the input of rule all.


Below you will find the entire Snakefile. When I execute it, I get the same "invalid syntax" message on this line: TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz).

TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)

rule all:
    input:
        expand("../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html", tumor = TUMORS, sample = SAMPLES)
        expand("../Results/QC/After_Trimming/{tumor}/{sample}_cleaned_fastqc.html", tumor = TUMORS, sample = SAMPLES)

rule fastqc_before_trim:
    input:
        "../Data/{tumor}/{sample}.fastq.gz"
    output:
        "../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html"
    threads: 4
    shell:
        "fastqc -t {threads} {input} -o {output}"

rule trim:
    input:
        "../Data/{tumor}/{sample}.fastq.gz"
    params:
        "../Data/{tumor}/"
    output:
        "../Data/{tumor}/{sample}_cleaned.fastq"
    conda:
        "trim.yaml"
    shell:
        "cutadapt -a AAGCAGTGGTATCAACGCAGAGTACATGGGGTCAGATGTGTATAAGAGAC -o {output} {input}"

rule fastqc_after_trim:
    input:
        "../Data/{tumor}/{sample}_cleaned.fastq"
    output:
        "../Results/QC/After_Trimming/{tumor}/{sample}_cleaned_fastqc.html"
    threads: 4
    shell:
        "fastqc -t {threads} {input} -o {output}"

8 weeks ago

The way Snakemake works is that you write a series of recipes and then you ask it to cook you dinner. Your explicit rule all needs to list files that you want produced by your implicit rule fastqc_before_trim, not the files it needs as input.

The syntax error is likely just a comma issue.


Thanks, Jeremy.

I modified my Snakefile. However, I have the same problem (syntax error) with this line: (TUMORS, SAMPLES) = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz). I also tested TUMORS, SAMPLES = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz") and it doesn't work either.

(TUMORS, SAMPLES) = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)

rule all:
    input:
        expand("../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html", tumor = TUMORS, sample = SAMPLES),
        expand("../Results/QC/After_Trimming/{tumor}/{sample}_cleaned_fastqc.html", tumor = TUMORS, sample = SAMPLES)


(TUMORS, SAMPLES) = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz")
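For anyone hitting the same error: lines outside of rules in a Snakefile are ordinary Python, so the pattern passed to glob_wildcards has to be a complete string literal, with a quote on both sides. A quick way to see which variant is the problem is to hand each line to Python's own parser. This is only a sketch using the standard ast module; the parses helper is a made-up name for illustration and nothing here calls Snakemake itself.

```python
import ast

# The three variants that appear in this thread, stored as plain text.
unquoted = "TUMORS, SAMPLES, = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz)"
half_quoted = 'TUMORS, SAMPLES = glob_wildcards(../Data/{tumor}/{sample}.fastq.gz")'
quoted = 'TUMORS, SAMPLES, = glob_wildcards("../Data/{tumor}/{sample}.fastq.gz")'


def parses(source: str) -> bool:
    """Return True if Python can parse the line (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


for name, line in [("unquoted", unquoted),
                   ("half_quoted", half_quoted),
                   ("quoted", quoted)]:
    print(name, "parses" if parses(line) else "SyntaxError")
```

Only the fully quoted variant parses. Note that it still has the trailing comma after SAMPLES, so in this particular line the comma is legal Python and the missing quotes are what trigger the error.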


Thanks, Jeremy. I corrected the file. Now I have another problem:

Snakemake returns this error message:

Missing input files for rule fastqc_before_trim:
../Data/Gon_M1/PB4_S13_L001_R2_001.fastq.gz


I was surprised by this message because the fastq.gz file (PB4_S13_L001_R2_001.fastq.gz) is not stored in the Gon_M1 folder but in the Gon_M3 folder. How can I avoid this problem?


Expand is producing every possible combination of TUMOR/SAMPLE. Maybe you should be building your list of desired files in a more controlled fashion. I don't use glob_wildcards because I don't like to have my filesystem determine what gets analyzed.
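To make the combination issue concrete, here is a small plain-Python sketch of the two ways the lists can be combined. It assumes glob_wildcards returned parallel lists with one entry per matched file; "PB1_S1_L001_R1_001" is a hypothetical sample name invented for the example, while the Gon_M1, Gon_M3 and PB4_S13_L001_R2_001 names come from the error message above. It imitates expand with str.format rather than importing Snakemake.

```python
from itertools import product

# Parallel lists, one entry per file found on disk (assumed layout).
TUMORS = ["Gon_M1", "Gon_M3"]
SAMPLES = ["PB1_S1_L001_R1_001", "PB4_S13_L001_R2_001"]  # first name is hypothetical

PATTERN = "../Results/QC/Before_Trimming/{tumor}/{sample}_fastqc.html"

# Every tumor combined with every sample: this is what a plain expand() does.
all_combinations = [
    PATTERN.format(tumor=t, sample=s) for t, s in product(TUMORS, SAMPLES)
]

# Tumor i combined only with sample i: the pairs that really exist on disk.
paired = [PATTERN.format(tumor=t, sample=s) for t, s in zip(TUMORS, SAMPLES)]

print(len(all_combinations), "targets from the full product")
print(len(paired), "targets from pairwise zipping")
```

The full product contains a Gon_M1/PB4_S13_L001_R2_001 target, which is exactly the file Snakemake then cannot find an input for. In a Snakefile, passing zip as the second argument to expand, as in expand(pattern, zip, tumor=TUMORS, sample=SAMPLES), should give the pairwise behaviour; building the list from an explicit sample sheet is the more controlled alternative.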