snakemake and Star aligner code issue
0
1
Entering edit mode
22 months ago
lawrence.hsu ▴ 10

I'm running into a nameerror where says 'sample' is unknown in this context. I'm thinking it has to do with the --outFileNamePrefix in the shell where it won't allow {sample}. I've tested it with coding a single sample to be processed and it works fine. I'm not sure how to proceed.

configfile: "config.yaml"
SAMPLES= ['A','B','C']

rule all:
input:
expand("01-qc/{sample}_fastqc.html", sample=config['samples']),
expand('01-qc/{sample}_fastqc.zip', sample=config['samples']),
expand('02-trimmed/{sample}.fastq_trimming_report.txt', sample=config['samples']),
expand('02-trimmed/{sample}_trimmed.fq', sample=config['samples']),
expand("03-qc/{sample}_trimmed_fastqc.html", sample=config['samples']),
expand('03-qc/{sample}_trimmed_fastqc.zip', sample=config['samples']),
expand('04-aligned-2pass/{sample}_ID/Aligned.out.sam', sample=SAMPLES),
expand('04-aligned-2pass/{sample}_ID/Log.final.out', sample=SAMPLES)

rule fastQC01:
input:
sample=lambda wildcards: config['samples'][wildcards.sample]
output:
html="01-qc/fastqc/{sample}_fastqc.html",
zip="01-qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params: ""
log:
"logs/fastqc/{sample}.log"
shell:
"fastqc {input} -o 01-qc/fastqc/"

rule trimgalore02:
input:
sample=lambda wildcards: config['samples'][wildcards.sample]
output:
report="02-trimmed/{sample}.fastq_trimming_report.txt",
fq="02-trimmed/{sample}_trimmed.fq"
log:
"logs/trimgalore/{sample}.log"
shell:
"trim_galore --length 20 -q 20 {input} -o 02-trimmed/"

rule fastQC03:
input:
"02-trimmed/{sample}_trimmed.fq"
output:
html="03-qc/{sample}_trimmed_fastqc.html",
zip="03-qc/{sample}_trimmed_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params: ""
log:
"logs/03-qc/{sample}.log"
shell:
"fastqc {input} -o 03-qc/"

rule STAR04:
input:
expand("02-trimmed/{sample}_trimmed.fq", sample=SAMPLES)
output:
sam=expand("04-aligned-2pass/{sample}_ID/Aligned.out.sam", sample=SAMPLES),
sum=expand("04-aligned-2pass/{sample}_ID/Log.final.out", sample=SAMPLES)
log:
expand("logs/star/{sample}.log", sample=SAMPLES)
shell:
"/data/p_magnuson_lab/bin/STAR-2.6.0c/bin/Linux_x86_64/STAR "
"--genomeDir /data/p_magnuson_lab/reference/mouse/STAR_index "
"--twopassMode Basic "
"--alignEndsType EndToEnd "
"--alignIntronMax 1 "
"--outFileNamePrefix 04-aligned-2pass/{sample}_ID/"

snakemake • 1.4k views
2
Entering edit mode

You don't want expand in a wildcard rule - leave that for your targets. A wildcard rule is a recipe, not a manifest.

0
Entering edit mode

Have you tried running by changing {sample} to {wildcards.sample} in your shell commands?

0
Entering edit mode

Yes I have tried that but it comes back with the error: 'Wildcards' object has no attribute 'sample'

2
Entering edit mode

You get the error because STAR04 doesn't have any wildcard. {sample} is already resolved by expand, hence wildcards.sample is not a valid attribute.

Perhaps this post will help. Use a workflow management tool to manage your computational pipelines . Your use of expand in both input and output in STAR04 doesn't make sense.

0
Entering edit mode

That was my issue. Thanks!

0
Entering edit mode

May be it should be {wildcards.samples} since you have SAMPLES= ['A','B','C'] at the beginning of your config file and you are trying to refer to that.