Snakemake 'Unexpected keyword ref in rule definition'
2
1
Entering edit mode
7 months ago

Hello everyone, I am trying to get more familiar with Snakemake. I have created this Snakefile:

rule alignment_haplotypecaller:
output:
vcf = 'variants/{sample}.g.vcf.gz'
input:
bam = 'alignment/{sample}.bam'
reference:
fasta = 'reference.fasta'
shell:
r"""
samtools index {input.bam}
~/gatk-4.2.0.0/gatk HaplotypeCaller \
--reference {reference.fasta} \
--input  {input.bam} \
--output {output.vcf} \
--emit-ref-confidence BP_RESOLUTION
"""


An I run Snakemake with the following command:

snakemake -j1 -F -p alignment_haplotypecaller


I get this error message:

SyntaxError in line 7 of /mnt/shared/usr/test_snakemake/Snakefile: Unexpected keyword ref in rule definition (Snakefile, line 7)

Would anyone know why the key line 7 ("fasta") is problematic?

snakemake python • 1.7k views
1
Entering edit mode

Check if reference: can be used in rules.

1
Entering edit mode
7 months ago
Shred ▴ 870

Before executing the snakemake is useful to run in dry mode (using -n ) to see if there's any problem in your syntax or input/output. You're using an expected keyword for the rule, reference . Replace it with params.

0
Entering edit mode
7 months ago

Thanks you, the keyword was indeed the issue. I though they could take any name, but it is not the case. I didn't know about the dry mode. Thanks.

Now there is a wildcard error, I will try to manage them differently.

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

1
Entering edit mode

Snakemake wildcards are different from Bash ones. You need to inform Snakemake how to expand the wildcard in order to be consistent in input and output. In your case, for example, you could declare :

import glob
SAMPLES,= glob_wildcards('variants/{sample}.g.vcf.gz') # retrieve files via glob, instruct wildcards


Then, as suggested, declare a rule all, in which you declare how Snakemake rule's input will be expanded. Note that this is convenient because allows to use the same wildcards across multiple rule's input, just specifying how

rule all:
input:
expand('variants/{sample}.g.vcf.gz', sample=SAMPLES) # --> referred to alignment_haplotypecaller
expand('alignment/{sample}.bam', sample=SAMPLES) # --> other rule example, which uses haplotypecaller input with same wildcard


Read more here (alternative to Snakemake's documentation )

0
Entering edit mode

Thanks Shred! I have followed other documentation, but the link you provided will certainly help.

I have tried the following script, and got this error:

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

Would anyone know what is the cause of the error here?

# Input conditions and replicates to process

# Dictionnaries:
SAMPLES = glob_wildcards('alignment/Sample_{sample}.baits_conch.bam ')

# Expand wildcards:
rule all:
input:
expand('alignment/Sample_{sample}.baits_conch.bam', sample=SAMPLES)

# First rule:
rule alignment:
output:
snp = 'variants/Sample_{sample}.baits_conch.g.vcf.gz'
input:
ali = 'alignment/Sample_{sample}.baits_conch.bam'
params:
ref = 'path/to/reference.fasta'
shell:
r"""samtools index {input.ali}
~/gatk-4.2.0.0/gatk HaplotypeCaller \
--reference {params.ref} \
--input  {input.ali} \
--output {output.snp} \
--emit-ref-confidence BP_RESOLUTION
"""

1
Entering edit mode
SAMPLES, =! SAMPLES


Comma returns a tuple. https://stackoverflow.com/a/16037517/9300441

Be careful with indentation, you're indenting every rules, which is unwanted yet prone to errors.

0
Entering edit mode

Thank Shred, the indent issue is a part of my not-well configured VIM. I will be careful to that. Do we need the tuple command you mentioned? There is one output for one input in this case.

0
Entering edit mode

why is your input in rule all is bam? Shouldn't be vcf.gz. (from output of alignment rule)? or is it a typo?

0
Entering edit mode

Thanks for the question.

It is not a typo. BAM files are the input in HaplotypeCaller, the shell command, which will output VCF files.

Do you mean that I should expand the output (VCF) rather than expanding the input (BAM)?

1
Entering edit mode

Yes. That is how snakemake builds dag. Here is an example snakemake workflow for variant calling using gatk. Look at the final snakemake file: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/workflow/Snakefile and rule for variant calling: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/workflow/rules/calling.smk

0
Entering edit mode

Thank you cpad0112, I have made a fork of this repository. I feel very confused about the output-focused approach of Snakemake, but working a bit on the gatk-variant-calling pipeline will be indeed helpful.

1
Entering edit mode

Each frame work comes with it's own quirks. Good luck with Snakemake. If you want to try another frame work, I suggest nextflow.