Snakemake 'Unexpected keyword ref in rule definition'
2
1
Entering edit mode
2.2 years ago

Hello everyone, I am trying to get more familiar with Snakemake. I have created this Snakefile:

rule alignment_haplotypecaller:
  output: 
      vcf = 'variants/{sample}.g.vcf.gz'
  input:  
      bam = 'alignment/{sample}.bam'
  reference:
      fasta = 'reference.fasta'
  shell:
    r"""
        samtools index {input.bam}
    ~/gatk-4.2.0.0/gatk HaplotypeCaller \
    --reference {reference.fasta} \
    --input  {input.bam} \
    --output {output.vcf} \
    --emit-ref-confidence BP_RESOLUTION 
    """

An I run Snakemake with the following command:

snakemake -j1 -F -p alignment_haplotypecaller

I get this error message:

SyntaxError in line 7 of /mnt/shared/usr/test_snakemake/Snakefile: Unexpected keyword ref in rule definition (Snakefile, line 7)

Would anyone know why the key line 7 ("fasta") is problematic?

snakemake python • 4.2k views
ADD COMMENT
1
Entering edit mode

Check if reference: can be used in rules.

ADD REPLY
1
Entering edit mode
2.2 years ago
Shred ★ 1.4k

Before executing the snakemake is useful to run in dry mode (using -n ) to see if there's any problem in your syntax or input/output. You're using an expected keyword for the rule, reference . Replace it with params.

ADD COMMENT
0
Entering edit mode
2.2 years ago

Thanks you, the keyword was indeed the issue. I though they could take any name, but it is not the case. I didn't know about the dry mode. Thanks.

Now there is a wildcard error, I will try to manage them differently.

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

ADD COMMENT
1
Entering edit mode

Snakemake wildcards are different from Bash ones. You need to inform Snakemake how to expand the wildcard in order to be consistent in input and output. In your case, for example, you could declare :

import glob
SAMPLES,= glob_wildcards('variants/{sample}.g.vcf.gz') # retrieve files via glob, instruct wildcards

Then, as suggested, declare a rule all, in which you declare how Snakemake rule's input will be expanded. Note that this is convenient because allows to use the same wildcards across multiple rule's input, just specifying how

rule all:
    input:
        expand('variants/{sample}.g.vcf.gz', sample=SAMPLES) # --> referred to alignment_haplotypecaller
        expand('alignment/{sample}.bam', sample=SAMPLES) # --> other rule example, which uses haplotypecaller input with same wildcard

Read more here (alternative to Snakemake's documentation )

ADD REPLY
0
Entering edit mode

Thanks Shred! I have followed other documentation, but the link you provided will certainly help.

I have tried the following script, and got this error:

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

Would anyone know what is the cause of the error here?

# Input conditions and replicates to process

# Dictionnaries:
SAMPLES = glob_wildcards('alignment/Sample_{sample}.baits_conch.bam ') 

# Expand wildcards:    
    rule all:
      input:
          expand('alignment/Sample_{sample}.baits_conch.bam', sample=SAMPLES) 

    # First rule:
    rule alignment:
      output:
          snp = 'variants/Sample_{sample}.baits_conch.g.vcf.gz'
      input:
          ali = 'alignment/Sample_{sample}.baits_conch.bam'
      params:
          ref = 'path/to/reference.fasta'
      shell:
        r"""samtools index {input.ali}
        ~/gatk-4.2.0.0/gatk HaplotypeCaller \
        --reference {params.ref} \
        --input  {input.ali} \
        --output {output.snp} \
        --emit-ref-confidence BP_RESOLUTION
        """
ADD REPLY
1
Entering edit mode
SAMPLES, =! SAMPLES 

Comma returns a tuple. https://stackoverflow.com/a/16037517/9300441

Be careful with indentation, you're indenting every rules, which is unwanted yet prone to errors.

ADD REPLY
0
Entering edit mode

Thank Shred, the indent issue is a part of my not-well configured VIM. I will be careful to that. Do we need the tuple command you mentioned? There is one output for one input in this case.

ADD REPLY
0
Entering edit mode

why is your input in rule all is bam? Shouldn't be vcf.gz. (from output of alignment rule)? or is it a typo?

ADD REPLY
0
Entering edit mode

Thanks for the question.

It is not a typo. BAM files are the input in HaplotypeCaller, the shell command, which will output VCF files.

Do you mean that I should expand the output (VCF) rather than expanding the input (BAM)?

ADD REPLY
1
Entering edit mode

Yes. That is how snakemake builds dag. Here is an example snakemake workflow for variant calling using gatk. Look at the final snakemake file: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/workflow/Snakefile and rule for variant calling: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/workflow/rules/calling.smk

ADD REPLY
0
Entering edit mode

Thank you cpad0112, I have made a fork of this repository. I feel very confused about the output-focused approach of Snakemake, but working a bit on the gatk-variant-calling pipeline will be indeed helpful.

ADD REPLY
1
Entering edit mode

Each frame work comes with it's own quirks. Good luck with Snakemake. If you want to try another frame work, I suggest nextflow.

ADD REPLY

Login before adding your answer.

Traffic: 2506 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6