Question: Error: Snakemake skipped a rule
c.clarido (Netherlands/Rotterdam/Leiden University (Applied Science)) wrote, 3 months ago:

Hello community,

I made a pipeline using Snakemake to count SNPs and INDELs. There were no problems when I ran the pipeline on a smaller dataset, a fragment of the original data. However, when I ran the pipeline on the original data, a rule was skipped for some reason, which caused the next rule to fail. The error is shown below:

[Thu Oct 18 22:57:16 2018]
Finished job 3.
2 of 16 steps (12%) done

[Thu Oct 18 22:57:17 2018]
rule invenT:
    input: /home/s1104230/output/tr1.fastq, /home/s1104230/output/tr2.fastq
    output: /home/s1104230/output/itr1.fastq, /home/s1104230/output/itr2k.fastq
    jobid: 8

[Thu Oct 18 23:05:56 2018]
Finished job 8.
3 of 16 steps (19%) done

[Thu Oct 18 23:05:56 2018]
rule bowtie2Aln:
    input: /home/s1104230/output/tr1.fastq, /home/s1104230/output/tr2.fastq
    output: /home/s1104230/mapping/aln.sam
    jobid: 2

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Could not locate a Bowtie index corresponding to basename "/home/s1104230/mapping/reference"
Error: Encountered internal Bowtie 2 exception (#1)
Command: /usr/bin/bowtie2-align-s --wrapper basic-0 -x /home/s1104230/mapping/reference -1 /home/s1104230/output/tr1.fastq -2 /home/s1104230/output/tr2.fastq
(ERR): bowtie2-align exited with value 1
[Thu Oct 18 23:05:57 2018]
Error in rule bowtie2Aln:
    jobid: 2
    output: /home/s1104230/mapping/aln.sam

RuleException:
CalledProcessError in line 36 of /home/s1104230/scripts/Snakefile:
Command ' set -euo pipefail;  bowtie2 -x /home/s1104230/mapping/reference -1 /home/s1104230/output/tr1.fastq -2 /home/s1104230/output/tr2.fastq > /home/s1104230/mapping/aln.sam ' returned non-zero exit status 1
  File "/home/s1104230/scripts/Snakefile", line 36, in __rule_bowtie2Aln
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job bowtie2Aln since they might be corrupted:
/home/s1104230/mapping/aln.sam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/s1104230/scripts/.snakemake/log/2018-10-18T212758.414196.snakemake.log

real    97m59.058s
user    97m35.924s
sys     0m16.792s

As you can see, the rule bowtie2Build has been skipped. This rule builds an index from the reference FASTA, and its output is needed by the next rule, bowtie2Aln. The Snakefile is as follows:

rule invenU:
    # input: ["/home/bnextgen/reads/bngsa_nietinfected_1.fastq","/home/bnextgen/reads/bngsa_nietinfected_2.fastq"]  
    # output: ["/home/s1104230/output/iutr1.fastq", "/home/s1104230/output/iutr2.fastq"]
    # script: "/home/s1104230/scripts/inven.py"
    input: ["/home/s1104230/data/bngsa1_24M.txt","/home/s1104230/data/bngsa2_24M.txt"]  
    output: ["/home/s1104230/output/iutr1.fastq", "/home/s1104230/output/iutr2.fastq"]
    script: "/home/s1104230/scripts/inven.py"

rule trimmen:
    input: ["/home/s1104230/data/bngsa1_24M.txt","/home/s1104230/data/bngsa2_24M.txt"]   
    output: ["/home/s1104230/output/tr1.fastq", "/home/s1104230/output/tr2.fastq"]
    script: "/home/s1104230/scripts/trimming.py"

    # input: ["/home/bnextgen/reads/bngsa_nietinfected_1.fastq","/home/bnextgen/reads/bngsa_nietinfected_2.fastq"]   
    # output: ["/home/s1104230/output/tr1.fastq", "/home/s1104230/output/tr2.fastq"]
    # script: "/home/s1104230/scripts/trimming.py"

rule invenT:
    input: rules.trimmen.output
    output: ["/home/s1104230/output/itr1.fastq", "/home/s1104230/output/itr2.fastq"]
    script: "/home/s1104230/scripts/inven.py"

rule bowtie2Build: 
    input: 
        "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename="/home/s1104230/mapping/reference"
    output:
        output1="/home/s1104230/mapping/reference.1.bt2",
        output2="/home/s1104230/mapping/reference.2.bt2",
        output3="/home/s1104230/mapping/reference.3.bt2",
        output4="/home/s1104230/mapping/reference.4.bt2",
        outputrev1="/home/s1104230/mapping/reference.rev.1.bt2",
        outputrev2="/home/s1104230/mapping/reference.rev.2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

rule bowtie2Aln:
    input: rules.trimmen.output
    params:
        basename="/home/s1104230/mapping/reference"
    output: "/home/s1104230/mapping/aln.sam"
    shell:
        "bowtie2 -x {params.basename} -1 {input[0]} -2 {input[1]} > {output}"

rule sam2bam:
    input: rules.bowtie2Aln.output
    output: "/home/s1104230/mapping/aln.bam"
    shell: "samtools view -Sb {input} > {output}"

rule sortbam:
    input: rules.sam2bam.output
    params:
        basename="/home/s1104230/mapping/sorted"
    output: 
        output1="/home/s1104230/mapping/sorted.bam"
    shell: "samtools sort {input} {params.basename}"

rule samIndex:
    input: rules.sortbam.output
    output: "/home/s1104230/mapping/sorted.bam.bai"
    shell: "samtools index {input} {output}"

rule copyRef:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    output: "/home/s1104230/mapping/infected_consensus.fasta"
    shell: "cp {input} {output}"

rule samIndex2:
    input: rules.copyRef.output
    output: "/home/s1104230/mapping/infected_consensus.fasta.fai"
    shell: "samtools faidx {input}"

rule bam2Pileup:
    input: 
        rules.copyRef.output, 
        rules.sortbam.output
    output: "/home/s1104230/mapping/aln.mpileup"
    shell: "samtools mpileup -f {input[0]} {input[1]} > {output}"

rule pileup2Bcf:
    input: 
        rules.copyRef.output, 
        rules.sortbam.output
    output: "/home/s1104230/mapping/varcalls.bcf"
    shell: "samtools mpileup -uf {input[0]} {input[1]} > {output}"

rule bcf2Vcf:
    input: rules.pileup2Bcf.output
    output: "/home/s1104230/mapping/varcalls.vcf"
    shell: "bcftools view -cg {input} > {output}"

rule Vcf2fq:
    input: rules.bcf2Vcf.output
    output: "/home/s1104230/mapping/consensus.fq"
    shell: "/usr/share/samtools/vcfutils.pl vcf2fq {input} > {output}"

rule Vcf2txt:
    input: rules.bcf2Vcf.output
    output: "/home/s1104230/mapping/varcalls.txt"
    shell: "cat {input} > {output}"

rule countvar:
    input: rules.Vcf2txt.output
    output: 
        "/home/s1104230/mapping/INDELs.txt",
        "/home/s1104230/mapping/SNPs.txt"
    script: "/home/s1104230/scripts/countvar.py"

rule all: 
    input:
        rules.invenU.output,
        rules.trimmen.output,
        rules.invenT.output,
        rules.bowtie2Build.output,#6mins
        rules.bowtie2Aln.output,
        rules.sam2bam.output,
        rules.sortbam.output,
        rules.samIndex.output,
        rules.copyRef.output,
        rules.samIndex2.output,
        rules.bam2Pileup.output,
        rules.pileup2Bcf.output,
        rules.bcf2Vcf.output,
        rules.Vcf2fq.output,
        rules.Vcf2txt.output,
        rules.countvar.output

I ran the workflow with "time snakemake all". Before doing so, I made sure to remove all outputs from the previous test.

Does anyone know where the problem lies?

Thank you in advance.

rizoic wrote, 3 months ago:

One problem I can see is that you need to add the output of rule bowtie2Build as an input of rule bowtie2Aln. That way Snakemake knows it should run the alignment only after the Bowtie 2 index has been built.

Right now Snakemake is unaware of this dependency, so it treats the two rules as independent jobs and may run the alignment before the build step.
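To make the dependency argument concrete, here is a toy model (a sketch, not Snakemake's actual code) of how a workflow manager can infer rule order from file names: rule B depends on rule A only if one of B's input files is one of A's output files. The shortened file names are hypothetical stand-ins for the paths in the question, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical, shortened rule table: rule name -> (input files, output files).
BEFORE = {
    "trimmen": (["reads1.txt", "reads2.txt"], ["tr1.fastq", "tr2.fastq"]),
    "bowtie2Build": (["ref.fasta"], ["reference.1.bt2", "reference.rev.1.bt2"]),
    "bowtie2Aln": (["tr1.fastq", "tr2.fastq"], ["aln.sam"]),
}

# Same table, but the alignment now also lists the index files as inputs.
AFTER = dict(
    BEFORE,
    bowtie2Aln=(
        ["tr1.fastq", "tr2.fastq", "reference.1.bt2", "reference.rev.1.bt2"],
        ["aln.sam"],
    ),
)


def infer_dependencies(rules):
    """Map each rule to the set of rules that produce one of its input files."""
    producer = {f: name for name, (_, outs) in rules.items() for f in outs}
    return {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in rules.items()
    }


for label, rules in (("before", BEFORE), ("after", AFTER)):
    deps = infer_dependencies(rules)
    # static_order() yields every rule only after all of its prerequisites.
    order = list(TopologicalSorter(deps).static_order())
    print(label, sorted(deps["bowtie2Aln"]), order)
```

In the "before" table, bowtie2Aln depends only on trimmen, so any order that places it after trimmen is valid, including one where the alignment runs before the index is built. In the "after" table, bowtie2Build is guaranteed to come first.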


Thank you for your reply. That may indeed be the case. How can I prevent this from happening?

(written 3 months ago by c.clarido)

In addition, I found the following information in the Snakemake documentation:

Snakemake allows rules to specify numeric priorities:

rule:
  input: ...
  output: ...
  priority: 50
  shell: ...
Per default, each rule has a priority of 0. Any rule that specifies a higher priority, will be preferred by the scheduler over all rules that are ready to execute at the same time without having at least the same priority.

Furthermore, the --prioritize or -P command line flag allows to specify files (or rules) that shall be created with highest priority during the workflow execution. This means that the scheduler will assign the specified target and all its dependencies highest priority, such that the target is finished as soon as possible. The --dryrun or -n option allows you to see the scheduling plan including the assigned priorities.

Maybe I should try this?

(written 3 months ago by c.clarido)
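The quoted documentation says a priority only decides between jobs that are ready to run at the same time. The toy scheduler below is a hypothetical simplification (Snakemake's real scheduler does not work in fixed rounds) that illustrates what this means for this workflow: with no declared dependency, a priority on bowtie2Build puts it first on a single core, but with two cores both jobs start together and the alignment can still run before the index exists.

```python
def schedule(jobs, deps, priority, cores):
    """Toy round-based scheduler, NOT Snakemake's real implementation.

    In each round, among the jobs whose dependencies have all finished,
    start up to `cores` jobs, highest priority first. Jobs started in the
    same round are treated as running concurrently. Returns the rounds.
    """
    done, rounds = set(), []
    pending = set(jobs)
    while pending:
        ready = [j for j in pending if deps.get(j, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependencies")
        # Highest priority first (default 0); ties broken by name.
        ready.sort(key=lambda j: (-priority.get(j, 0), j))
        batch = ready[:cores]
        rounds.append(batch)
        done.update(batch)
        pending.difference_update(batch)
    return rounds


JOBS = ["bowtie2Aln", "bowtie2Build"]
NO_EDGE = {}  # the alignment does not declare the index as an input
PRIO = {"bowtie2Build": 50}

print(schedule(JOBS, NO_EDGE, PRIO, cores=1))  # [['bowtie2Build'], ['bowtie2Aln']]
print(schedule(JOBS, NO_EDGE, PRIO, cores=2))  # [['bowtie2Build', 'bowtie2Aln']]
print(schedule(JOBS, {"bowtie2Aln": {"bowtie2Build"}}, {}, cores=2))  # [['bowtie2Build'], ['bowtie2Aln']]
```

The last call shows that an explicit dependency keeps the order correct regardless of the number of cores, without any priority.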

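Changing

rule bowtie2Aln:
    input: rules.trimmen.output

to

rule bowtie2Aln:
    input: rules.trimmen.output, rules.bowtie2Build.output

should solve this error for you. In this case it is better to state the dependency explicitly than to set a priority: a priority only changes the order in which jobs that are already ready to run get scheduled, so it does not tell Snakemake that the alignment needs the index, and with more than one core both jobs can still start at the same time.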
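A variant of the same fix, sketched with named inputs so that the shell command no longer relies on the position of each file in the flattened input list. The names r1, r2 and index are illustrative choices, not something Snakemake requires. The sketch assumes rule bowtie2Build is defined above bowtie2Aln in the Snakefile, as it is in the question, because rules.bowtie2Build can only be referenced after that rule has been parsed.

```python
# Snakefile syntax (Snakemake extends Python); replaces rule bowtie2Aln.
rule bowtie2Aln:
    input:
        r1=rules.trimmen.output[0],
        r2=rules.trimmen.output[1],
        # Listing the six .bt2 files as inputs tells Snakemake that this
        # rule can only run after bowtie2Build has produced them.
        index=rules.bowtie2Build.output
    params:
        basename="/home/s1104230/mapping/reference"
    output: "/home/s1104230/mapping/aln.sam"
    shell:
        "bowtie2 -x {params.basename} -1 {input.r1} -2 {input.r2} > {output}"
```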

You can use Snakemake's --dryrun (-n) option to test configurations, and launch the real run only once it lists the expected jobs (add -p to also print the shell commands).

(modified 3 months ago; written 3 months ago by rizoic)

Thank you, rizoic, I will definitely try this. In the meantime I used the priority method, and it seems to be working (still processing).

(written 3 months ago by c.clarido)