Question: How to write the output in snakefile (snakemake) for bowtie2-build
2
gravatar for c.clarido
8 months ago by
c.clarido50
Netherlands/Rotterdam/Leiden University (Applied Science)
c.clarido50 wrote:

Hello community,

I am using snakemake to make a pipeline. I want to add the bowtie2-build from bowtie2 to my current snakefile as follow:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

So I should be expecting the following files: reference.1.bt2 reference.2.bt2 reference.3.bt2 reference.4.bt2 reference.rev.1.bt2 reference.rev.2.bt2

But it seems the problem lies in the output. How can I write the output?

ADD COMMENTlink modified 6 months ago by zorrilla0 • written 8 months ago by c.clarido50
5
gravatar for IP
8 months ago by
IP550
Denmark/University of Copenagen
IP550 wrote:

You wrote this:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

What snakemake is going to do is to check if the file output ( in this case "output/reference" ) exist after executing the rule. Which doesn't since it is the basename for bowtie

What you can do is to pass the basename for the index as a parameter to snakemake function. Something like the following:

 rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    params:
        basename="output/reference"
   output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"
ADD COMMENTlink modified 8 months ago by Devon Ryan90k • written 8 months ago by IP550

Hello,

Using the following code:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2"
        output2="output/reference.2.bt2"
        output3="output/reference.3.bt2"
        output4="output/reference.4.bt2"
        outputrev1="output/reference.rev1.bt2"
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

I get :

SyntaxError in line 31 of /home/s1104230/scripts/Snakefile:
Unexpected keyword output1 in rule definition (Snakefile, line 31)
ADD REPLYlink modified 8 months ago • written 8 months ago by c.clarido50
1

Hello,

after each defined output there must be a ,.

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

fin swimmer

ADD REPLYlink written 8 months ago by finswimmer11k

I've updated this in the original (fixed the colon after params too). Good catch.

ADD REPLYlink written 8 months ago by Devon Ryan90k

Hello So I tried again the suggested code but it seems that the error occurs at the rev. files:

MissingOutputException in line 26 of /home/s1104230/scripts/Snakefile:
Missing files after 5 seconds:
/home/s1104230/mapping2/reference.rev2.bt2
/home/s1104230/mapping2/reference.rev1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
ADD REPLYlink written 8 months ago by c.clarido50

See if those files show up eventually. We usually use --latency-wait 300 do get around this.

ADD REPLYlink written 8 months ago by Devon Ryan90k

thanks for the corrections :)

ADD REPLYlink written 8 months ago by IP550
1
gravatar for Eric Lim
8 months ago by
Eric Lim1.4k
Boston
Eric Lim1.4k wrote:

IP's solution is best as expected outputs are listed explicitly which would take advantage of timestamp.

Since v5.2.0, you can also accomplish this by using directory, which shorten the code by quite a bit while maintaining timestamp, but snakemake won't check if all expected outputs exist.

rule bowtie_build:
  input: 'chrM.fa'
  output: directory('reference/bowtie/')
  shell: 'bowtie2-build {input} {output}'

In this particular example, you'd need to ls -a to see the outputs.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs

directory has been super helpful in eliminating a bunch of if-else on rules when we don't know what outputs to expect at code time.

ADD COMMENTlink written 8 months ago by Eric Lim1.4k
0
gravatar for dariober
8 months ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

IP's answer is what I would do. An alternative is to use the touch function to create a marker file that signals that a task has successfully completed:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        touch('index.done'),
    shell: "bowtie2-build {input} {params.basename}"

Then, rules that depend on the bowtie index will take in input the file index.done.

ADD COMMENTlink written 8 months ago by dariober10k

Hello, I tried your suggestion But I get the following error:

Error in rule bowtie2Build: jobid: 0 output: index.done

RuleException:
CalledProcessError in line 49 of /home/s1104230/scripts/Snakefile:
Command ' set -euo pipefail;  bowtie2-build  /home/s1104230/mapping2/reference ' returned non-zero exit status 1
  File "/home/s1104230/scripts/Snakefile", line 49, in __rule_bowtie2Build
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
ADD REPLYlink written 8 months ago by c.clarido50

The shell command is executed but it is throwing an error. In fact, bowtie2-build should have two positional arguments but there's only one:

bowtie2-build  /home/s1104230/mapping2/reference

It should be something like:

bowtie2-build /home/bnextgen/refgenome/infected_consensus.fasta /home/s1104230/mapping2/reference
ADD REPLYlink written 8 months ago by dariober10k
0
gravatar for zorrilla
6 months ago by
zorrilla0
zorrilla0 wrote:

sort of off topic, but shouldt you also expect to get a .fai file as the output of the bowtie2 build? Can someone clarify this for me?

ADD COMMENTlink written 6 months ago by zorrilla0

-A fai file is a index on the fasta file with the format specified here -Bowtie2 output is the Burrows-Wheeler transformation of the reference genome. You can read about the BW transform here.

They are two different indexing techniques, so bowtie2-build does create a fai file because it does not need it for the alignment. It is just searching the BWT

ADD REPLYlink written 6 months ago by IP550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1510 users visited in the last hour