Question: How to write the output in snakefile (snakemake) for bowtie2-build
2
gravatar for c.clarido
9 weeks ago by
c.clarido40
Netherlands/Rotterdam/Leiden University (Applied Science)
c.clarido40 wrote:

Hello community,

I am using snakemake to make a pipeline. I want to add the bowtie2-build from bowtie2 to my current snakefile as follow:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

So I should be expecting the following files: reference.1.bt2 reference.2.bt2 reference.3.bt2 reference.4.bt2 reference.rev.1.bt2 reference.rev.2.bt2

But it seems the problem lies in the output. How can I write the output?

ADD COMMENTlink modified 2 days ago by zorrilla0 • written 9 weeks ago by c.clarido40
5
gravatar for IP
9 weeks ago by
IP500
Denmark/University of Copenagen
IP500 wrote:

You wrote this:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

What snakemake is going to do is to check if the file output ( in this case "output/reference" ) exist after executing the rule. Which doesn't since it is the basename for bowtie

What you can do is to pass the basename for the index as a parameter to snakemake function. Something like the following:

 rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    params:
        basename="output/reference"
   output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"
ADD COMMENTlink modified 9 weeks ago by Devon Ryan86k • written 9 weeks ago by IP500

Hello,

Using the following code:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2"
        output2="output/reference.2.bt2"
        output3="output/reference.3.bt2"
        output4="output/reference.4.bt2"
        outputrev1="output/reference.rev1.bt2"
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

I get :

SyntaxError in line 31 of /home/s1104230/scripts/Snakefile:
Unexpected keyword output1 in rule definition (Snakefile, line 31)
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by c.clarido40
1

Hello,

after each defined output there must be a ,.

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

fin swimmer

ADD REPLYlink written 9 weeks ago by finswimmer8.0k

I've updated this in the original (fixed the colon after params too). Good catch.

ADD REPLYlink written 9 weeks ago by Devon Ryan86k

Hello So I tried again the suggested code but it seems that the error occurs at the rev. files:

MissingOutputException in line 26 of /home/s1104230/scripts/Snakefile:
Missing files after 5 seconds:
/home/s1104230/mapping2/reference.rev2.bt2
/home/s1104230/mapping2/reference.rev1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
ADD REPLYlink written 8 weeks ago by c.clarido40

See if those files show up eventually. We usually use --latency-wait 300 do get around this.

ADD REPLYlink written 8 weeks ago by Devon Ryan86k

thanks for the corrections :)

ADD REPLYlink written 8 weeks ago by IP500
1
gravatar for Eric Lim
8 weeks ago by
Eric Lim1.2k
Boston
Eric Lim1.2k wrote:

IP's solution is best as expected outputs are listed explicitly which would take advantage of timestamp.

Since v5.2.0, you can also accomplish this by using directory, which shorten the code by quite a bit while maintaining timestamp, but snakemake won't check if all expected outputs exist.

rule bowtie_build:
  input: 'chrM.fa'
  output: directory('reference/bowtie/')
  shell: 'bowtie2-build {input} {output}'

In this particular example, you'd need to ls -a to see the outputs.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs

directory has been super helpful in eliminating a bunch of if-else on rules when we don't know what outputs to expect at code time.

ADD COMMENTlink written 8 weeks ago by Eric Lim1.2k
0
gravatar for dariober
9 weeks ago by
dariober9.7k
Glasgow - UK
dariober9.7k wrote:

IP's answer is what I would do. An alternative is to use the touch function to create a marker file that signals that a task has successfully completed:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        touch('index.done'),
    shell: "bowtie2-build {input} {params.basename}"

Then, rules that depend on the bowtie index will take in input the file index.done.

ADD COMMENTlink written 9 weeks ago by dariober9.7k

Hello, I tried your suggestion But I get the following error:

Error in rule bowtie2Build: jobid: 0 output: index.done

RuleException:
CalledProcessError in line 49 of /home/s1104230/scripts/Snakefile:
Command ' set -euo pipefail;  bowtie2-build  /home/s1104230/mapping2/reference ' returned non-zero exit status 1
  File "/home/s1104230/scripts/Snakefile", line 49, in __rule_bowtie2Build
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
ADD REPLYlink written 8 weeks ago by c.clarido40

The shell command is executed but it is throwing an error. In fact, bowtie2-build should have two positional arguments but there's only one:

bowtie2-build  /home/s1104230/mapping2/reference

It should be something like:

bowtie2-build /home/bnextgen/refgenome/infected_consensus.fasta /home/s1104230/mapping2/reference
ADD REPLYlink written 8 weeks ago by dariober9.7k
0
gravatar for zorrilla
2 days ago by
zorrilla0
zorrilla0 wrote:

sort of off topic, but shouldt you also expect to get a .fai file as the output of the bowtie2 build? Can someone clarify this for me?

ADD COMMENTlink written 2 days ago by zorrilla0

-A fai file is a index on the fasta file with the format specified here -Bowtie2 output is the Burrows-Wheeler transformation of the reference genome. You can read about the BW transform here.

They are two different indexing techniques, so bowtie2-build does create a fai file because it does not need it for the alignment. It is just searching the BWT

ADD REPLYlink written 1 day ago by IP500
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1627 users visited in the last hour