Remove duplicates

Question

Seeking help with snakemake

3

4.5 years ago

Ming ▴ 110

Dear All,

I am trying to run BBMap with snakemake, and I am pretty new to this.

# rule all: Specifies the files that you would like to create during your snakemake workflow
import os
import snakemake.io
import glob

(SAMPLES,READS,) = glob_wildcards("/home/tanshiming/Downloads/{sample}_{read}_001.fastq.gz")
READS=["R1","R2"]

rule all:
  input: expand("/home/tanshiming/Downloads/{sample}_{read}_001.fastq.gz",sample=SAMPLES, read=READS)

rule clumpify:
  input:
    r1="/home/tanshiming/Downloads/{sample}_R1_001.fastq.gz",
    r2="/home/tanshiming/Downloads/{sample}_R2_001.fastq.gz"

  output:
      o1="/home/tanshiming/Downloads/Clumpify/{sample}_R1.fastq.gz",
      o2="/home/tanshiming/Downloads/Clumpify/{sample}_R2.fastq.gz"

  shell:
    "clumpify.sh -Xmx50g in1={input.r1} in2=${input.r2} out1=Clumpify/{output.o1} out2=Clumpify/${output.o2} reorder ziplevel=9 dedupe=t optical=t"

When I try to run snakemake, I get the following error:

(bbmap) tanshiming@S620100019205:~/Scripts$ snakemake -n
SyntaxError in line 15 of /home/tanshiming/Scripts/Snakefile:
invalid syntax

This is the code that I would like to run:

# Remove duplicates

for x in *_R1_001.fastq.gz
    do clumpify.sh -Xmx250g in1=$x in2=${x%_R1_001*}_R2_001.fastq.gz out1=Clumpify/$x out2=Clumpify/${x%_R1_001*}_R2_001.fastq.gz reorder ziplevel=9 dedupe=t optical=t
done

Appreciate any advice that I can get!

Thank you.

snakemake • 2.3k views

ADD COMMENT • link 4.5 years ago by Ming ▴ 110

score 4 · Answer 1 · 2019-11-08

4

4.5 years ago

gb ★ 2.2k

You are missing commas after the input and output files.

ADD COMMENT • link 4.5 years ago by gb ★ 2.2k

0

Thanks @gb, but I am getting the following error now:

snakemake -n
SyntaxError in line 22 of /home/tanshiming/Scripts/Snakefile:
invalid syntax

This is at the clumpify.sh line.

ADD REPLY • link 4.5 years ago by Ming ▴ 110

2

You are missing quotes ("") around the shell command.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html

ADD REPLY • link 4.5 years ago by gb ★ 2.2k

0

Dear @gb,

When I try to run snakemake, this script does not seem to run:

(bbmap) tanshiming@S620100019205:~/Scripts$ snakemake
Building DAG of jobs...
Nothing to be done.
Complete log: /home/tanshiming/Scripts/.snakemake/log/2019-11-08T154720.993886.snakemake.log

But I can see that the job has not run.

ADD REPLY • link 4.5 years ago by Ming ▴ 110

2

The input specification of rule all is exactly the files that you already have at the beginning. Therefore snakemake doesn't do anything: you already have what you need.
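To make this concrete, here is a rough plain-Python sketch of what the two target lists look like. `expand_pattern` is a hypothetical stand-in written for illustration, not Snakemake's own `expand()`, and `sampleA` is an assumed sample name.

```python
from itertools import product


def expand_pattern(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill the pattern with every
    combination of the given wildcard values."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[k] for k in keys))
    ]


SAMPLES = ["sampleA"]  # assumed sample name, for illustration only
READS = ["R1", "R2"]

# What the original rule all asks for: the raw reads, which already exist.
raw_targets = expand_pattern(
    "/home/tanshiming/Downloads/{sample}_{read}_001.fastq.gz",
    sample=SAMPLES, read=READS,
)

# What only rule clumpify can produce: its declared output files.
clumpify_targets = expand_pattern(
    "/home/tanshiming/Downloads/Clumpify/{sample}_{read}.fastq.gz",
    sample=SAMPLES, read=READS,
)

print(raw_targets)
print(clumpify_targets)
```

The first list names files that are already in the Downloads folder, so there is nothing left to build. The second list names files that do not exist yet.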

ADD REPLY • link 4.5 years ago by WouterDeCoster 47k

0

Dear WouterDeCoster,

Does that mean I have to delete the rule all to run the script?

ADD REPLY • link 4.5 years ago by Ming ▴ 110

2

I believe the input of rule all needs to be the output of rule clumpify. Snakemake first checks which files it has to produce (the input of rule all). Next, it checks how it can get those files. So if you put the output files of rule clumpify in rule all, there is a "connection".

Snakemake checks the files listed in rule all and sees that they are not there yet. It then checks how it can get them and sees that executing rule clumpify produces the output it needs. So it will execute that rule first before it can finish.
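The idea can be sketched as a toy model in plain Python. This is only an illustration of the reasoning, not how Snakemake is actually implemented; the function `plan`, the shortened file names, and the `producers` mapping are all made up for the example.

```python
def plan(targets, existing, producers):
    """Return the names of the rules that must run so that all targets exist.

    targets   -- files requested (the input of rule all)
    existing  -- set of files already on disk
    producers -- maps an output file to (rule name, list of input files)
    """
    to_run = []
    for target in targets:
        if target in existing:
            continue  # file is already there, nothing to do for it
        if target not in producers:
            raise ValueError(f"no rule produces {target}")
        rule, inputs = producers[target]
        # Satisfy the rule's own inputs first, then schedule the rule itself.
        for needed in plan(inputs, existing, producers) + [rule]:
            if needed not in to_run:
                to_run.append(needed)
    return to_run


# Shortened, hypothetical file names for one sample.
raw = ["s_R1_001.fastq.gz", "s_R2_001.fastq.gz"]
clumped = ["Clumpify/s_R1.fastq.gz", "Clumpify/s_R2.fastq.gz"]
existing = set(raw)
producers = {out: ("clumpify", raw) for out in clumped}

nothing_to_do = plan(raw, existing, producers)       # rule all lists the raw reads
needs_clumpify = plan(clumped, existing, producers)  # rule all lists clumpify's outputs

print(nothing_to_do, needs_clumpify)
```

With the raw reads as targets the plan is empty, which matches the "Nothing to be done." message. With the clumpify outputs as targets the plan contains the clumpify rule.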

ADD REPLY • link 4.5 years ago by gb ★ 2.2k

0

Have you tried following the tutorial?

The input of rule all should be the file you aim to obtain out of this workflow. It should be the final output file.

ADD REPLY • link 4.5 years ago by WouterDeCoster 47k

2

This worked for me!

# rule all: Specifies the files that you would like to create during your snakemake workflow

    import os
    import snakemake.io
    import glob

    (SAMPLES,READS,) = glob_wildcards("/home/tanshiming/Downloads/{sample}_{read}_001.fastq.gz")
    READS=["R1","R2"]

    rule all:
      input:
        expand("/home/tanshiming/Downloads/Clumpify/{sample}_{read}.fastq.gz",sample=SAMPLES, read=READS)

    rule clumpify:
      input:
        r1="/home/tanshiming/Downloads/{sample}_R1_001.fastq.gz",
        r2="/home/tanshiming/Downloads/{sample}_R2_001.fastq.gz"

      output:
          o1="/home/tanshiming/Downloads/Clumpify/{sample}_R1.fastq.gz",
          o2="/home/tanshiming/Downloads/Clumpify/{sample}_R2.fastq.gz"

      shell:
        "clumpify.sh -Xmx50g in1={input.r1} in2={input.r2} out1={output.o1} out2={output.o2} reorder ziplevel=9 dedupe=t optical=t"

Thank you very much for your help!
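For anyone comparing the two versions of the shell line: the working one drops the `$` in front of the braces and the extra `Clumpify/` prefix. Snakemake fills `{input.r1}` and friends in a way similar to Python's `str.format`, so a rough illustration with plain `str.format` (used here as a stand-in, with assumed paths for a sample called `s`) shows what the first version would have produced.

```python
from types import SimpleNamespace

# Assumed paths for one sample, for illustration only.
inp = SimpleNamespace(
    r1="/home/tanshiming/Downloads/s_R1_001.fastq.gz",
    r2="/home/tanshiming/Downloads/s_R2_001.fastq.gz",
)
out = SimpleNamespace(
    o1="/home/tanshiming/Downloads/Clumpify/s_R1.fastq.gz",
)

# First version: a literal "$" stays in front of the substituted path,
# and "Clumpify/" is prepended to an output path that is already complete.
first_version = "in2=${input.r2} out1=Clumpify/{output.o1}".format(input=inp, output=out)

# Working version: the braces alone are enough.
working_version = "in2={input.r2} out1={output.o1}".format(input=inp, output=out)

print(first_version)
print(working_version)
```

The `$` and the prefix are not part of the placeholder syntax, so they end up verbatim in the command and the paths no longer match the files declared in the rule.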

ADD REPLY • link 4.5 years ago by Ming ▴ 110

2

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 4.5 years ago by WouterDeCoster 47k