Question

Snakemake and STARsolo

0

3.0 years ago

wanaga3166 ▴ 10

Hello everyone,

I started a scRNA-seq project recently.

I wrote a Snakemake script for the mapping step with STARsolo:

SAMPLES, = glob_wildcards("Data/Raw/{sample}_L001_R1.fastq.gz")

rule StarSolo:
    input:
        R1L1 = "Data/Raw/{sample}_L001_R1.fastq.gz",
        R1L2 = "Data/Raw/{sample}_L002_R1.fastq.gz",
        R2L1 = "Data/Raw/{sample}_L001_R2.fastq.gz",
        R2L2 = "Data/Raw/{sample}_L002_R2.fastq.gz"
    threads:
        8
    shell:
        'STAR --runThreadN {threads} '
        '--genomeDir /Genome/Human/GRCh38/ '
        '--readFilesIn {input.R1L1},{input.R1L2} {input.R2L1},{input.R2L2} '
        '--readFilesCommand gunzip -c '
        '--soloType CB_UMI_Simple '
        '--soloCBwhitelist 3M-february-2018.txt '
        '--soloCBlen 16 '
        '--soloUMIstart 17 '
        '--soloUMIlen 12 '

I obtained this error message:

Building DAG of jobs...
WildcardError in line 11 of /script/Snakefile4:
Wildcards in input files cannot be determined from output files:
'sample'

Where is the problem? Line 11 is the line with rule StarSolo.

Thank you for your help.

scRNA-seq snakemake Mapping STARsolo • 2.6k views

ADD COMMENT • link updated 3.0 years ago by Ram 43k • written 3.0 years ago by wanaga3166 ▴ 10

0

How are you running snakemake? Show us that command as well, please. Also, what is the purpose of the , in SAMPLES ,= glob...? And is that line supposed to be outside of a rule?

ADD REPLY • link 3.0 years ago by Ram 43k

0

I used this command line.

snakemake --cluster "qsub -N STAR_scRNA -l hostname=Node1 -b y" -s Snakefile4 -j 8 -n -p

Also, what is the purpose of the , in SAMPLES ,= glob...? And is that line supposed to be outside of a rule?

Yes, this line is outside the rule.

ADD REPLY • link 3.0 years ago by wanaga3166 ▴ 10

0

Follow Jeremy's advice. Your Snakefile is badly formed and the line outside the rule breaks everything.

ADD REPLY • link 3.0 years ago by Ram 43k

2

I think SAMPLES, = glob_wildcards(...) is valid, but I prefer to state explicitly what I am trying to make rather than let a directory run the show: https://snakemake.readthedocs.io/en/stable/project_info/faq.html#id13
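On the trailing comma: glob_wildcards returns a named tuple with one list per wildcard in the pattern, so `SAMPLES, = ...` is ordinary Python unpacking of a one-element tuple. A minimal pure-Python sketch of that behaviour follows; the `Wildcards` type here is only a stand-in for the Snakemake return value, and the sample names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for what glob_wildcards returns for a pattern
# containing a single wildcard, {sample}: a named tuple with one field.
Wildcards = namedtuple("Wildcards", ["sample"])
result = Wildcards(sample=["S1", "S2"])  # made-up sample names

# The trailing comma unpacks the one-element tuple into its only field.
SAMPLES, = result

# Equivalent, and arguably easier to read: access the field by name.
assert SAMPLES == result.sample
print(SAMPLES)
```

Written without the comma, SAMPLES would be the whole named tuple rather than the list of sample names.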

ADD REPLY • link 3.0 years ago by Jeremy Leipzig 22k

0

I stand corrected. I just noticed that OP is trying to run Snakemake without specifying a target.

ADD REPLY • link 3.0 years ago by Ram 43k

0

Small comment: for shell:, you just need:

shell:
  """
    STAR \
      --genomeDir {params.genome} \
      --runThreadN {threads} \
      --readFilesIn {params.read1} {params.read2} \
      --readFilesCommand {params.readFilesCommand} \
      ...
  """

ADD REPLY • link 3.0 years ago by Kevin Blighe 87k

score 2 · Accepted Answer · 2021-05-14

2

3.0 years ago

Jeremy Leipzig 22k

Snakefiles should be composed like this: 1) write implicit rules that connect input and output using wildcards; 2) write target rules that list the files you want produced as their input.
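To see why both parts are needed, here is a simplified pure-Python model of the idea (not Snakemake's actual implementation): a requested target is matched against a rule's output pattern, and the wildcard values recovered from that match are then used to fill in the inputs. The helper `infer_wildcards` and the target name `foo.Aligned.out.sam` are hypothetical.

```python
import re

def infer_wildcards(output_pattern, target):
    """Match a requested target against an output pattern such as
    '{sample}.Aligned.out.sam' and return the wildcard values, or None."""
    # re.split with a capturing group alternates literal text and names:
    # even indices are literal text, odd indices are wildcard names.
    parts = re.split(r"\{(\w+)\}", output_pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2:
            regex += "(?P<%s>.+)" % part
        else:
            regex += re.escape(part)
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None

# Step 2: a target rule lists the concrete file that is wanted.
target = "foo.Aligned.out.sam"

# Step 1: an implicit rule connects input and output through {sample}.
wildcards = infer_wildcards("{sample}.Aligned.out.sam", target)
inputs = [p.format(**wildcards) for p in (
    "Data/Raw/{sample}_L001_R1.fastq.gz",
    "Data/Raw/{sample}_L001_R2.fastq.gz",
)]
print(wildcards, inputs)
```

A rule that uses {sample} in its input but declares no output gives nothing to match against, which is what the WildcardError in the question reports.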

ADD COMMENT • link 3.0 years ago by Jeremy Leipzig 22k

0

Thank you, that is the problem: I don't know the output files for STARsolo. I think I will have 6 files:

Log
Feature Statistic Summaries
Alignments
Matrix Gene Counts
Barcodes
Genes

But I don't know their names, so it's difficult to write the output section.

ADD REPLY • link 3.0 years ago by wanaga3166 ▴ 10

2

One of them is {sample}.Aligned.out.sam. Start by creating a target variable that points to the SAM files you want produced:

SAMPLES = ['foo','bar'] #or from your glob
SAMS = ["{0}.Aligned.out.sam".format(sample) for sample in SAMPLES]
rule all:
  input: SAMS

ADD REPLY • link 3.0 years ago by Jeremy Leipzig 22k

0

Thank you, Jeremy and Ram, for your help. I made some modifications to my Snakefile.

SAMPLES, = glob_wildcards("Data/Raw/{sample}_L001_R1.fastq.gz")

#SAMPLES = ["Sample_S1_L001_R1_001.fastq.gz", "Sample_S1_L001_R2_001.fastq.gz"]
SAMS = ["{0}.Aligned.out.sam".format(sample) for sample in SAMPLES]

rule all:
    input:
        SAMS

rule StarSolo:
    input:
        R1L1 = "Data/Raw/{sample}_L001_R1.fastq.gz",
        R1L2 = "Data/Raw/{sample}_L002_R1.fastq.gz",
        R2L1 = "Data/Raw/{sample}_L001_R2.fastq.gz",
        R2L2 = "Data/Raw/{sample}_L002_R2.fastq.gz"
    output:
        "{0}.Aligned.out.sam"
    threads:
        8
    shell:
        'STAR --runThreadN {threads} '
        '--genomeDir /Genome/Human/GRCh38/ '
        '--readFilesIn {input.R2L1},{input.R2L2} {input.R1L1},{input.R1L2} '
        '--readFilesCommand gunzip -c '
        '--soloType CB_UMI_Simple '
        '--soloCBwhitelist 3M-february-2018.txt '
        '--soloCBlen 16 '
        '--soloUMIstart 17 '
        '--soloUMIlen 12 '

It works better, but STARsolo didn't start its analysis.

Below is the Snakemake output:

(snakemake) -bash-4.2$ snakemake --cluster "qsub -N STAR_scRNA -l hostname=Node1 -b y" -s Snakefile4 -j 8 -n
Building DAG of jobs...
Job counts:
    count   jobs
    1   all
    1

[Sat May 15 22:52:56 2021]
localrule all:
    jobid: 0

Job counts:
    count   jobs
    1   all
    1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Where did I make a mistake?

ADD REPLY • link 3.0 years ago by wanaga3166 ▴ 10

1

{0} is not a Snakemake wildcard; it is only a positional placeholder for Python string formatting. Use output: "{sample}.Aligned.out.sam" instead.
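The difference can be checked in plain Python. The sketch below uses a hypothetical sample name and a small helper, `field_names`, that is not part of Snakemake; it only lists the placeholder names found in each string, to show that the field in the output pattern has to carry the same name as the one used in the input patterns.

```python
import string

def field_names(pattern):
    """Return the placeholder names found in a format-style string."""
    return [name for _, name, _, _ in string.Formatter().parse(pattern) if name]

sample = "foo"  # hypothetical sample name

# In the list comprehension, str.format fills {0} by position right away,
# so SAMS ends up holding concrete file names.
positional = "{0}.Aligned.out.sam".format(sample)
named = "{sample}.Aligned.out.sam".format(sample=sample)
assert positional == named == "foo.Aligned.out.sam"

# In a rule, the output string is not passed through format. Its field
# names must cover the wildcard names that the input patterns use.
input_pattern = "Data/Raw/{sample}_L001_R1.fastq.gz"
print(field_names("{0}.Aligned.out.sam"))       # placeholder is named "0"
print(field_names("{sample}.Aligned.out.sam"))  # placeholder is named "sample"
print(field_names(input_pattern))
```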

ADD REPLY • link 3.0 years ago by Jeremy Leipzig 22k

1

Thank you for your help. I solved the problem :-)

ADD REPLY • link 3.0 years ago by wanaga3166 ▴ 10

0

I've moved Jeremy's comment to an answer. Please accept it to mark your post as resolved.

ADD REPLY • link 3.0 years ago by Ram 43k