Snakemake and STARsolo
1
0
Entering edit mode
2.9 years ago
wanaga3166 ▴ 10

Hello everyone,

I started a scRNA-seq project recently.

I wrote a snakemake script for the mapping part with STARsolo.

SAMPLES, = glob_wildcards("Data/Raw/{sample}_L001_R1.fastq.gz")

rule StarSolo:
    input:
        R1L1 = "Data/Raw/{sample}_L001_R1.fastq.gz",
        R1L2 = "Data/Raw/{sample}_L002_R1.fastq.gz",
        R2L1 = "Data/Raw/{sample}_L001_R2.fastq.gz",
        R2L2 = "Data/Raw/{sample}_L002_R2.fastq.gz"
    threads:
        8
    shell:
        'STAR --runThreadN {threads} '
        '--genomeDir /Genome/Human/GRCh38/ '
        '--readFilesIn {input.R1L1},{input.R1L2} {input.R2L1},{input.R2L2} '
        '--readFilesCommand gunzip -c '
        '--soloType CB_UMI_Simple '
        '--soloCBwhitelist 3M-february-2018.txt '
        '--soloCBlen 16 '
        '--soloUMIstart 17 '
        '--soloUMIlen 12 '

I obtained this error message:

Building DAG of jobs...
WildcardError in line 11 of /script/Snakefile4:
Wildcards in input files cannot be determined from output files:
'sample'

Where is the problem ? Line 11 is rule StarSolo.

Thank you for your help.

scRNA-seq snakemake Mapping STARsolo • 2.5k views
ADD COMMENT
0
Entering edit mode

How are you running snakemake? Show us that command as well, please. Also, what is the purpose of the , in SAMPLES ,= glob...? And is that line supposed to be outside of a rule?

ADD REPLY
0
Entering edit mode

I used this command line.

snakemake --cluster "qsub -N STAR_scRNA -l hostname=Node1 -b y" -s Snakefile4 -j 8 -n -p

Also, what is the purpose of the , in SAMPLES ,= glob...? And is that line supposed to be outside of a rule?

Yes, this line is outside the rule.

ADD REPLY
0
Entering edit mode

Follow Jeremy's advice. Your Snakefile is badly formed and the line outside the rule breaks everything.

ADD REPLY
2
Entering edit mode

i think SAMPLES ,= glob_wildcards is valid but i do prefer to know what I am trying to make rather than let a directory run the show https://snakemake.readthedocs.io/en/stable/project_info/faq.html#id13

ADD REPLY
0
Entering edit mode

I stand corrected. I just noticed that OP is trying to run Snakemake without specifying a target.

ADD REPLY
0
Entering edit mode

Small comment: for shell:, you just need:

shell:
  """
    STAR \
      --genomeDir {params.genome} \
      --runThreadN {threads} \
      --readFilesIn {params.read1} {params.read2} \
      --readFilesCommand {params.readFilesCommand} \
      ...
  """
ADD REPLY
2
Entering edit mode
2.9 years ago

snakefiles should be composed as such: 1) write implicit rules that connect input and output using wildcards 2) write target rules that list files you want produced as input

ADD COMMENT
0
Entering edit mode

Thank you, that the problems, I don't know the output for Starsolo. I think I will have 6 files:

  • Log
  • Feature Statistic Summaries
  • Alignments
  • Matrix Gene Counts
  • Barcodes
  • Genes

But I don't know the names. So it's difficult to write the output.

ADD REPLY
2
Entering edit mode

one of them is {sample}.Aligned.out.sam. Start by creating a target variable that point to the sam files you want produced

SAMPLES = ['foo','bar'] #or from your glob
SAMS = ["{0}.Aligned.out.sam".format(sample) for sample in SAMPLES]
rule all:
  input: SAMS
ADD REPLY
0
Entering edit mode

Thank Jeremy and Ram for your help. I made some modification in my snakefile.

SAMPLES, = glob_wildcards("Data/Raw/{sample}_L001_R1.fastq.gz")

#SAMPLES = ["Sample_S1_L001_R1_001.fastq.gz", "Sample_S1_L001_R2_001.fastq.gz"]
SAMS = ["{0}.Aligned.out.sam".format(sample) for sample in SAMPLES]

rule all:
    input:
        SAMS

rule StarSolo:
    input:
        R1L1 = "Data/Raw/{sample}_L001_R1.fastq.gz",
        R1L2 = "Data/Raw/{sample}_L002_R1.fastq.gz",
        R2L1 = "Data/Raw/{sample}_L001_R2.fastq.gz",
        R2L2 = "Data/Raw/{sample}_L002_R2.fastq.gz"
    output:
        "{0}.Aligned.out.sam"
    threads:
        8
    shell:
        'STAR --runThreadN {threads} '
        '--genomeDir /Genome/Human/GRCh38/ '
        '--readFilesIn {input.R2L1},{input.R2L2} {input.R1L1},{input.R1L2} '
        '--readFilesCommand gunzip -c '
        '--soloType CB_UMI_Simple '
        '--soloCBwhitelist 3M-february-2018.txt '
        '--soloCBlen 16 '
        '--soloUMIstart 17 '
        '--soloUMIlen 12 ' 

It's work better, but STARsolo didn't start its analyze.

Below the snakemake return:

(snakemake) -bash-4.2$ snakemake --cluster "qsub -N STAR_scRNA -l hostname=Node1 -b y" -s Snakefile4 -j 8 -n
Building DAG of jobs...
Job counts:
    count   jobs
    1   all
    1

[Sat May 15 22:52:56 2021]
localrule all:
    jobid: 0

Job counts:
    count   jobs
    1   all
    1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Where I did a mistake ?

ADD REPLY
1
Entering edit mode

{0} is not a Snakemake wildcard, it's only for string placeholders in Python, so use output: "{sample}.Aligned.out.sam"

ADD REPLY
1
Entering edit mode

Thank you for your help. I solved the problem :-)

ADD REPLY
0
Entering edit mode

I've moved Jeremy's comment to an answer. Please accept it to mark your post as resolved.

ADD REPLY

Login before adding your answer.

Traffic: 2498 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6