Question

Improving my STARsolo results

2.3 years ago

Aaron ▴ 30

I'm pretty new to using STARsolo but I've built a STAR index, and I've just tried to align and count. I'm not quite sure what good results look like from STARsolo but I think the results I've got are probably bad. This is what was in the "Summary.csv" file in the "Solo.out" folder:

Number of Reads,207946411
Reads With Valid Barcodes,0.000762759
Sequencing Saturation,0.195894
Q30 Bases in CB+UMI,0.97659
Q30 Bases in RNA read,0.900359
Reads Mapped to Genome: Unique+Multiple,0.412352
Reads Mapped to Genome: Unique,0.259397
Reads Mapped to Gene: Unique+Multipe Gene,2.23279e-05
Reads Mapped to Gene: Unique Gene,1.68649e-05
Estimated Number of Cells,86
Unique Reads in Cells Mapped to Gene,3507
Fraction of Unique Reads in Cells,1
Mean Reads per Cell,40
Median Reads per Cell,2
UMIs in Cells,2820
Mean UMI per Cell,32
Median UMI per Cell,1
Mean Gene per Cell,22
Median Gene per Cell,1
Total Gene Detected,1335

As I understand it, 2 is very low for the median reads per cell (among other things). Am I wrong in thinking these are bad results? Are there any resources or references that might help me better set up my STARsolo run script?

EDIT:

For reference, here is my run script:

#!/bin/sh
#SBATCH -A p32535
#SBATCH -p normal
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -t 06:00:00
#SBATCH --mem=120gb
#SBATCH --job-name="star_solo"

cd /projects/p32535/scRNA_data

module purge all
module load STAR/2.7.9a

STAR --runThreadN 20 --soloType CB_UMI_Simple --soloCBwhitelist ./9K-LT-march-2021.txt \
--soloCellFilter EmptyDrops_CR --soloFeatures Gene --genomeDir STAR_index/ \
--outFilterType BySJout --alignIntronMax 100000 --quantMode GeneCounts \
--outSAMtype None --soloBarcodeReadLength 0 --readFilesPrefix ./RNAseq/ --readFilesCommand "gzip -cd" \
--readFilesIn Control_GEX_S1_L003_R2_001.fastq.gz,Control_GEX_S1_L004_R2_001.fastq.gz \
Control_GEX_S1_L003_R1_001.fastq.gz,Control_GEX_S1_L004_R1_001.fastq.gz

I was mostly following a prepackaged script, but added --soloBarcodeReadLength 0 because I was getting this error:

EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
Read ID=@K00408:227:HL52KBBXY:3:1101:1539:1244 ;  Sequence=AGGCAGAGAGTGACCCTCGTGACGATAT
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
Dec 24 18:49:42 ...... FATAL ERROR, exiting

updated 2.3 years ago by jarninggau ▴ 40 • written 2.3 years ago by Aaron ▴ 30

What type of library is that? 10X?

2.3 years ago by ATpoint 81k

score 4 · Answer 1 · 2021-12-25

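What's the library: 10x v2, 10x v3, or Drop-seq? Since the error says your barcode read is 28bp, I guess it is 10x v3. If it is, I suggest you delete --soloBarcodeReadLength 0 and set the barcode geometry explicitly instead. The default barcode lengths of STARsolo (CB=16bp, UMI=10bp, 26bp in total) are meant for 10x Chromium v2, which is why STAR expected 26 and stopped on your 28bp reads. Turning the length check off only hides that mismatch: the UMI is still read as 10bp. For v3 (CB=16bp, UMI=12bp, 28bp in total), specify: --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12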
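
To make that concrete, here is a rough sketch of how the run script could look. It assumes the library really is 10x v3 and that the whitelist file matches that chemistry, neither of which I can confirm from the post. The names barcode_read_lengths, star_solo_v3 and DRY_RUN are made up for this example; the STAR options are the ones from your script plus the four barcode options above, with --soloBarcodeReadLength 0 removed. The #SBATCH header, cd and module load lines would stay as they are.

```shell
#!/bin/sh
# Sketch only: assumes a 10x Chromium v3 library (CB=16bp + UMI=12bp = 28bp).

# Barcode geometry for 10x v3, as given in the answer above.
CB_LEN=16
UMI_LEN=12
UMI_START=$((CB_LEN + 1))          # the UMI starts right after the cell barcode
BARCODE_LEN=$((CB_LEN + UMI_LEN))  # 28 for v3; the STARsolo defaults add up to 26

# Hypothetical helper: tabulate read lengths in the first 1000 records of a
# gzipped FASTQ. In a 4-line FASTQ record, line 2 is the sequence.
barcode_read_lengths() {
    gzip -cd "$1" | head -n 4000 | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
}

# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run=""
if [ "$DRY_RUN" = 1 ]; then
    run="echo"
fi

# Same options as the original script, minus --soloBarcodeReadLength 0,
# plus the explicit CB/UMI positions for v3.
star_solo_v3() {
    $run STAR --runThreadN 20 --soloType CB_UMI_Simple \
        --soloCBwhitelist ./9K-LT-march-2021.txt \
        --soloCBstart 1 --soloCBlen "$CB_LEN" \
        --soloUMIstart "$UMI_START" --soloUMIlen "$UMI_LEN" \
        --soloCellFilter EmptyDrops_CR --soloFeatures Gene --genomeDir STAR_index/ \
        --outFilterType BySJout --alignIntronMax 100000 --quantMode GeneCounts \
        --outSAMtype None --readFilesPrefix ./RNAseq/ --readFilesCommand "gzip -cd" \
        --readFilesIn Control_GEX_S1_L003_R2_001.fastq.gz,Control_GEX_S1_L004_R2_001.fastq.gz \
        Control_GEX_S1_L003_R1_001.fastq.gz,Control_GEX_S1_L004_R1_001.fastq.gz
}

star_solo_v3
```

Before rerunning, something like barcode_read_lengths ./RNAseq/Control_GEX_S1_L003_R1_001.fastq.gz should show whether the R1 reads are all 28bp, which is what you would expect for v3. If the printed command looks right, run it for real with DRY_RUN=0.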