I'm pretty new to using STARsolo but I've built a STAR index, and I've just tried to align and count. I'm not quite sure what good results look like from STARsolo but I think the results I've got are probably bad. This is what was in the "Summary.csv" file in the "Solo.out" folder:
Number of Reads,207946411
Reads With Valid Barcodes,0.000762759
Sequencing Saturation,0.195894
Q30 Bases in CB+UMI,0.97659
Q30 Bases in RNA read,0.900359
Reads Mapped to Genome: Unique+Multiple,0.412352
Reads Mapped to Genome: Unique,0.259397
Reads Mapped to Gene: Unique+Multipe Gene,2.23279e-05
Reads Mapped to Gene: Unique Gene,1.68649e-05
Estimated Number of Cells,86
Unique Reads in Cells Mapped to Gene,3507
Fraction of Unique Reads in Cells,1
Mean Reads per Cell,40
Median Reads per Cell,2
UMIs in Cells,2820
Mean UMI per Cell,32
Median UMI per Cell,1
Mean Gene per Cell,22
Median Gene per Cell,1
Total Gene Detected,1335
As I understand it, 2 is very low for the median reads per cell (among other things). Am I wrong in thinking these are bad results? Are there any resources or references that might help me better set up my STARsolo run script?
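To put the fractions in the summary into absolute terms, here is a minimal sketch that multiplies each "Reads ..." row by the total read count. It assumes those rows are fractions of "Number of Reads" and that "Number of Reads" is the first row, as in the file above; the function name is only illustrative.

```shell
# Convert the fraction-valued "Reads ..." rows of a STARsolo Summary.csv
# into approximate read counts.
# Assumption: these rows are fractions of the "Number of Reads" row,
# which must appear before them in the file.
fractions_to_read_counts() {
    awk -F, '
        $1 == "Number of Reads" { total = $2; next }
        $1 ~ /^Reads /          { printf "%s: %.0f\n", $1, $2 * total }
    ' "$1"
}

# Usage (path is an example; point it at the Summary.csv inside Solo.out):
# fractions_to_read_counts Solo.out/Gene/Summary.csv
```

Seeing the valid-barcode row as a read count next to the roughly 208 million input reads makes it easier to judge whether the problem lies in barcode matching or in gene assignment.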
For reference, here is my run script:
#!/bin/sh
#SBATCH -A p32535
#SBATCH -p normal
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -t 06:00:00
#SBATCH --mem=120gb
#SBATCH --job-name="star_solo"
cd /projects/p32535/scRNA_data
module purge all
module load STAR/2.7.9a
STAR --runThreadN 20 --soloType CB_UMI_Simple --soloCBwhitelist ./9K-LT-march-2021.txt \
--soloCellFilter EmptyDrops_CR --soloFeatures Gene --genomeDir STAR_index/ \
--outFilterType BySJout --alignIntronMax 100000 --quantMode GeneCounts \
--outSAMtype None --soloBarcodeReadLength 0 --readFilesPrefix ./RNAseq/ --readFilesCommand "gzip -cd" \
--readFilesIn Control_GEX_S1_L003_R2_001.fastq.gz,Control_GEX_S1_L004_R2_001.fastq.gz \
Control_GEX_S1_L003_R1_001.fastq.gz,Control_GEX_S1_L004_R1_001.fastq.gz
I was mostly following a prepackaged script, but added --soloBarcodeReadLength 0 because I was getting this error:
EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
Read ID=@K00408:227:HL52KBBXY:3:1101:1539:1244 ; Sequence=AGGCAGAGAGTGACCCTCGTGACGATAT
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
Dec 24 18:49:42 ...... FATAL ERROR, exiting
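The expected 26 in this message matches STARsolo's default barcode layout (--soloCBlen 16 plus --soloUMIlen 10), while the reads are 28 bases long. One way to see what the barcode read actually contains is to tally sequence lengths in the R1 files. The sketch below assumes R1 is the barcode read (it is listed second in --readFilesIn) and uses an illustrative function name. Whether the extra two bases belong to a longer UMI, for example 16 + 12, depends on the library chemistry, which the post does not state.

```shell
# Tally sequence lengths from a FASTQ stream on stdin.
# The sequence is line 2 of every 4-line FASTQ record.
# Output: one "length count" pair per line, sorted by length.
fastq_length_histogram() {
    awk 'NR % 4 == 2 { counts[length($0)]++ }
         END { for (len in counts) print len, counts[len] }' | sort -n
}

# Usage (path taken from the run script above; first 100,000 reads only):
# gzip -cd ./RNAseq/Control_GEX_S1_L003_R1_001.fastq.gz | head -n 400000 | fastq_length_histogram
```

If every read comes back as 28 bases, the lengths passed to STARsolo (--soloCBlen and --soloUMIlen) would need to add up to 28 for the length check to pass without --soloBarcodeReadLength 0.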