Issues with Quantification in STARsolo
2.2 years ago
sroth • 0

I am using STARsolo on publicly available 10x V3 data (example accession: https://www.ncbi.nlm.nih.gov/sra?term=SRX9708219 ). The reads are paired-end, 150 bp each. The output is highly irregular and does not match CellRanger 6.1.2 run on the same data. Notably, STARsolo does not detect any cells.

Here is how I generated the genome index (where genome.fa and genes.gtf are from CellRanger's reference):

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir genome_idx --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --genomeSAsparseD 3
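Since every GeneFull metric below comes out as zero, one quick sanity check is whether the index actually picked up the gene annotation. This is a minimal sketch, assuming the index layout written by STAR 2.7.x, where genome_idx/geneInfo.tab is created when --sjdbGTFfile is given and its first line holds the number of genes; check_star_index is a hypothetical helper name, not part of STAR.

```shell
# Hypothetical helper: check that the STAR index picked up the gene annotation.
# Assumption: STAR 2.7.x writes geneInfo.tab into the index directory when
# --sjdbGTFfile is given, and the first line of that file is the gene count.
check_star_index() {
    local idx_dir=$1 gtf=$2
    if [ ! -s "$idx_dir/geneInfo.tab" ]; then
        echo "no geneInfo.tab in $idx_dir (index built without a GTF?)" >&2
        return 1
    fi
    echo "genes in index: $(head -n 1 "$idx_dir/geneInfo.tab")"
    # Tally GTF feature types (column 3), skipping comment lines;
    # exon and gene records should both be present.
    grep -v '^#' "$gtf" | cut -f 3 | sort | uniq -c
}
```

For example, check_star_index genome_idx genes.gtf should report a non-zero gene count and thousands of exon lines for a CellRanger reference.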

Here is the alignment/quantification command:

STAR --genomeDir genome_idx/ --readFilesIn SRR13278442_2.fastq SRR13278442_1.fastq \
  --soloType CB_UMI_Simple --soloCBwhitelist 10x-v3-barcodes.txt \
  --soloUMIlen 12 --soloCBlen 16 --soloUMIstart 17 --soloCBstart 1 --soloBarcodeReadLength 0 \
  --clipAdapterType CellRanger4 --outFilterScoreMin 30 \
  --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR \
  --outSAMtype BAM Unsorted --runThreadN 16 --outFileNamePrefix test \
  --soloCellFilter EmptyDrops_CR --soloFeatures GeneFull
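STARsolo expects the cDNA read first and the barcode read second in --readFilesIn. The 0.98 "Reads With Valid Barcodes" value suggests the order is already right, but one way to confirm which SRA file carries the barcode is to count how many read prefixes hit the whitelist. This is a rough sketch, assuming uncompressed FASTQ and an uncompressed one-barcode-per-line whitelist; barcode_hit_rate is a hypothetical helper, it counts exact 16-nt matches only, and it loads the whole whitelist into memory.

```shell
# Hypothetical helper: fraction of the first N reads whose first 16 nt
# are an exact match to a whitelist entry.
barcode_hit_rate() {
    local fastq=$1 whitelist=$2 n=${3:-10000}
    awk -v n="$n" '
        NR == FNR { wl[$1] = 1; next }     # first file: load the whitelist
        FNR % 4 == 2 {                     # second file: sequence lines only
            total++
            if (substr($0, 1, 16) in wl) hits++
            if (total >= n) exit
        }
        END { printf "%.3f\n", (total ? hits / total : 0) }
    ' "$whitelist" "$fastq"
}
```

For example, if SRR13278442_1.fastq is the barcode read, barcode_hit_rate SRR13278442_1.fastq 10x-v3-barcodes.txt should give a much higher rate than the same call on SRR13278442_2.fastq.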

This is the summary file output from STARsolo:

Number of Reads,131343810
Reads With Valid Barcodes,0.981454
Sequencing Saturation,-nan
Q30 Bases in CB+UMI,0.951324
Q30 Bases in RNA read,0.901431
Reads Mapped to Genome: Unique+Multiple,0.925342
Reads Mapped to Genome: Unique,0.850182
Reads Mapped to GeneFull: Unique+Multiple GeneFull,0
Reads Mapped to GeneFull: Unique GeneFull,0
Estimated Number of Cells,0
Unique Reads in Cells Mapped to GeneFull,0
Fraction of Unique Reads in Cells,-nan
Mean Reads per Cell,0
Median Reads per Cell,0
UMIs in Cells,0
Mean UMI per Cell,0
Median UMI per Cell,0
Mean GeneFull per Cell,0
Median GeneFull per Cell,0
Total GeneFull Detected,0

This is the summary output from CellRanger:

Estimated Number of Cells,"22,866"
Mean Reads per Cell,"5,744"
Median Genes per Cell,307
Number of Reads,"131,343,810"
Valid Barcodes,97.7%
Sequencing Saturation,51.6%
Q30 Bases in Barcode,95.2%
Q30 Bases in RNA Read,90.6%
Q30 Bases in UMI,95.0%
Reads Mapped to Genome,92.5%
Reads Mapped Confidently to Genome,85.4%
Reads Mapped Confidently to Intergenic Regions,4.7%
Reads Mapped Confidently to Intronic Regions,55.2%
Reads Mapped Confidently to Exonic Regions,25.5%
Reads Mapped Confidently to Transcriptome,22.8%
Reads Mapped Antisense to Gene,1.1%
Fraction Reads in Cells,78.1%
Total Genes Detected,"23,815"
Median UMI Counts per Cell,397

From what I can tell, STARsolo recognizes the barcodes and UMIs (98% of reads have valid barcodes and 85% map uniquely to the genome) but assigns no reads to genes. I can't figure out what is going on.
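To pull the relevant numbers out side by side, the two-column STARsolo Summary.csv can be queried by key. A small sketch; solo_metrics is a hypothetical helper, and the example path assumes STARsolo's default output layout, which with --outFileNamePrefix test should put the file at testSolo.out/GeneFull/Summary.csv.

```shell
# Hypothetical helper: print selected key,value metrics from a STARsolo
# Summary.csv, flagging keys that are not present.
solo_metrics() {
    local summary=$1
    shift
    local key
    for key in "$@"; do
        awk -F, -v k="$key" '
            $1 == k { print k ": " $2; found = 1 }
            END { if (!found) print k ": <missing>" }
        ' "$summary"
    done
}
```

For example: solo_metrics testSolo.out/GeneFull/Summary.csv "Reads With Valid Barcodes" "Reads Mapped to GeneFull: Unique GeneFull".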

STARsolo 10X scRNAseq
2.2 years ago
predeus ★ 1.9k

We run STARsolo routinely on all our 10x samples; you can take a look at our scripts here: https://github.com/cellgeni/STARsolo/

It's hard to compare the exact commands, but one thing I see missing is --soloStrand Forward.
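To test whether strandedness explains the zero gene counts, one option is to rerun a subsample once per --soloStrand value (Forward, Reverse, Unstranded) and compare the summaries. This is a sketch under stated assumptions: strand_sweep and the STAR_BIN override are hypothetical, the file names and index path come from the question, several of the original options (adapter clipping, UMI filtering, cell filtering) are left out for brevity, and taking the first N records of each file keeps mates paired only because both SRA FASTQs list reads in the same order.

```shell
# Hypothetical helper: rerun STARsolo on a subsample for each --soloStrand
# value. Set STAR_BIN=echo to print the commands instead of running STAR.
strand_sweep() {
    local cdna=$1 cb=$2 n_reads=${3:-1000000}
    local lines=$((n_reads * 4))            # 4 FASTQ lines per record
    head -n "$lines" "$cdna" > sub_cdna.fastq
    head -n "$lines" "$cb" > sub_cb.fastq
    local strand
    for strand in Forward Reverse Unstranded; do
        mkdir -p "strand_${strand}"
        "${STAR_BIN:-STAR}" --genomeDir genome_idx/ \
            --readFilesIn sub_cdna.fastq sub_cb.fastq \
            --soloType CB_UMI_Simple --soloCBwhitelist 10x-v3-barcodes.txt \
            --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 \
            --soloBarcodeReadLength 0 --soloStrand "$strand" \
            --soloFeatures Gene GeneFull --runThreadN 16 \
            --outSAMtype None --outFileNamePrefix "strand_${strand}/"
    done
}
```

For example, strand_sweep SRR13278442_2.fastq SRR13278442_1.fastq 1000000, then compare the "Reads Mapped to GeneFull" lines in each strand_*/Solo.out/GeneFull/Summary.csv.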

Another possible issue is the output options; try adding --soloFeatures Gene GeneFull --soloOutFileNames output/ features.tsv barcodes.tsv matrix.mtx and see whether that changes anything.
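Whatever the output options, a quick way to see whether any counts were written at all is to total the raw MatrixMarket matrix (genes in rows, barcodes in columns). A sketch; mtx_totals is a hypothetical helper, and the example path assumes STARsolo's default raw/filtered layout, i.e. testSolo.out/GeneFull/raw/matrix.mtx with the question's --outFileNamePrefix test; the path will differ if --soloOutFileNames is changed.

```shell
# Hypothetical helper: report dimensions and total counts of a
# MatrixMarket coordinate file such as STARsolo's matrix.mtx.
mtx_totals() {
    awk '
        /^%/ { next }                  # skip the header and comment lines
        !dims {                        # first data line: rows cols nonzeros
            dims = 1
            print "genes x barcodes: " $1 " x " $2 ", nonzero: " $3
            next
        }
        { total += $3 }                # remaining lines: row col value
        END { print "total counts: " total + 0 }
    ' "$1"
}
```

A "nonzero: 0" line would confirm that nothing was assigned to genes, independently of what Summary.csv reports.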

Cheers

-- Alex
