Question: I need help with Starsolo and scRNA-seq
9 weeks ago, rafaelsolersanblas20 wrote:

Hi! My name is Rafa and I am a beginner in scRNA-seq. I've been working through tutorials such as https://scrnaseq-course.cog.sanger.ac.uk/website/index.html and https://broadinstitute.github.io/2019_scWorkshop/index.html#course-overview, but I do not understand how the STARsolo alignment output is turned into a SingleCellExperiment (SCE) object.

I'm practicing with the https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neurons_900 dataset because it is small, which keeps memory use and waiting times low. I'm running STARsolo with the following command:

STAR --genomeDir /home/victor/Escritorio/Curso_Single_Cell/indices/STAR \
     --runThreadN 16 \
     --readFilesIn neurons_900_fastqs/neurons_900_S1_L001_R2_001.fastq,neurons_900_fastqs/neurons_900_S1_L002_R2_001.fastq \
                   neurons_900_fastqs/neurons_900_S1_L001_R1_001.fastq,neurons_900_fastqs/neurons_900_S1_L002_R1_001.fastq \
     --soloType CB_UMI_Simple \
     --soloCBwhitelist /home/victor/Escritorio/Curso_Single_Cell/whitelist/737K-august-2016.txt \
     --outFileNamePrefix results/STAR/

STARsolo then writes raw and filtered output folders, each containing the count matrix, the barcodes and the genes/features. But when I load these three files and create an SCE object, the counts assay does not look correct.
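To make the link between those three files and the SCE object concrete, here is a minimal base-R sketch of roughly what a reader such as `DropletUtils::read10xCounts` does with them. It writes a tiny made-up matrix to a temporary folder instead of using real STARsolo output; the folder name, gene IDs and barcodes are hypothetical, and it assumes the older file names seen in this thread (`barcodes.tsv`, `genes.tsv`, `matrix.mtx`), with genes as rows and barcodes as columns.

```r
library(Matrix)  # recommended package bundled with R; provides readMM(), writeMM(), sparseMatrix()

# Hypothetical toy folder standing in for STARsolo's Solo.out/Gene/filtered
dir.name <- file.path(tempdir(), "toy_solo")
dir.create(dir.name, showWarnings = FALSE)

genes <- data.frame(id = c("GENE_A", "GENE_B", "GENE_C"),
                    symbol = c("GeneA", "GeneB", "GeneC"))
barcodes <- c("AAACCTGA", "TTTGTCAC")
toy <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(1, 4, 2),
                    dims = c(3, 2))

# Write the three files: MatrixMarket counts plus two plain-text annotation files
writeMM(toy, file.path(dir.name, "matrix.mtx"))
write.table(genes, file.path(dir.name, "genes.tsv"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
writeLines(barcodes, file.path(dir.name, "barcodes.tsv"))

# Read them back: genes become rows, cell barcodes become columns
counts <- as(readMM(file.path(dir.name, "matrix.mtx")), "CsparseMatrix")
gene.tab <- read.delim(file.path(dir.name, "genes.tsv"), header = FALSE)
rownames(counts) <- gene.tab$V1
colnames(counts) <- readLines(file.path(dir.name, "barcodes.tsv"))
print(counts)
```

With real data only the read-back part is needed, with `dir.name` pointing at an existing `Solo.out/Gene/filtered` folder (do not run the writing part there, as it would overwrite the files).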

> dir.name <- "/home/victor/Escritorio/Curso_Single_Cell/results/STAR/Solo.out/Gene/raw"
> list.files(dir.name)
[1] "barcodes.tsv" "genes.tsv"    "matrix.mtx"  
> sce <- DropletUtils::read10xCounts(dir.name, col.names = TRUE)
> sce

class: SingleCellExperiment 
dim: 55487 737280 
metadata(1): Samples
assays(1): counts
rownames(55487): ENSMUSG00000102693 ENSMUSG00000064842 ... ENSMUSG00000096730 ENSMUSG00000095742
rowData names(3): ID Symbol NA
colnames(737280): AAACCTGAGAAACCAT AAACCTGAGAAACCGC ... TTTGTCATCTTTAGTC TTTGTCATCTTTCCTC
colData names(2): Sample Barcode
reducedDimNames(0):
spikeNames(0):
altExpNames(0):

> summary(assay(sce, "counts"))
55487 x 737280 sparse Matrix of class "dgCMatrix", with 5113008 entries 
        i   j x
1    2681   1 1
2   26019   1 1
3   30593   1 1
4   30624   1 1
5   30756   1 1
6   36144   1 1
7   38875   1 1
8   53732   1 1
9   46321   3 1
10  55399   5 1
11   4333   6 1
12   7768   6 1
13  10051   6 1
14  15470   6 1
15  25255   6 1
16  32249   6 1
17  33914   6 1
18  37100   6 1
19  40026   6 1
20  40180   6 1
21  41019   6 1
22  49661   6 1
23  49669   6 1
24  18081   7 1
25  16776   9 1
26  54018  11 1
27    272  12 1
28   9832  12 1
29  13560  12 1
30  14856  12 1
31  15490  12 1
32  18592  12 1
33  23950  12 1
34  25910  12 1
35  28138  12 1
36  28177  12 1
37  35881  12 1
38  36144  12 1
39  36692  12 1
40  37663  12 1
41  38459  12 1
42  39978  12 1
43  40156  12 1
44  41019  12 1
45  41030  12 1
46  43773  12 1
47  46411  12 2
48  48427  12 1
49  49388  12 1
50  49409  12 1
51  49414  12 2
52  50650  12 1
53  33914  14 1
 ... etc

I don't know why this is happening. Could it be that I still need to count the reads per gene? I thought STARsolo performed both the mapping and the counting. If that is the reason, what should I do?

Thanks a lot!! :)

Tags: rna-seq, star, starsolo, scrna-seq
modified 9 weeks ago • written 9 weeks ago by rafaelsolersanblas20

And what should the rownames be?

> assay(sce, "counts")
55487 x 992 sparse Matrix of class "dgCMatrix"
   [[ suppressing 77 column names ‘AAACCTGGTCTCGTTC’, ‘AAACGGGAGCCACGTC’, ‘AAACGGGAGCGAGAAA’ ... ]]
   [[ suppressing 77 column names ‘AAACCTGGTCTCGTTC’, ‘AAACGGGAGCCACGTC’, ‘AAACGGGAGCGAGAAA’ ... ]]

ENSMUSG00000102693 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000064842 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000051951 1 . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000102851 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000103377 . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . ......
ENSMUSG00000104017 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......

 ..............................
 ........suppressing 915 columns and 55475 rows in show(); maybe adjust 'options(max.print= *, width = *)'
 ..............................
   [[ suppressing 77 column names ‘AAACCTGGTCTCGTTC’, ‘AAACGGGAGCCACGTC’, ‘AAACGGGAGCGAGAAA’ ... ]]

ENSMUSG00000095434 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000094431 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000094621 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000098647 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000096730 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000095742 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
written 9 weeks ago by rafaelsolersanblas20

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

written 9 weeks ago by genomax92k
9 weeks ago, pacome.pr70 wrote:

Hello Rafa,

I am not an expert on STARsolo, but looking at the rownames of your SingleCellExperiment, it seems that reads were counted against annotated mouse genes (Ensembl gene IDs, ENSMUSG...):

rownames(55487): ENSMUSG00000102693 ENSMUSG00000064842 ... ENSMUSG00000096730 ENSMUSG00000095742
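If you prefer gene symbols to Ensembl IDs as row names, `rowData(sce)$Symbol` already holds them, but symbols are not always unique. Below is a minimal base-R sketch of one common fix (the same idea as `scater::uniquifyFeatureNames`): keep the symbol where it is unique, append the Ensembl ID where it is duplicated, and fall back to the ID where the symbol is missing. The IDs and symbols are hypothetical toy values.

```r
# Toy stand-ins for rowData(sce)$ID and rowData(sce)$Symbol (hypothetical values)
ids     <- c("ENSMUSG0001", "ENSMUSG0002", "ENSMUSG0003", "ENSMUSG0004")
symbols <- c("GeneA", "GeneB", "GeneB", NA)

# Flag every symbol that occurs more than once (NA is handled separately below)
dup <- symbols %in% symbols[duplicated(symbols)]

# Missing symbol -> Ensembl ID; duplicated symbol -> "Symbol_ID"; otherwise the symbol
new.names <- ifelse(is.na(symbols), ids,
                    ifelse(dup, paste(symbols, ids, sep = "_"), symbols))
print(new.names)
```

On a real object you would compute `new.names` from `rowData(sce)$ID` and `rowData(sce)$Symbol` and then assign `rownames(sce) <- new.names`.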

I am not sure why the matrix is shown in this form; maybe it is the summary function?

55487 x 737280 sparse Matrix of class "dgCMatrix", with 5113008 entries 
        i   j x
1    2681   1 1
2   26019   1 1
3   30593   1 1
4   30624   1 1
5   30756   1 1

This is the sparse (triplet) representation of your matrix, i.e. the row index (i), column index (j) and value (x) of each non-zero entry. For example, in row 2681, column 1, the value is 1.
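To see how `summary()` lines up with the matrix, here is a toy 4 x 3 count matrix with made-up values, using the Matrix package that ships with R: `summary()` lists one (i, j, x) row per stored non-zero entry, and `as.matrix()` shows the same numbers in dense form.

```r
library(Matrix)

# Toy counts (made-up): 4 genes x 3 cells, filled column by column
m <- Matrix(c(0, 0, 3, 0,
              1, 0, 0, 0,
              0, 0, 0, 2), nrow = 4, ncol = 3, sparse = TRUE)

trip <- summary(m)   # one row per non-zero entry: i = row, j = column, x = value
print(trip)
print(as.matrix(m))  # the same numbers in ordinary dense form
```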
What happens if you run the following?

head(assay(sce, "counts"))
If it is not in its sparse matrix (dgCMatrix) representation, see ?Matrix::sparseMatrix to create the matrix from the non-zero entries.
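Going the other way, `Matrix::sparseMatrix()` builds a dgCMatrix from (i, j, x) triplets. This sketch uses only the first five non-zero entries from the `summary()` printout in the question and the dimensions 55487 x 737280 reported there, so it is a small fragment of the real matrix, not a reconstruction of it.

```r
library(Matrix)

# First five non-zero entries from the summary() printout (i = gene row, j = barcode column, x = count)
i <- c(2681, 26019, 30593, 30624, 30756)
j <- c(1, 1, 1, 1, 1)
x <- c(1, 1, 1, 1, 1)

# Dimensions from "55487 x 737280"; every position not listed is an implicit zero
m <- sparseMatrix(i = i, j = j, x = x, dims = c(55487, 737280))
print(class(m))
print(m[2681, 1])  # stored entry
print(m[1, 1])     # not stored, so it reads as zero
```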

written 9 weeks ago by pacome.pr70
9 weeks ago, ATpoint41k (Germany) wrote:

It seems to me that you read the entire set of barcodes from the 737K whitelist into your sce (737,280 columns is exactly the size of that list). I am not a STARsolo or CellRanger user (Alevin for the win ;-) ), but maybe you selected the wrong directory? The number of rows looks fine, but 737K columns must be wrong. You selected the raw folder; is there a second folder, something like filtered, where the empty barcodes have been removed?
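For intuition about what separates the raw folder from the filtered one, here is a rough base-R sketch of a CellRanger-2.2-style cell call on simulated per-barcode UMI totals. My understanding is that STARsolo's default filtering (`--soloCellFilter CellRanger2.2 3000 0.99 10`) follows this rule: take the top 3000 barcodes by UMI count, compute their 99th percentile, divide by 10, and keep barcodes at or above that value. Treat this as an approximation and check the STAR manual for your version; the simulated numbers are hypothetical.

```r
set.seed(1)

# Simulated UMI totals per barcode (hypothetical): ~1000 real cells plus many near-empty droplets
totals <- c(round(rlnorm(1000, meanlog = 8, sdlog = 0.5)),
            rpois(50000, lambda = 2))

# CellRanger-2.2-style threshold, assuming 3000 expected cells, 0.99 quantile, factor 10
expected  <- 3000
top       <- sort(totals, decreasing = TRUE)[seq_len(expected)]
threshold <- unname(quantile(top, 0.99)) / 10

called <- totals >= threshold
print(threshold)
print(sum(called))  # number of barcodes kept as cells
```

On real data the totals would come from `Matrix::colSums(counts(sce))` on the raw SCE; `DropletUtils::emptyDrops()` is a more formal, model-based alternative for the same task.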

modified 9 weeks ago • written 9 weeks ago by ATpoint41k

I used the filtered folder, and now I have the correct number of cells! Thanks :) But the counts in the assay still look too small:

> sce
class: SingleCellExperiment 
dim: 55487 992 
metadata(1): Samples
assays(1): counts
rownames(55487): ENSMUSG00000102693 ENSMUSG00000064842 ... ENSMUSG00000096730 ENSMUSG00000095742
rowData names(3): ID Symbol NA
colnames(992): AAACCTGGTCTCGTTC AAACGGGAGCCACGTC ... TTTGGTTTCATGCATG TTTGTCACATCGGTTA
colData names(2): Sample Barcode
reducedDimNames(0):
spikeNames(0):
altExpNames(0):

> assay(sce, "counts")
55487 x 992 sparse Matrix of class "dgCMatrix"
   [[ suppressing 77 column names ‘AAACCTGGTCTCGTTC’, ‘AAACGGGAGCCACGTC’, ‘AAACGGGAGCGAGAAA’ ... ]]
   [[ suppressing 77 column names ‘AAACCTGGTCTCGTTC’, ‘AAACGGGAGCCACGTC’, ‘AAACGGGAGCGAGAAA’ ... ]]

ENSMUSG00000102693 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000064842 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000051951 1 . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000102851 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
ENSMUSG00000103377 . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . ......
ENSMUSG00000104017 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
written 9 weeks ago by rafaelsolersanblas20

Not sure what you mean. Do you mean the dots? That is how the sparse matrix format (dgCMatrix) displays zeros: only the non-zero entries are stored, which is a kind of compression. Nothing to worry about; you should be good to go.
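To check that the dots really are zeros and that the counts are reasonable, it is easier to look at per-cell totals than at the printed matrix. This sketch uses a random sparse toy matrix of hypothetical size and density from `Matrix::rsparsematrix()`; with your object the same calls would work on `counts(sce)`.

```r
library(Matrix)
set.seed(42)

# Hypothetical toy counts: 2000 genes x 100 cells, about 5% non-zero, values >= 1
m <- rsparsematrix(2000, 100, density = 0.05,
                   rand.x = function(n) rpois(n, 3) + 1)

# Every "." in the printout is an implicit zero that is simply not stored
frac.zero      <- 1 - nnzero(m) / prod(dim(m))
umi.per.cell   <- colSums(m)      # total counts per cell (column)
genes.per.cell <- colSums(m > 0)  # detected genes per cell

print(frac.zero)
print(summary(umi.per.cell))
print(c(sparse = object.size(m), dense = object.size(as.matrix(m))))  # bytes
```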

written 9 weeks ago by ATpoint41k