Question

RaceID3 using 10x datasets

5.6 years ago

Seigfried ▴ 80

Hello, I wish to cluster my single-cell 10x data using RaceID3. However, I cannot load the data into RaceID with its SCseq function.

10x gave me 3 files: 1) barcodes.tsv.gz 2) features.tsv.gz 3) matrix.mtx.gz

I used Seurat's Read10X function:

library(Seurat)
library(RaceID)

pbmc.data <- Read10X(data.dir = "C:/Users/s/Downloads/")

sc <- SCseq(pbmc.data)

Here is my pbmc.data

> pbmc.data
33694 x 27179520 sparse Matrix of class "dgCMatrix"

This is the error I get:

sc <- SCseq(pbmc.data)
Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

I also tried using the Matrix package in R.

library(Matrix)
matrix_dir = "C:/Users/s/Downloads/"
barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
features.path <- paste0(matrix_dir, "features.tsv.gz")
matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")
mat <- readMM(file = matrix.path)
feature.names = read.delim(features.path, 
                       header = FALSE,
                       stringsAsFactors = FALSE)
barcode.names = read.delim(barcode.path, 
                       header = FALSE,
                       stringsAsFactors = FALSE)
colnames(mat) = barcode.names$V1
rownames(mat) = feature.names$V1

This also fails, because it tries to allocate a huge amount of memory:

> sc <- SCseq(mat)
Error: cannot allocate vector of size 6823.1 Gb
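
The size in this error can be reproduced from the matrix dimensions printed earlier, which suggests that SCseq is trying to build a dense copy of the matrix. The sketch below is only arithmetic in base R; it assumes 8 bytes per entry (a double) and that R reports "Gb" in units of 1024^3 bytes.

```r
# Dimensions as printed for pbmc.data in the post.
n_genes <- 33694
n_barcodes <- 27179520

# Assumption: a dense numeric matrix stores every entry, zeros included,
# as an 8-byte double.
bytes_per_double <- 8
dense_bytes <- n_genes * n_barcodes * bytes_per_double

# Assumption: the "Gb" in R's allocation error means 1024^3 bytes.
dense_gib <- dense_bytes / 1024^3
print(round(dense_gib, 1))
```

The result agrees with the 6823.1 Gb in the error message. A column count of about 27 million is also far more than the number of cells a 10x run captures, so it seems likely (an assumption, not confirmed in the thread) that the unfiltered barcode matrix was loaded.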

I understand that RaceID requires a sparse matrix, which I am already providing. Can someone please explain?

RaceID single cell 10x • 2.5k views

score 1 · Accepted Answer · 2019-12-31

5.6 years ago

Devon Ryan 105k

RaceID is requesting about 7 TB of RAM to load that dataset, which is pretty much guaranteed to be more than you have. I can tell you from experience that RaceID3 does not currently scale well to 10x-scale data, so in addition to needing an absurd amount of RAM it'll need a LOT of time to run. I recommend switching to something else for this kind of data.
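
Whatever tool is used, it may help to shrink the matrix first if it still contains near-empty barcodes. The following is a minimal sketch on a toy sparse matrix, not the poster's data: the `toy` object and the `min_umis` threshold are hypothetical, and only `sparseMatrix` and `colSums` from the Matrix package are used. It does not show that SCseq would then cope with the result.

```r
library(Matrix)

# Toy stand-in for a 10x count matrix: 5 genes x 6 barcodes.
# Barcodes bc1-bc3 carry many counts; bc4-bc6 are (almost) empty.
toy <- sparseMatrix(
  i = c(1, 2, 3, 1, 4, 2, 5, 3, 5),
  j = c(1, 1, 1, 2, 2, 3, 3, 4, 6),
  x = c(40, 25, 10, 30, 50, 60, 15, 1, 2),
  dims = c(5, 6),
  dimnames = list(paste0("gene", 1:5), paste0("bc", 1:6))
)

# Hypothetical cutoff, for illustration only; a real threshold has to be
# chosen from the data.
min_umis <- 20

# Total counts per barcode, computed without densifying the matrix.
umis_per_barcode <- Matrix::colSums(toy)

# Keep only barcodes at or above the cutoff; the result stays sparse.
keep <- umis_per_barcode >= min_umis
filtered <- toy[, keep, drop = FALSE]

print(dim(filtered))
print(class(filtered))
```

Cell Ranger also writes a filtered matrix directory alongside the raw one, so pointing Read10X at the filtered output is another way to start from cell-containing barcodes only.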

Thank you for your reply @Devon Ryan

The count matrix I am using is currently a "cellranger aggregate" of 4 different samples. I tried using Seurat for clustering but since my samples are cell culture samples with differing conditions, they do not cluster well.

I could run this on a single sample but then it would defeat the purpose of identifying cell lineages.

Could you please recommend any other tools I can use to effectively do this? Currently trying out Slingshot.

Wishing you a Happy New Year and Decade!

Reply • 5.5 years ago by Seigfried ▴ 80

Play with the parameters in Seurat more, including how you're dealing with batches (i.e., samples). You can also try things like scanorama and scanpy.

Reply • 5.5 years ago by Devon Ryan 105k