I've been stumped on how to work with large (>1 million cell) datasets in Seurat or monocle3, both of which first convert their expression matrices into sparse matrices.

I'm currently working with a 14693 x 1093036 (gene x cell) matrix containing 3744232095 (>3.7 billion) nonzero values. I am finding that reading the matrix into R as a regular matrix works fine, but converting it into sparse format with `Matrix::Matrix(x,sparse=TRUE)` fails with the error "Error: cannot allocate vector of size 119.7 Gb".
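As a rough sanity check on that number, the sketch below computes the size of a single dense copy of the matrix. It assumes the values are stored as 8-byte doubles and that R's "Gb" in this message means 1024^3 bytes. It is written in Python purely for the arithmetic, and the variable names are illustrative.

```python
# Dimensions quoted in the post (genes x cells).
n_genes, n_cells = 14693, 1093036
BYTES_PER_DOUBLE = 8  # assumption: the matrix holds 8-byte doubles

# Size of one full dense copy of the matrix.
dense_bytes = n_genes * n_cells * BYTES_PER_DOUBLE
dense_gib = dense_bytes / 2**30  # assumption: "Gb" here means 1024^3 bytes

print(f"{dense_gib:.1f}")  # 119.7
```

If those assumptions hold, the failed allocation is the size of one more full dense copy of the matrix, not the size of a sparse result.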

I next tried to convert this to sparse format by writing smaller pieces of the matrix to the hard disk in MatrixMarket (.mtx) format, combining them all outside of R (adjusting row indexes as necessary and writing a header), and then reading the combined file back in with `readMM('matrix.mtx')`. The resulting file appears to be valid (it loads into Python with `scipy.io.mmread()`), but it fails to import into R with `Matrix::readMM()`. Now it is giving the error:

Error in validityMethod(as(object, superClass)) : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535

I've tried running this on our university HPC with 2 Tb of memory and tried maximizing the vector heap minimum (`--min-vsize`) and I still get these errors. Am I hitting the limits for vector storage in R? I don't see any way of proceeding with workflows in Seurat or monocle3 without getting past this issue of huge matrices. Any help or advice would be appreciated!
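One way to test the "limits for vector storage" question is to compare the nonzero count with 2^31 - 1, the largest length a non-long vector can have in R. The sketch below assumes the "long vectors not supported yet" error arises because the sparse representation would need vectors longer than that. It is Python used only for arithmetic, and the names are illustrative.

```python
# Figure quoted in the post.
nnz = 3_744_232_095    # nonzero entries in the matrix
INT32_MAX = 2**31 - 1  # 2147483647, the largest non-long vector length in R

# Does a single sparse object need more entries than a non-long vector allows?
exceeds_limit = nnz > INT32_MAX

# Smallest number of pieces that keeps each piece at or below the limit,
# assuming the nonzeros could be divided evenly (real column blocks will not be).
min_pieces = -(-nnz // INT32_MAX)  # ceiling division

print(exceeds_limit, min_pieces)  # True 2
```

If that reading is right, extra RAM would not help, and the matrix would have to be held as at least two separate sparse blocks (for example, subsets of cells) rather than as one object.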

How much memory are you allocating yourself from the HPC? The memory error is telling you that it can't allocate that memory on top of what is already being used by R.

I've replicated the memory error even when I was allocated an entire node with 2 Tb of memory. So the hardware seems to be enough.

Do you mind posting some more information? The size of the matrix with `print(object.size(mat), units="Gb")`, and your memory allocation using `free -h` on the Linux command line.

Thanks for following up! I just followed your instructions on a 248 Gb node (our larger nodes are not free at the moment).

The object size of the gene x cell matrix in R is 126.6 Gb

And here is the output of `free -h`: