Question

scSeq millions of cells

2

2.2 years ago

ecl75 ▴ 20

Hello, we are performing single-cell sequencing (scSeq) on at least five million cells. I have previously used Seurat to cluster gene expression data and do preliminary analyses. Is there an upper limit to the number of cells you can analyze with Seurat? Are there any other packages that people recommend instead? Thank you

Seurat scRNA R • 1.4k views

updated 2.2 years ago by Pratik ★ 1.0k • written 2.2 years ago by ecl75 ▴ 20

1

It will most likely not be possible to load these data into memory, so this will require some on-disk representation, see e.g. http://bioconductor.org/books/3.14/OSCA.advanced/dealing-with-big-data.html#out-of-memory-representations
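A rough back-of-the-envelope sketch of why memory becomes the bottleneck at this scale. Only the 5 million cell count comes from the question; the gene count (20,000), the fraction of non-zero entries (5%), and the storage widths are assumptions chosen for illustration, so substitute your own numbers.

```python
# Rough memory estimate for a cells x genes count matrix.
# Only N_CELLS comes from the question; every other number is an assumption.

def dense_bytes(n_cells, n_genes, bytes_per_value=4):
    """Bytes needed to hold every entry explicitly (float32 by default)."""
    return n_cells * n_genes * bytes_per_value


def csr_bytes(n_cells, n_genes, density,
              bytes_per_value=4, bytes_per_index=4, bytes_per_pointer=8):
    """Bytes for a CSR-style sparse layout: one value and one column index
    per non-zero entry, plus one row pointer per cell (and one extra)."""
    nnz = round(n_cells * n_genes * density)
    return nnz * (bytes_per_value + bytes_per_index) + (n_cells + 1) * bytes_per_pointer


N_CELLS = 5_000_000   # from the question
N_GENES = 20_000      # assumption: roughly a protein-coding gene set
DENSITY = 0.05        # assumption: 5% of entries are non-zero

GB = 1e9
print(f"dense float32: {dense_bytes(N_CELLS, N_GENES) / GB:.0f} GB")
print(f"sparse (CSR):  {csr_bytes(N_CELLS, N_GENES, DENSITY) / GB:.0f} GB")
```

Under these assumed numbers the dense matrix comes to about 400 GB and the sparse one to about 40 GB, before counting any intermediate copies made during normalization, scaling, or clustering. Whether that fits depends on the memory of a single node, which is the case for considering an on-disk representation.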

2.2 years ago by ATpoint 81k

0

Hello, can you quickly clarify what you mean by it most likely not being possible to load these data into memory? Are you talking about actually loading them into R? We have an HPC that can handle this amount of data. If that is not what you are talking about, can you explain some more? Thank you

2.2 years ago by ecl75 ▴ 20

score 3 · Answer 1 · 2022-01-25

Hey, I think the Theis Lab's Scanpy should be able to work with 5 million cells. For context, the Human Cell Atlas (HCA) folks, to my understanding, use the "Theis-universe" packages for their analyses, and some of the groups participating in the HCA have submitted scRNA-seq datasets of 1-4 million cells.

"But wait! There's more!"

You can also "plug in" your data directly into cellxgene once you've annotated it in Scanpy, and serve a web-based interactive scRNA-seq explorer for people to look at.

You can take a look at some of these datasets from the HCA here (try out interactive exploration in cellxgene) : https://cellxgene.cziscience.com/

Skimming through the various research groups' datasets in the cellxgene "catalog" homepage linked above, I saw one group with 4 million cells. It looks as though the limit for cellxgene (interactive web-based exploration of your annotated data) is 2 million cells. However, that doesn't mean you can't complete your analyses in Scanpy on your HPC and then plug a subset of your 5 million cells (say, 1 million cells) into cellxgene. This is what that 4-million-cell group did (they used a 1-million-cell subset). I would maybe drop a message on https://github.com/theislab/scanpy/issues, or do some searching, to check whether 5 million cells is doable. I'm fairly sure it is, since 4 million is possible, but you never know.
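A minimal sketch of picking a reproducible random subset of cells for interactive viewing. It uses only the Python standard library; the function name `choose_cell_subset` and the seed are hypothetical, and the 5 million and 1 million figures are the ones mentioned above. The resulting row indices could then be used to slice whatever object holds the full dataset. This is plain uniform sampling, not necessarily what the 4-million-cell group did.

```python
import random


def choose_cell_subset(n_cells, n_keep, seed=0):
    """Return a sorted list of n_keep distinct row indices drawn uniformly
    at random from range(n_cells). A fixed seed makes the choice repeatable."""
    if not 0 <= n_keep <= n_cells:
        raise ValueError("n_keep must be between 0 and n_cells")
    rng = random.Random(seed)
    # Sampling from a range avoids building a 5-million-element list first.
    return sorted(rng.sample(range(n_cells), n_keep))


# Figures from the discussion: 5 million cells analyzed, 1 million displayed.
subset = choose_cell_subset(5_000_000, 1_000_000, seed=0)
print(len(subset), "cells selected for the interactive view")
```

Uniform sampling can under-represent rare cell types, so sampling within each annotated cluster instead may be worth considering if rare populations matter for the display.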

I have had difficulties using Seurat through RStudio with larger datasets that Scanpy, run through JupyterLab or Jupyter Notebook, had no problem with.

Credit goes to rpolicastro for helping me find (a while back) what the HCA/cellxgene folks were using for their data analysis. It seemed very "black-box" then. Now, not so much.

Also, what ATpoint suggested (checking out OSCA) could be worth looking into. Bioconductor offers a tremendous amount of community support, as well as versatility and flexibility (which I, personally, have not delved too deep into, because there is so much you can do with the Bioconductor packages available or being developed). If you really want to become a specialist in scRNA-seq, this could be a route for you.