computing resources for scRNA-seq analysis
4.2 years ago
Bogdan ★ 1.4k

Dear all,

I would like your suggestions on the computing resources needed for scRNA-seq analysis of 12 (or more) samples, each with 5000-6000 cells. We could use the Seurat, Liger, Harmony, or SimpleSingleCell pipelines.

What would be the minimum RAM needed? (Having 64GB is not sufficient.)

And would you recommend doing all the data processing on Google Cloud, AWS, or some other platform?

Thank you!

bogdan

R sequencing scRNA-seq • 4.0k views

having 64GB is not sufficient

I have run more cells than that with less than 64GB, so that part may be hard to predict based on the number of cells alone.


Thank you very much, Igor, for sharing your experience with scRNA-seq pipelines :)


Are you talking about the initial preprocessing (like 10X's Cell Ranger) or the downstream analysis? Which tool ran into the 64GB limit? I know that Cell Ranger by default tries to use 90% of all memory on the machine, which can cause trouble; you have to explicitly specify the amount of memory it can use. In my experience, most single-cell pipelines do not need that much memory, but you do have to take care to set the limit explicitly, as in the Cell Ranger case.
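As a hedged illustration of capping Cell Ranger's memory explicitly: `cellranger count` accepts `--localmem` (in GB) and `--localcores`. The sketch below only builds the argument vector in R. The helper name `cellranger_count_args`, the sample ID, the paths, and the 48GB / 8-core values are placeholders, not values from this thread, and newer Cell Ranger releases may require further arguments.

```r
# Build the arguments for `cellranger count` with explicit resource caps.
# The ID and paths passed in are placeholders chosen by the caller.
cellranger_count_args <- function(id, transcriptome, fastqs,
                                  mem_gb = 48, cores = 8) {
  c(
    "count",
    paste0("--id=", id),
    paste0("--transcriptome=", transcriptome),
    paste0("--fastqs=", fastqs),
    paste0("--localmem=", mem_gb),    # memory cap in GB
    paste0("--localcores=", cores)    # CPU core cap
  )
}

cr_args <- cellranger_count_args(
  id = "sample01",                    # hypothetical sample ID
  transcriptome = "/path/to/refdata", # hypothetical reference path
  fastqs = "/path/to/fastqs",         # hypothetical FASTQ directory
  mem_gb = 48,
  cores = 8
)
cat("cellranger", cr_args, "\n")

# To actually launch it (requires Cell Ranger on the PATH):
# system2("cellranger", cr_args)
```

Leaving some headroom below the machine's physical RAM is the point of the cap; the exact number is a judgment call.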


Yes, thank you for asking for more details. We typically run Cell Ranger on a SLURM cluster, where we have plenty of resources.

After we obtain the count matrices for all the samples, I prototype the scRNA-seq pipeline on my Ubuntu workstation (which has 64GB RAM). The prototype consists of Seurat 3, plus Conos, Liger, and Harmony for batch correction (following https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9). Thanks :)


This is difficult to say, since it depends heavily on the application. I analyzed my four 10X samples (5000-7000 cells) entirely on a MacBook Pro with 16GB RAM and had no problems. With many samples you may indeed need more RAM, since some algorithms scale quadratically or even cubically. At which step did you run out of memory?


Thank you for sharing your experience. I remember the step:

samples.combined.list.anchors <- FindIntegrationAnchors(object.list = samples.combined.list, dims = 1:20)

samples.combined.list.combined <- IntegrateData(anchorset = samples.combined.list.anchors, dims = 1:20)

That is where the computer reported running out of memory. I will re-run it this coming week and let you know.
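For the re-run, one way to see how much memory a step needs is to reset R's garbage-collector statistics before the call and read the "max used" figure afterwards. This is a minimal base-R sketch; the helper name `with_peak_memory` is hypothetical, and it only counts memory managed by R itself, so compiled code that allocates outside R's allocator would not show up.

```r
# Evaluate an expression and report the peak R heap usage (in MB) reached
# while it ran. Memory allocated outside R's allocator is not counted.
with_peak_memory <- function(expr) {
  invisible(gc(reset = TRUE))  # reset the "max used" statistics
  value <- expr                # lazy evaluation: expr is evaluated here
  stats <- gc()
  # The "(Mb)" column directly follows the "max used" column.
  mb_col <- which(colnames(stats) == "max used") + 1
  list(value = value, peak_mb = sum(stats[, mb_col]))
}

# Toy stand-in for the integration call: allocate a large numeric vector.
res <- with_peak_memory(numeric(1e7))
cat("Peak R memory:", round(res$peak_mb), "MB\n")
```

In practice the toy expression would be replaced by the `FindIntegrationAnchors(...)` or `IntegrateData(...)` call, run first on a subset of samples to see how the peak grows.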


Just a note to add: the pipeline suggested by the authors,

https://satijalab.org/seurat/v3.1/integration.html

which uses a reference-based approach, seems to be working well. To quote:

"we present an additional modification to the Seurat integration workflow, which we refer to as ‘Reference-based’ integration. In the previous workflows, we identify anchors between all pairs of datasets. While this gives datasets equal weight in downstream integration, it can also become computationally intensive. For example, when integrating 10 different datasets, we perform 45 different pairwise comparisons.

As an alternative, we introduce here the possibility of specifying one or more of the datasets as the ‘reference’ for integrated analysis, with the remainder designated as ‘query’ datasets. In this workflow, we do not identify anchors between pairs of query datasets, reducing the number of comparisons. For example, when integrating 10 datasets with one specified as a reference, we perform only 9 comparisons."
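A minimal sketch of what the quoted passage describes, assuming a Seurat version in which `FindIntegrationAnchors` accepts a `reference` argument (as in the v3.1 vignette linked above). The helper names and the choice of the first sample as the reference are illustrative assumptions, not something this thread specifies. The first part just reproduces the comparison counts from the quote in base R.

```r
# Number of pairwise anchor searches when every dataset is compared
# with every other dataset.
n_all_pairs <- function(n_datasets) choose(n_datasets, 2)

# Number of anchor searches with a single reference dataset:
# each remaining (query) dataset is compared with the reference only.
n_single_reference <- function(n_datasets) n_datasets - 1

cat("10 datasets:", n_all_pairs(10), "vs", n_single_reference(10), "\n")
cat("12 datasets:", n_all_pairs(12), "vs", n_single_reference(12), "\n")

# Hypothetical wrapper around the two Seurat calls from the earlier reply,
# with one sample designated as the reference. Requires the Seurat package
# only when the function is actually called.
integrate_with_reference <- function(object_list, reference_idx = 1,
                                     dims = 1:20) {
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = object_list,
    reference = reference_idx,  # index (or indices) of reference dataset(s)
    dims = dims
  )
  Seurat::IntegrateData(anchorset = anchors, dims = dims)
}

# Usage (assumes samples.combined.list exists, as in the reply above):
# samples.combined <- integrate_with_reference(samples.combined.list)
```

For the 12 samples in the original question, this is 66 pairwise comparisons versus 11 with a single reference. Which sample makes a good reference is a separate question that depends on the data.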
