How to solve possible memory issues with Seurat's IntegrateData step?
14 months ago
Vanish007 ▴ 40

Hi All,

Newbie here - I am currently running Seurat on an RStudio server that has 3 TB of RAM and 4 Intel Xeon CPUs with 24 cores. I am integrating 53 samples.

When I run the IntegrateData step, I keep receiving the following error:

Integrating data
Merging dataset 51 53 49 52 50 45 47 46 48 44 54 42 11 36 26 25 10 33 14 12 24 into 3 4 23 21 41 29 1 30 9 20 13 
15 2 43 5 34 18 17 31 35 19 7 32 28 6 22 27 8 16 58 61 56 57 59
Extracting anchors for merged samples
Finding integration vectors
Error in subCsp_rows(x, i, drop = drop) : 
Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89

I am currently running the following:

Immune.features <- SelectIntegrationFeatures(object.list = sample_all_v2, nfeatures = 3000)
options(future.globals.maxSize = 4800 * 1024^2)

Immune.list <- PrepSCTIntegration(object.list = sample_all_v2,
                                  anchor.features = Immune.features,
                                  verbose = TRUE)
pbmc.anchors <- FindIntegrationAnchors(object.list = Immune.list,
                                       normalization.method = "SCT",
                                       anchor.features = Immune.features)
pbmc.integrated <- IntegrateData(anchorset = pbmc.anchors, normalization.method = "SCT")

According to this issue from the Seurat GitHub, downsampling is recommended, which I performed as follows:

pbmc <- subset(pbmc, subset = nFeature_RNA > 200)
pbmc.list <- SplitObject(pbmc, split.by = "Method")
for (i in names(pbmc.list)) {
  pbmc.list[[i]] <- SCTransform(pbmc.list[[i]], verbose = TRUE)
}
pbmc.features <- SelectIntegrationFeatures(object.list = pbmc.list, nfeatures = 3000)
pbmc.list <- PrepSCTIntegration(object.list = pbmc.list, anchor.features = pbmc.features)

table(Idents(pbmc.list[[9]])) # pbmc2 3327

# Downsampling
pbmc.list_v2 <- lapply(X = pbmc.list,
                       FUN = subset,
                       downsample = 1000)

table(Idents(pbmc.list_v2[[8]])) # pbmc1 1000, pbmc2 1000

# Downsampled
sample_all_v2 <- lapply(X = sample_all,
                        FUN = subset,
                        downsample = 1000)
table(Idents(sample_all_v2[[53]])) # pbmc1 253, pbmc2 273
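As a side note on the downsampling step above: `subset(..., downsample = N)` keeps at most N cells *per identity class*, so the result depends on what `Idents()` is set to when it runs. A minimal sketch (assuming Seurat is installed and `sample_all` is a list of Seurat objects, as in the question):

```r
library(Seurat)

# Downsample within each sample: set the active identity to the grouping
# you want to cap, then keep at most 1000 cells per identity class.
sample_all_v2 <- lapply(sample_all, function(obj) {
  Idents(obj) <- "orig.ident"      # choose the grouping to downsample within
  subset(obj, downsample = 1000)   # <= 1000 cells per identity class
})
```

If `Idents()` still holds fine-grained cluster labels, the total cell count after downsampling can stay much larger than 1000 per sample, which may explain why the object remains too big.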

However, I am still getting the aforementioned error. Is there any way to solve this issue?

Thank you!

R Seurat scRNAseq IntegrateData
14 months ago
fracarb8 ★ 1.6k

I would advise against downsampling, as you might lose important data. For big datasets like yours, you should use the reference-based integration workflow. The idea is that you first integrate a subset of your samples (e.g. only controls, one sample per experiment, ...) and then "align" all the other samples to this reference dataset.
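The reference-based workflow only changes one argument in the questioner's pipeline. A sketch, reusing `Immune.list` and `Immune.features` from the question (the indices in `reference` are placeholders; pick your own reference samples):

```r
library(Seurat)

# Anchors are found only between the reference objects and each query object,
# which cuts memory use dramatically compared to all-pairs anchor finding.
pbmc.anchors <- FindIntegrationAnchors(
  object.list          = Immune.list,
  reference            = c(1, 2),   # placeholder: positions of reference samples in the list
  normalization.method = "SCT",
  anchor.features      = Immune.features
)
pbmc.integrated <- IntegrateData(anchorset = pbmc.anchors,
                                 normalization.method = "SCT")
```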


Thank you for the recommendation, fracarb8! Is there a way to check which samples from the list the reference refers to when it specifies, for example, reference = c(1,2)? Thanks


You are storing the samples inside a list; c(1,2) means the first and second elements of that list. What I usually do is name the list elements after the samples and then pass the reference as, for example, c("healthy-control2","control1","control2").
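A sketch of that naming approach (the sample names and the three-element list are hypothetical; `reference` in `FindIntegrationAnchors()` expects positions, so names are translated to indices here to stay on the safe side):

```r
# Hypothetical: a list of three Seurat objects, named after the samples.
names(pbmc.list) <- c("healthy-control2", "control1", "control2")

# Translate the chosen reference names into their list positions.
ref.idx <- match(c("control1", "control2"), names(pbmc.list))  # positions 2 and 3

pbmc.anchors <- FindIntegrationAnchors(
  object.list          = pbmc.list,
  reference            = ref.idx,
  normalization.method = "SCT",
  anchor.features      = pbmc.features
)
```

Naming the elements makes it obvious later which samples served as the reference, instead of having to remember bare indices.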


Got it, thanks! For some reason I had a lapse in thinking and forgot it was referencing the samples in the order I placed them. Appreciate the help and patience!

