Question

Normalizing using Seurat with large numbers of samples

6.5 years ago

Kristin Muench ▴ 640

Hello,

I am trying to run Seurat on a fairly large scRNA-Seq experiment: 16 samples, each containing between 1,000 and 10,000 cells.

In my first run of the pipeline, I merged all of the samples into a single Seurat object, like so:

data.combined <- MergeSeurat(object1 = J, object2 = E, add.cell.id1 = "J", 
    add.cell.id2 = "E", project = "all")
data.combined <- AddSamples(object = data.combined, new.data = F.data, add.cell.id = "F")

...and then followed the tutorial. However, on the scaling step:

data.combined <- ScaleData(object = data.combined, vars.to.regress = c("nUMI"))

I get an error:

Error: vector memory exhausted (limit reached?)

I understand that this error means R ran out of RAM for the computation, which isn't surprising given the size of data.combined. How can I overcome this, short of finding a computational cluster to run it on? Because the number of UMIs differs so much between the 1,000-cell and 10,000-cell samples, it seems crucial to run this step on a Seurat object containing all the data, rather than hack together a solution where I run ScaleData on subsets of the data and then stitch the results together afterwards.
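For a sense of scale, here is a rough back-of-envelope estimate of how large a fully dense, scaled genes-by-cells matrix could get. It is only a sketch: the gene count of 20,000 is an assumption (it is not given above), the cell counts are simply 16 samples at the smallest and largest sizes mentioned, and it assumes the scaled values are held as a dense matrix of 8-byte doubles. The helper name dense_matrix_gib is made up for illustration.

```r
# Rough size of a dense genes-by-cells matrix of doubles (8 bytes per value).
# dense_matrix_gib is a hypothetical helper, not part of Seurat.
dense_matrix_gib <- function(n_genes, n_cells, bytes_per_value = 8) {
  n_genes * n_cells * bytes_per_value / 1024^3
}

n_genes <- 20000                 # assumed gene count, not taken from the post
n_cells <- 16 * c(1000, 10000)   # 16 samples at the smallest and largest sizes

# Estimated size in GiB for the low and high cell counts
estimate <- dense_matrix_gib(n_genes, n_cells)
print(round(estimate, 1))
```

Under these assumptions the matrix alone lands somewhere between roughly 2.4 and 24 GiB, before counting any temporary copies made during the regression, so the upper end would not fit in 16 GB.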

Thank you for your help! I am working on an iMac with 16 GB of RAM. sessionInfo() is:

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2          gdtools_0.1.7               biomaRt_2.36.1              ggrepel_0.8.0               edgeR_3.22.5               
 [6] limma_3.36.5                readr_1.1.1                 DESeqAid_0.2                DESeq2_1.20.0               SummarizedExperiment_1.10.1
[11] DelayedArray_0.6.6          BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.7       
[16] GenomeInfoDb_1.16.0         IRanges_2.14.12             S4Vectors_0.18.3            BiocGenerics_0.26.0         bindrcpp_0.2.2             
[21] dplyr_0.7.6                 Seurat_2.3.4                Matrix_1.2-14               cowplot_0.9.4               ggplot2_3.0.0              

loaded via a namespace (and not attached):
  [1] snow_0.4-3             backports_1.1.2        Hmisc_4.1-1            plyr_1.8.4             igraph_1.2.2           lazyeval_0.2.1        
  [7] splines_3.5.1          digest_0.6.18          foreach_1.4.4          htmltools_0.3.6        lars_1.2               gdata_2.18.0          
 [13] magrittr_1.5           checkmate_1.8.5        memoise_1.1.0          cluster_2.0.7-1        mixtools_1.1.0         ROCR_1.0-7            
 [19] annotate_1.58.0        R.utils_2.7.0          svglite_1.2.1          prettyunits_1.0.2      colorspace_1.3-2       blob_1.1.1            
 [25] crayon_1.3.4           RCurl_1.95-4.11        jsonlite_1.5           genefilter_1.62.0      bindr_0.1.1            survival_2.42-6       
 [31] zoo_1.8-4              iterators_1.0.10       ape_5.2                glue_1.3.0             gtable_0.2.0           zlibbioc_1.26.0       
 [37] XVector_0.20.0         kernlab_0.9-27         prabclus_2.2-6         DEoptimR_1.0-8         scales_1.0.0           mvtnorm_1.0-8         
 [43] DBI_1.0.0              bibtex_0.4.2           Rcpp_0.12.19           metap_1.0              dtw_1.20-1             progress_1.2.0        
 [49] xtable_1.8-3           htmlTable_1.12         reticulate_1.10        foreign_0.8-71         bit_1.1-14             proxy_0.4-22          
 [55] mclust_5.4.2           SDMTools_1.1-221       Formula_1.2-3          tsne_0.1-3             htmlwidgets_1.3        httr_1.3.1            
 [61] gplots_3.0.1           fpc_2.1-11.1           acepack_1.4.1          modeltools_0.2-22      ica_1.0-2              pkgconfig_2.0.2       
 [67] XML_3.98-1.16          R.methodsS3_1.7.1      flexmix_2.3-14         nnet_7.3-12            locfit_1.5-9.1         tidyselect_0.2.5      
 [73] labeling_0.3           rlang_0.2.2            reshape2_1.4.3         AnnotationDbi_1.42.1   munsell_0.5.0          tools_3.5.1           
 [79] RSQLite_2.1.1          ggridges_0.5.1         evaluate_0.12          stringr_1.3.1          yaml_2.2.0             npsurv_0.4-0          
 [85] knitr_1.20             bit64_0.9-7            fitdistrplus_1.0-11    robustbase_0.93-3      caTools_1.17.1.1       purrr_0.2.5           
 [91] RANN_2.6.1             pbapply_1.3-4          nlme_3.1-137           R.oo_1.22.0            hdf5r_1.0.1            compiler_3.5.1        
 [97] rstudioapi_0.8         curl_3.2               png_0.1-7              lsei_1.2-0             statmod_1.4.30         tibble_1.4.2          
[103] geneplotter_1.58.0     stringi_1.2.4          lattice_0.20-35        trimcluster_0.1-2.1    pillar_1.3.0           Rdpack_0.10-1         
[109] lmtest_0.9-36          data.table_1.11.8      bitops_1.0-6           irlba_2.3.2            gbRd_0.4-11            R6_2.3.0              
[115] latticeExtra_0.6-28    KernSmooth_2.23-15     gridExtra_2.3          codetools_0.2-15       MASS_7.3-50            gtools_3.8.1          
[121] assertthat_0.2.0       rprojroot_1.3-2        withr_2.1.2            GenomeInfoDbData_1.1.0 hms_0.4.2              diptest_0.75-7        
[127] doSNOW_1.0.16          grid_3.5.1             rpart_4.1-13           tidyr_0.8.2            class_7.3-14           rmarkdown_1.10        
[133] segmented_0.5-3.0      Rtsne_0.15             base64enc_0.1-3

single cell scRNA-Seq R • 4.9k views

6.2 years ago by Kristin Muench ▴ 640

score 1 · Accepted Answer · 2019-04-08

Update from the future: I ended up running these scripts on a computational cluster using a job scheduler. Once I could run them with 64 to 128 GB of RAM, the error no longer occurred and the scripts ran as expected.

To run an R script from the command line instead of from RStudio, I use Rscript: https://support.rstudio.com/hc/en-us/articles/218012917-How-to-run-R-scripts-from-the-command-line
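As an illustration, a script meant to be launched with Rscript might look something like the sketch below. The file name scale_data.R, the helper parse_paths, and the two .rds path arguments are hypothetical, and it assumes the merged object was saved beforehand with saveRDS(). The ScaleData call is the same one from the question.

```r
# scale_data.R (hypothetical file name)
# Run from a shell or a scheduler job script with:
#   Rscript scale_data.R merged.rds scaled.rds

# Check the command-line arguments and return the two paths as a named list.
parse_paths <- function(args) {
  if (length(args) != 2) {
    stop("usage: Rscript scale_data.R <input.rds> <output.rds>")
  }
  list(input = args[1], output = args[2])
}

# Only the arguments after the script name
args <- commandArgs(trailingOnly = TRUE)

if (length(args) == 0) {
  message("usage: Rscript scale_data.R <input.rds> <output.rds>")
} else {
  paths <- parse_paths(args)
  library(Seurat)
  # Assumes the merged Seurat object was written earlier with saveRDS()
  data.combined <- readRDS(paths$input)
  data.combined <- ScaleData(object = data.combined, vars.to.regress = c("nUMI"))
  saveRDS(data.combined, paths$output)
}
```

Reading and writing the object as .rds files keeps the memory-heavy step in its own non-interactive job, so the result can be loaded later on a smaller machine if it fits.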