Hi all,
Sorry if this has been asked before - I’m wondering what’s possible in the current landscape of bioinformatics tools.
I’m planning on sequencing around 200,000 cells using 10X’s fixed RNA profiling technology. I’d like to at least try to analyze the data myself. My main machine has 16 cores and 64 GB of RAM - I’m wondering how doable that is, or if I need to look into other solutions. I’m not a fan of HPC because I don’t like submitting jobs to a queue…I prefer getting results back in real time if possible and tracking progress.
Any thoughts? Feel free to tell me this plan makes no sense!
It's likely too much to fit entirely into memory, depending on the type of analysis, so I would look into on-disk solutions (HDF5, for example). There's a good read at http://bioconductor.org/books/3.16/OSCA.advanced/dealing-with-big-data.html#out-of-memory-representations
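To make the pattern concrete, here's a minimal sketch of the same out-of-memory idea on the Python side (the OSCA link above covers the R/Bioconductor equivalents, HDF5Array/DelayedArray). The file name is a placeholder, and it assumes counts stored cell-major (CSR) in a .h5ad file; the block size is arbitrary:

```python
import anndata as ad
import numpy as np

# Open in backed mode: the count matrix stays on disk (HDF5),
# and only the slices you ask for are pulled into RAM.
adata = ad.read_h5ad("counts_200k.h5ad", backed="r")  # hypothetical file name

# Example: stream cells in blocks to accumulate per-gene totals for QC,
# without ever materializing the full matrix in memory.
gene_totals = np.zeros(adata.n_vars)
block = 10_000  # cells per chunk; tune to available RAM
for start in range(0, adata.n_obs, block):
    chunk = adata.X[start:start + block]  # reads only this slice from disk
    gene_totals += np.asarray(chunk.sum(axis=0)).ravel()
```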
Thank you - really helpful! Interestingly, it says that 1.3 million cells would require about 30 GB of RAM as a sparse matrix - that actually seems surprisingly good, and if so it bodes well for the 200k cells I plan to analyze.
That's just keeping the plain matrix in memory, without any analysis done. You will have that, plus the normalized counts, plus reduced dimensions, plus the memory required by the tools to run all of that. So effectively it will be >>> more.
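For a rough sense of scale, here's the back-of-envelope arithmetic as a sketch; the per-cell complexity (`GENES_PER_CELL`) is purely an assumption and depends heavily on the dataset:

```python
# Back-of-envelope memory estimate for in-memory sparse count matrices.
BYTES_PER_NONZERO = 12   # dgCMatrix-style: 8-byte double value + 4-byte row index
GENES_PER_CELL = 1_500   # assumed nonzeros per cell; a guess, dataset-dependent

def sparse_gb(n_cells: int) -> float:
    """Approximate size in GB of one sparse count matrix."""
    return n_cells * GENES_PER_CELL * BYTES_PER_NONZERO / 1e9

raw = sparse_gb(200_000)        # ~3.6 GB for the raw counts
norm = raw                      # log-normalized copy keeps the same sparsity pattern
pca = 200_000 * 50 * 8 / 1e9    # 50-dim dense embedding: ~0.08 GB, negligible
print(f"raw={raw:.1f} GB, normalized={norm:.1f} GB, PCA={pca:.2f} GB")
```

(The same arithmetic with ~1,900 nonzeros per cell roughly reproduces the book's ~30 GB figure for 1.3M cells.) So for 200k cells the matrices themselves fit comfortably in 64 GB; it's the transient copies and per-tool overhead on top of this that push memory far higher.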
Let's say you are using Seurat: you should be fine for a standard analysis, but it is very unlikely you will be able to run Seurat's SCTransform... you would need at least 200 GB, probably much more. You might even struggle to do the basic merging/integration with your RAM limitations.