I am running into a large memory usage issue when running DESeq2 on our server with 256 GB of RAM.
My counts table is quite small (8 samples by ~23K genes), yet DESeq2 still consumes enormous amounts of memory, pushing the server into swap and grinding it to a halt until I kill the process manually to bring the server back.
I have read other related topics, but they did not help: the problems there were caused by huge data sizes or massive parallelization, neither of which applies to my case.
My code:
# create comparison groups and set reference level
samples_anno %<>% mutate(group = relevel(as.factor(diffExpGroup), ref = "Group2"))
# create DESeq2 object with the counts table and the sample annotations (8 samples by 23K genes)
dds = DESeqDataSetFromMatrix(countData = counts_table, colData = samples_anno, design = ~ group)
# keep only genes with more than 2 * n_samples reads in total across all samples (reduces the dds object size)
dds = dds[rowSums(counts(dds)) > (2 * nrow(samples_anno)), ]
# remove objects generated earlier and no longer needed, to reduce memory usage before calling DESeq
rm(gene_exp_files, gene_symbols, file_names, counts_table)
gc()
# create DESeq object
dds = DESeq(object = dds)
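For reference, the raw input really is tiny. Here is a quick base-R sanity check I can reproduce (the counts are simulated with `rpois` purely to illustrate the size of a matrix with the same shape as mine, not my real data):

```r
# Illustrative only: simulate a counts matrix of the same shape as mine
# (23,000 genes x 8 samples) to show how little memory the raw data needs.
set.seed(1)
counts_table <- matrix(rpois(23000 * 8, lambda = 50),
                       nrow = 23000, ncol = 8)
print(object.size(counts_table), units = "MB")  # well under 2 MB
```

So the size of the input itself cannot explain 200+ GB of usage.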
The last step uses >200 GB of memory during the model fitting and testing stage.
Here is a screenshot of the process while DESeq() is running:
What could be the problem? How can I reduce the memory usage of DESeq2?
Thanks.