Question: Estimating required memory for WGCNA analysis
8 weeks ago, klkeysb wrote:

We are running WGCNA on ~90,000 genes in a single block with 48 threads and 192GB of memory using the blockwiseModules function.

WGCNA takes several dozen hours to compute the topological overlap matrix. We thought that 192GB would be sufficient for the analysis. But WGCNA chokes when exporting the TOM.

How can we estimate the memory required for blockwiseModules to complete successfully? We have included the output below:

 ..Working on block 1 .
    TOM calculation: adjacency..
    ..will use 48 parallel threads.
     Fraction of slow calculations: 0.000000
    ..matrix multiplication (system BLAS)..
   ..saving TOM for block 1 into file output/100000/wgcna/TOM-block.1.RData
Error in fastcluster::hclust(as.dist(dissTom), method = "average") :
  Memory overflow.
Calls: blockwiseModules -> <Anonymous>
Execution halted
Tags: coexpression, rna-seq, wgcna, R
modified 8 weeks ago by Kevin Blighe • written 8 weeks ago by klkeysb
8 weeks ago, Kevin Blighe (Republic of Ireland) wrote:

I'm not so sure that 192GB is sufficient for a dataset of that size. Even if it were sufficient for just generating the correlation matrix, it leaves little room for other operations.

I think that you should request >200GB.

Take a look at this article: “Blockwise” network analysis of large data


written 8 weeks ago by Kevin Blighe

I have seen that article before. The operative line is here:

16 GB memory should be able to handle up to about 24,000 nodes; 32 GB should be enough (perhaps barely so) for 40,000 and so on.

By that calculation, 80k transcripts require 64GB of memory. Imagine our surprise when moving to ~90k transcripts suddenly overloads 192GB.

These heuristics don't seem reliable. Is there a better way to estimate the required memory? This would inform which node type we request before running WGCNA.
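One way to sanity-check such heuristics is to note that a dense gene-by-gene matrix grows with the square of the number of genes, not linearly. The sketch below is only back-of-envelope arithmetic, assuming 8-byte double-precision entries; the `copies` factor is a hypothetical allowance for matrices held in memory at the same time (for example adjacency, TOM and dissimilarity) and is not taken from the WGCNA source.

```python
# Back-of-envelope memory estimate for dense n x n matrices of doubles.
# Assumptions: 8 bytes per entry; `copies` is a hypothetical safety factor,
# not a number documented by WGCNA.

BYTES_PER_DOUBLE = 8
GIB = 1024 ** 3


def dense_matrix_gib(n_genes: int) -> float:
    """Size in GiB of one full n x n double-precision matrix."""
    return n_genes ** 2 * BYTES_PER_DOUBLE / GIB


def estimated_peak_gib(n_genes: int, copies: int = 3) -> float:
    """Rough peak usage if `copies` full matrices coexist in memory."""
    return copies * dense_matrix_gib(n_genes)


for n in (24_000, 40_000, 80_000, 90_000):
    print(f"{n:>6} genes: one matrix {dense_matrix_gib(n):6.1f} GiB, "
          f"three copies {estimated_peak_gib(n):6.1f} GiB")
```

Under these assumptions, doubling the gene count quadruples the requirement, and a single 90,000 x 90,000 matrix alone is roughly 60 GiB, so a handful of simultaneous copies would already be in the range of the 192GB node.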

written 8 weeks ago by klkeysb

I believe that the memory used will be system-dependent, and also dependent on your version of R (it's under constant development behind the scenes). You may consider reducing your dataset by, for example:

  • eliminating genes with low variance
  • eliminating genes with nil or low expression
  • eliminating certain classes of genes (like pseudogenes, if they are in your dataset)
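
The first two filters above can be sketched as follows. This is a minimal illustration on made-up toy data, not part of WGCNA; the function name `filter_genes` and the thresholds `min_mean` and `keep` are hypothetical and would need tuning for a real dataset.

```python
# Minimal sketch of pre-filtering genes before network construction.
# Assumptions: `expr` maps gene name -> expression values across samples;
# `min_mean` and `keep` are hypothetical cut-offs chosen for illustration.
from statistics import mean, pvariance


def filter_genes(expr, min_mean, keep):
    """Drop low-expression genes, then keep the `keep` most variable ones."""
    expressed = [g for g, values in expr.items() if mean(values) >= min_mean]
    ranked = sorted(expressed, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:keep]


expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],  # expressed but nearly constant
    "geneB": [1.0, 8.0, 2.0, 9.0],  # highly variable
    "geneC": [0.0, 0.0, 0.0, 0.0],  # not expressed
    "geneD": [3.0, 6.0, 4.0, 7.0],  # moderately variable
}

kept = filter_genes(expr, min_mean=1.0, keep=2)
print(kept)
```

Because memory for the network matrices scales with the square of the gene count, even a modest reduction in the number of genes retained lowers the requirement considerably.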

Finally, you may try the Bioconductor support site, where the WGCNA developer is more active.

Thinking about it further, one could in principle write the correlation matrix to disk while it is being calculated, and so reduce peak memory during that step. You would then have to read the matrix back into your R session later, but the maximum memory required would be lower. I've worked on ways around these issues, including memory and CPU usage in R (see R functions edited for parallel processing). I also have my own network analysis protocol (Network plot from expression data in R using igraph), but it's nowhere near as comprehensive as WGCNA yet.
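
The write-to-disk idea can be illustrated with a small sketch. It is not how WGCNA works internally; it only shows the general pattern of computing a Pearson correlation matrix a block of rows at a time and appending each block to a binary file, so that only one block of results is held in memory at once. The function names, the `block_size` parameter, and the file layout (row-major 8-byte doubles) are all assumptions made for illustration.

```python
# Sketch: compute a Pearson correlation matrix block by block and stream
# it to disk, so only `block_size` result rows are in memory at a time.
# All names and the on-disk layout (row-major doubles) are hypothetical.
import math
import os
import tempfile
from array import array


def standardize(row):
    """Center a row and scale it to unit length (assumes it is not constant)."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def write_correlation_blocks(expr_rows, path, block_size=2):
    """Write the full n x n correlation matrix to `path`, one block at a time."""
    z = [standardize(r) for r in expr_rows]
    n = len(z)
    with open(path, "wb") as fh:
        for start in range(0, n, block_size):
            block = array("d")
            for zi in z[start:start + block_size]:
                # Dot product of standardized rows equals Pearson's r.
                block.extend(sum(a * b for a, b in zip(zi, zj)) for zj in z)
            block.tofile(fh)
    return n


def read_correlation_row(path, n, i):
    """Read back row `i` without loading the whole matrix."""
    row = array("d")
    with open(path, "rb") as fh:
        fh.seek(i * n * row.itemsize)
        row.fromfile(fh, n)
    return list(row)


expr_rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first row
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the first row
]
path = os.path.join(tempfile.mkdtemp(), "cor.bin")
n = write_correlation_blocks(expr_rows, path)
row0 = read_correlation_row(path, n, 0)
print([round(r, 6) for r in row0])
```

The trade-off is extra disk I/O, and any later step that needs the whole matrix at once (such as hierarchical clustering) would still need enough memory to hold it.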

modified 8 weeks ago • written 8 weeks ago by Kevin Blighe