Question: Estimating required memory for WGCNA analysis
klkeysb wrote, 2.4 years ago:

We are running WGCNA on ~90,000 genes in a single block with 48 threads and 192GB of memory using the blockwiseModules function.

WGCNA takes several dozen hours to compute the topological overlap matrix. We thought that 192GB would be sufficient for the analysis, but WGCNA runs out of memory at the clustering step, just after exporting the TOM.

How can we estimate the memory required for blockwiseModules to complete successfully? We have included the output below:

 ..Working on block 1 .
    TOM calculation: adjacency..
    ..will use 48 parallel threads.
     Fraction of slow calculations: 0.000000
    ..connectivity..
    ..matrix multiplication (system BLAS)..
    ..normalization..
    ..done.
   ..saving TOM for block 1 into file output/100000/wgcna/TOM-block.1.RData
 ....clustering..
Error in fastcluster::hclust(as.dist(dissTom), method = "average") :
  Memory overflow.
Calls: blockwiseModules -> <Anonymous>
Execution halted
Tags: coexpression rna-seq wgcna R
Kevin Blighe (Republic of Ireland) wrote, 2.4 years ago:

I'm not so sure that 192GB is sufficient for a dataset of that size: a single 90,000 × 90,000 double-precision matrix is already on the order of 65GB, and blockwiseModules holds more than one object of that size at once. Even if 192GB were enough for just generating the correlation matrix, it would leave little room for the other operations.

I think that you should request >200GB.
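
To put a rough number on that (my own back-of-the-envelope arithmetic, not an official WGCNA formula): count the dense n × n matrices that have to coexist in memory and multiply by n² × 8 bytes.

    ## Rough sketch only: the number of simultaneous full-size matrices is an
    ## assumption (adjacency, TOM, dissimilarity TOM at a minimum); the true
    ## factor depends on the WGCNA version and the options you use.
    estimate_wgcna_memory_gb <- function(n_genes, n_matrix_copies = 3) {
      bytes_per_matrix <- as.numeric(n_genes)^2 * 8   # double precision
      n_matrix_copies * bytes_per_matrix / 1024^3     # result in GiB
    }

    estimate_wgcna_memory_gb(24000)   # ~13 GiB, consistent with the oft-quoted 16GB-for-24,000-genes guideline
    estimate_wgcna_memory_gb(90000)   # ~181 GiB before any clustering overhead

With three full-size matrices the estimate already brushes against your 192GB before fastcluster::hclust and as.dist allocate their own working copies, which would fit the crash you see at the clustering step.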

Take a look at this article: “Blockwise” network analysis of large data
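
The practical suggestion in that article is to let blockwiseModules split the genes into smaller blocks rather than forcing a single one. A minimal sketch, in which the parameter values (block size, power, TOM type) are illustrative and would need to be set for your own data:

    ## Minimal block-wise sketch; all parameter values below are illustrative.
    library(WGCNA)

    net <- blockwiseModules(
      datExpr,                                     # samples x genes expression matrix
      maxBlockSize    = 30000,                     # cap block size so each TOM fits in RAM
      power           = 6,                         # soft-threshold power chosen beforehand
      TOMType         = "signed",
      saveTOMs        = TRUE,
      saveTOMFileBase = "output/100000/wgcna/TOM", # matches the path in your log
      nThreads        = 48,
      verbose         = 3
    )

Each 30,000-gene block keeps the per-block TOM at roughly 30,000² × 8 bytes ≈ 7GB instead of ~65GB for the full 90,000-gene matrix, at the cost of modules possibly being split across blocks.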

Kevin


I have seen that article before. The operative line is here:

16 GB memory should be able to handle up to about 24,000 nodes; 32 GB should be enough (perhaps barely so) for 40,000 and so on.

By that calculation, 80k transcripts require 64GB of memory. Imagine our surprise when moving to ~90k transcripts suddenly overloads 192GB.

These heuristics don't seem reliable. Is there a better way to estimate the required memory? This would inform which node type we request before running WGCNA.

written 2.4 years ago by klkeysb

I believe the memory used will be system-dependent, and also dependent on your version of R (it's under constant development behind the scenes). You may consider reducing your dataset by, for example (a short sketch follows the list):

  • eliminating genes with low variance
  • eliminating genes with nil or low expression
  • eliminating certain classes of genes (like pseudogenes, if they are in your dataset)
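
A minimal sketch of that kind of pre-filtering, assuming a genes x samples matrix of normalised expression values called expr_mat (both the name and the thresholds are placeholders):

    ## Pre-filtering sketch; expr_mat (genes x samples) and the thresholds
    ## below are assumptions to be adapted to your own data.
    keep_expressed <- rowSums(expr_mat > 1) >= 0.5 * ncol(expr_mat)   # drop nil / low expression
    gene_var       <- apply(expr_mat, 1, var)
    keep_variable  <- gene_var >= quantile(gene_var, 0.25)            # drop the least-variable quartile

    expr_filtered  <- expr_mat[keep_expressed & keep_variable, ]
    nrow(expr_filtered)   # fewer genes going into blockwiseModules

Even dropping a quarter of the genes shrinks every n × n matrix by about 44%, since the memory cost scales with the square of the gene count.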

Finally, you may try the Bioconductor support site ( https://support.bioconductor.org/t/Latest/ ), where the WGCNA developer is more active.

As I think about it: technically, one could write the correlation matrix to disk as the calculations are under way and, in this way, save on memory while it (the correlation matrix) is being produced. You would then just read the matrix back into your R session afterwards, but the peak memory required would be lower.

Edit, 29th August 2019: in fact, I have since learned that this (what I wrote above) is precisely how bigcor does it.
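
Here is a rough sketch of that write-as-you-go idea; it is my own illustration of the concept rather than bigcor's actual implementation, and the block size and output directory are placeholders:

    ## Compute the correlation matrix in column slabs and write each slab to
    ## disk, so the full n x n matrix never sits in memory at once.
    ## This illustrates the idea; it is not how bigcor itself is implemented.
    blockwise_cor_to_disk <- function(expr, block_size = 5000, out_dir = "cor_blocks") {
      # expr: samples x genes matrix, the orientation WGCNA expects
      dir.create(out_dir, showWarnings = FALSE)
      starts <- seq(1, ncol(expr), by = block_size)
      for (i in seq_along(starts)) {
        cols  <- starts[i]:min(starts[i] + block_size - 1, ncol(expr))
        block <- cor(expr[, cols, drop = FALSE], expr)   # block_size x n_genes slab
        saveRDS(block, file.path(out_dir, sprintf("cor_block_%03d.rds", i)))
        rm(block); gc()                                  # free the slab before the next one
      }
      invisible(starts)
    }

Peak memory is then roughly block_size × n_genes × 8 bytes per slab, rather than n_genes² × 8 bytes for the whole matrix.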

written 2.4 years ago, modified 18 months ago, by Kevin Blighe