Question: WGCNA Network Construction Issues & Best Practices?
Adam Cornwell (United States) wrote, 8.5 years ago:

I have a microarray dataset that I was hoping to process with a gene network construction algorithm, specifically WGCNA. I am having trouble determining whether my current dataset is appropriate for network construction.

I have tried a number of different subsets of probes and samples, and also tried collapseRows, but I'm finding that the powers I would need to select in order to achieve a scale-free topology model fit index near 0.9 are extremely high, usually a soft-thresholding power of 25 or greater. By comparison, in the WGCNA tutorials and other material I've seen, common powers are between 6 and 10.

I know that if the model fit index isn't high, the network won't approximate a scale-free topology and the connectivity will be too high to be useful. However, I haven't figured out which factors in the dataset would be contributing to this. Admittedly, my sample size is small: only 11 samples. However, I didn't see any recommendations for determining minimum sample size, nor any way to calculate it. Does anyone have any 'best practices' regarding this for WGCNA? Should I go ahead and run through the rest of the WGCNA workflow even if I need to select a power of 30 or so to get a fit index near the suggested 0.9?
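For reference, the fit index in question can be sketched outside of R. Below is a minimal Python/numpy approximation of the idea behind it: raise absolute correlations to the soft power, sum them into per-gene connectivities, bin the connectivities, and take the R² of a log-log linear fit. The unsigned network, the 10-bin histogram, and the function name are assumptions for illustration, not WGCNA's exact implementation.

```python
import numpy as np

def scale_free_fit(expr, power):
    """Rough scale-free topology fit index R^2 for one soft-thresholding
    power. `expr` is a samples x genes matrix."""
    # Unsigned adjacency: |correlation| raised to the soft power.
    cor = np.corrcoef(expr, rowvar=False)
    adj = np.abs(cor) ** power
    # Connectivity: each gene's summed adjacency to all other genes.
    k = adj.sum(axis=0) - 1.0  # subtract self-adjacency (always 1)
    # Bin the connectivities and fit log10(p(k)) against log10(k).
    hist, edges = np.histogram(k, bins=10)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = hist > 0  # empty bins have no log-frequency
    x = np.log10(centers[keep])
    y = np.log10(hist[keep] / hist.sum())
    # R^2 of the linear fit is the scale-free fit index.
    r = np.corrcoef(x, y)[0, 1]
    return r * r
```

A network closer to scale-free gives a straighter log-log line, hence a fit value closer to 1.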

We have a number of datasets we'd like to apply this to, but I'm getting concerned now because we usually only have three biological replicates, and typically only a few conditions to test. If this isn't going to work, I'll need to find a similar method that is more robust to smaller sample sizes, even if it's less effective overall compared to WGCNA.

Thank you!

Tags: R, network, expression, microarray

Adam, AFAIK there are no "best practices" (yet) for this algorithm. The WGCNA documentation encourages you to tailor your soft-thresholding power to your data, but with no concrete guidelines that is not very helpful, I agree.

What happens when you take your data through the workflow? Do you get any module identification? How does module membership vary with different powers? That may be informative: you may identify a point at which module membership stops changing as the power changes, and that would be your answer.
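That power sweep can be sketched as follows. This is a hypothetical simplification in Python: plain average-linkage clustering on 1 - adjacency stands in for WGCNA's TOM-based module detection, the module count is fixed arbitrarily, and the function names are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def modules_at_power(expr, power, n_modules=4):
    """Cluster genes at one soft power and return module labels.
    Simplification: hierarchical clustering on 1 - adjacency, not
    WGCNA's topological-overlap dissimilarity."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** power
    dissim = 1.0 - adj  # diagonal is 0 since self-adjacency is 1
    Z = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(Z, t=n_modules, criterion="maxclust")

def pair_agreement(a, b):
    """Fraction of gene pairs on which two module assignments agree
    (same module vs. different module) -- a simple stability measure."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))
```

Running `pair_agreement` on the labels from successive powers and looking for a plateau near 1.0 is one way to spot the point where module membership stops changing.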

In fact, since there are no published best practices, running your data with different powers could be helpful in establishing those practices.

As for the minimum number of samples you mention below: a suggested minimum n of 12 implies there may be something in the algorithm that can be adjusted for smaller data sets, with the caveat that module identification may not be as robust. Just some thoughts.

Reply written 8.4 years ago by Alex Paciorkowski
Olga wrote, 8.5 years ago:

I have the same problem.


Since there haven't been any other answers, I asked someone who has corresponded with the authors of the package, and the suggested minimum number of samples is apparently 12. I have no other backing or reasoning for this, but it sounds like a reasonable enough value.

Reply written 8.4 years ago by Adam Cornwell