Question

WGCNA Network Construction Issues & Best Practices?

10

11.9 years ago

Adam Cornwell ▴ 510

I have a microarray dataset that I was hoping to process with a gene network construction algorithm, specifically WGCNA. I am having trouble determining whether my current dataset is appropriate for network construction.

I have tried a number of different subsets of probes and samples, and also tried collapseRows, but the soft-thresholding powers I would need to reach a Scale Free Topology Model Fit Index near 0.9 are extremely high, usually 25 or greater. By comparison, in the WGCNA tutorials and other material I've seen, common powers are between 6 and 10.
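For readers unfamiliar with the quantity being tuned here, the following is a minimal sketch of how a scale-free fit index can be computed for several soft-thresholding powers. It is written in plain Python rather than with the WGCNA R package, and it makes several assumptions: an unsigned network (adjacency = |correlation| ** power), 10 equal-width connectivity bins with empty bins skipped, and toy random data with 11 samples. The function names are hypothetical and the printed numbers say nothing about any real dataset.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(genes):
    """Gene-by-gene correlation matrix; each row of `genes` is one expression profile."""
    n = len(genes)
    cor = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            cor[i][j] = cor[j][i] = pearson(genes[i], genes[j])
    return cor


def connectivity(cor, power):
    """Unsigned soft-thresholded connectivity: k_i = sum over j != i of |cor_ij| ** power."""
    n = len(cor)
    return [sum(abs(cor[i][j]) ** power for j in range(n) if j != i)
            for i in range(n)]


def scale_free_fit(k, n_bins=10):
    """Return (signed R^2, slope) of log10(p(k)) regressed on log10(k).

    Simplified assumption: equal-width bins, empty bins skipped.
    The R^2 is reported as positive when the slope is negative.
    """
    lo, hi = min(k), max(k)
    if hi <= lo:
        return 0.0, 0.0
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for v in k:
        b = min(int((v - lo) / width), n_bins - 1)
        sums[b] += v
        counts[b] += 1
    xs, ys = [], []
    for s, c in zip(sums, counts):
        if c > 0 and s > 0:
            xs.append(math.log10(s / c))       # mean connectivity in the bin
            ys.append(math.log10(c / len(k)))  # fraction of genes in the bin
    if len(xs) < 3:
        return 0.0, 0.0  # too few occupied bins for a meaningful regression
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return -math.copysign(r2, slope), slope


# Toy data (assumption): 60 genes x 11 samples of independent noise.
random.seed(0)
genes = [[random.gauss(0, 1) for _ in range(11)] for _ in range(60)]
cor = correlation_matrix(genes)
for power in (6, 12, 20, 30):
    k = connectivity(cor, power)
    fit, slope = scale_free_fit(k)
    print(f"power={power:2d}  fit={fit:6.2f}  slope={slope:6.2f}  "
          f"mean_k={sum(k) / len(k):.4f}")
```

Because every |correlation| is at most 1, raising the power can only shrink each gene's connectivity, which is why mean connectivity is usually inspected alongside the fit index when choosing a power.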

I know that if the model fit index isn't high, the network won't approximate a scale-free topology and the connectivity will be too high to be useful. However, I haven't figured out what factors in the dataset would be contributing to this. Admittedly, my sample size is small: only 11 samples. However, I didn't see any recommendations for a minimum sample size, nor any way to calculate one. Does anyone have any 'best practices' regarding this for WGCNA? Should I go ahead and run through the rest of the WGCNA workflow even if I need to select a power of 30 or so to get a Scale Free Topology Model Fit Index near the suggested 0.9?

We have a number of datasets we'd like to apply this to, but I'm getting concerned now because we usually only have three biological replicates, and typically only a few conditions to test. If this isn't going to work, I'll need to find a similar method that is more robust to smaller sample sizes, even if it's less effective overall compared to WGCNA.

Thank you!

r microarray expression network • 8.7k views

ADD COMMENT • link updated 11.9 years ago by Olga • 0 • written 11.9 years ago by Adam Cornwell ▴ 510

0

Adam, AFAIK there are no "best practices" (yet) for this algorithm. The WGCNA documentation encourages you to basically tailor your soft-thresholding power to your data, but without guidelines that is not very helpful, I agree.

What happens when you take your data through the workflow? Do you get any module identification? How does module membership vary with different powers? That may be helpful: you may identify a point at which module membership stops changing as the power changes, and that would be your answer.
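One way to put a number on "module membership does not change" is to compare the per-gene module labels from two runs with the adjusted Rand index, which is 1 for identical partitions and close to 0 for unrelated ones. The sketch below is plain Python, not part of WGCNA; the function name and the example labels are hypothetical, and it assumes you have exported one module label per gene, in the same gene order, from each run.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same genes.

    Label names do not need to match between runs; only the grouping matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of genes grouped together in both runs, in run A, and in run B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial (one module, or all singletons)
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical module labels for six genes at two soft-thresholding powers.
modules_power_12 = ["blue", "blue", "brown", "brown", "grey", "blue"]
modules_power_20 = ["turquoise", "turquoise", "yellow", "yellow", "grey", "yellow"]
print(adjusted_rand_index(modules_power_12, modules_power_20))
```

Computing this for each pair of adjacent powers would show where the assignments stop moving, if they do.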

In fact, since there are no published best practices, running your data with different powers could be helpful in establishing those practices.

As for the minimum number of samples you mention below: the minimum n of 12 suggests there may be something in the algorithm that can be modified for smaller data sets, with the caveat that module identification may not be as robust. Just some thoughts.

ADD REPLY • link 11.8 years ago by Alex Paciorkowski 3.5k

score 0 · Answer 1 · 2012-06-07

0

11.9 years ago

Olga • 0

The same problem

1

Since there haven't been any other answers, I asked someone who has had some correspondence with the authors of the package, and the suggested minimum number of samples is apparently 12. I have no other backup or reasoning for this, but it sounds like a reasonable enough value.

ADD REPLY • link 11.9 years ago by Adam Cornwell ▴ 510