Is my WGCNA soft threshold (power) too low?
5 months ago
jfaberha ▴ 10

Hi everyone. I'm running WGCNA on multiple expression datasets representing 3 different brain regions with roughly 40 samples per set and I'm having difficulty understanding the parameter selection prior to network construction, specifically when choosing the power. If I'm interpreting the Scale Free Analysis plots correctly, it seems I should choose a power of 3 since that is the first value to cross 0.9 for the model fit index and is the rough inflection point on the mean connectivity plot.

After running the analysis using 3 for the power parameter, I get around 20 modules, with the 1st (turquoise) module containing roughly half the genes in my dataset. Maybe this is okay, but my intuition tells me that choosing a power so low may be leading to overclustering, with some of the larger clusters potentially being biologically meaningless. I'm hesitant to change it, however, given the results of the plots above.

I've read the tutorials and a ton of help threads online, but I haven't seen any example plots or help suggestions that recommend a power as low as 3. Is there a minimum power that I shouldn't go below or should I just trust the QC plots and go with 3?

EDIT: For more clarity, we would like to construct a signed network.

wgcna RNA-Seq clustering network
5 months ago

A soft thresholding power of 3 is quite low and is often caused by the presence of a very strong driver of variation (maybe sex?). I would recommend doing a PCA to understand what is going on.


Brain region is the largest driver of variation in the dataset along PC1, which is why we uploaded the 3 expression datasets separately. Within each brain region, however, we have both sick and healthy animals which cluster strongly along PC2.

Can you recommend some sort of workaround for this? Is there some sort of secondary criteria we can use to justify raising the power or should we further subset our data prior to running WGCNA, i.e. run WGCNA 3 different times with sick and healthy animals separated? I'd prefer the former option for ease of interpretation and comparison of modules between brain regions, but not if it can't be justified by our experimental design.


> Is there some sort of secondary criteria we can use to justify raising the power

By default WGCNA builds an unsigned network. You could try a signed network instead, which should reach a scale-free topology at a higher power. To build a signed network you should set the arguments networkType = "signed" in pickSoftThreshold; type = "signed" in adjacency; TOMType = "signed" in TOMsimilarity.
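To see why a signed network tends to need a higher power, it can help to compare the two adjacency transforms directly. The sketch below is plain Python rather than WGCNA code. It assumes the usual WGCNA definitions: unsigned adjacency is |cor|^power and signed adjacency is ((1 + cor) / 2)^power. The function names and the example correlations are made up for illustration.

```python
def unsigned_adjacency(cor, power):
    """Unsigned network: strong negative and positive correlations both count."""
    return abs(cor) ** power


def signed_adjacency(cor, power):
    """Signed network: correlation is rescaled from [-1, 1] to [0, 1] first."""
    return ((1.0 + cor) / 2.0) ** power


# Illustrative correlations only (not taken from any dataset in this thread).
for power in (3, 6, 12):
    for cor in (-0.8, 0.0, 0.8):
        print(
            f"power={power:2d} cor={cor:+.1f} "
            f"unsigned={unsigned_adjacency(cor, power):.4f} "
            f"signed={signed_adjacency(cor, power):.4f}"
        )
```

Under these definitions an uncorrelated gene pair (cor = 0) has adjacency 0 in an unsigned network but 0.5^power in a signed one, so a signed network carries more background connectivity at any given power and generally needs a larger power to suppress it.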


Ah, thank you. I forgot to put it in the original post but we are attempting to build a "signed" network, and I didn't realize that we needed to add that parameter to the "pickSoftThreshold" step as well. As you suggested, that seems to increase the optimal soft threshold. https://ibb.co/K5f6Kp3

Last question: given these new plots, there seems to be more variation between expression datasets in terms of when they first cross 0.9 for the model fit statistic. Does it appear appropriate to pick 6 as a soft threshold, since that is when all three datasets cross the 0.9 barrier? It also looks like 5 might be the inflection point in the remaining plots.


I would pick 9, only because the mean connectivity at 6 is still too high. When you choose the soft threshold, the mean connectivity should be around 100 or below (see the WGCNA FAQ).
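The two criteria discussed in this thread (a scale-free fit of at least 0.9, and a mean connectivity of around 100 or below) can be combined into one rule applied across all datasets at once. The sketch below is plain Python, not part of WGCNA. `pick_power`, its default cutoffs, and every number in `fits` are hypothetical and are not the values from the linked plots. In practice the per-power fit and mean connectivity would be read from the table that pickSoftThreshold reports for each brain region.

```python
def pick_power(fits, min_fit=0.9, max_mean_k=100.0):
    """Return the lowest power meeting both cutoffs in every dataset, or None.

    fits maps dataset name -> {power: (scale_free_fit, mean_connectivity)}.
    """
    # Only powers that were evaluated in every dataset can be compared.
    shared = set.intersection(*(set(table) for table in fits.values()))
    for power in sorted(shared):
        if all(
            table[power][0] >= min_fit and table[power][1] <= max_mean_k
            for table in fits.values()
        ):
            return power
    return None


# Hypothetical values for illustration only.
fits = {
    "region_A": {4: (0.85, 900.0), 6: (0.92, 310.0), 9: (0.95, 80.0), 12: (0.96, 25.0)},
    "region_B": {4: (0.91, 700.0), 6: (0.93, 250.0), 9: (0.94, 60.0), 12: (0.95, 18.0)},
    "region_C": {4: (0.80, 1100.0), 6: (0.90, 400.0), 9: (0.93, 95.0), 12: (0.95, 30.0)},
}

print(pick_power(fits))
```

Using a single power for all three regions keeps the resulting modules easier to compare between regions, which is the reason for requiring both cutoffs to hold in every dataset rather than choosing a power per dataset.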