Hello all,

I have looked through the many existing posts on soft power selection in WGCNA, but unfortunately wasn't able to find a solution to my problem. In brief, I cannot achieve a signed scale-free topology R^2 of 0.8 or higher without a very high soft power. I am conducting an exploratory analysis of gene expression data from GTEx skeletal muscle samples. To summarize, this is what I have done:

- Imported the *public* GTEx TPM data, selected just the skeletal muscle data, and normalized via log2(TPM+1); total genes = 56,200, samples = 803. Excluded all genes with near 0 variance and those with mean log2(TPM+1) <= 0.5, on the basis of this histogram, leaving 16,089 genes, 803 samples:
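
For anyone who wants to reproduce the filtering step, here is a minimal sketch in plain Python (not the R code actually used). The function name, the toy data, and the `min_var` cutoff are assumptions for illustration; the post only says "near 0 variance" without giving a number.

```python
import math
import statistics

def filter_genes(tpm, min_mean=0.5, min_var=1e-6):
    """Keep genes whose log2(TPM+1) profile passes mean and variance cutoffs.

    tpm: dict mapping gene id -> list of TPM values, one per sample.
    min_var is an assumed "near zero" cutoff; the post gives no number.
    """
    kept = {}
    for gene, values in tpm.items():
        logged = [math.log2(v + 1.0) for v in values]
        if statistics.fmean(logged) <= min_mean:
            continue  # mean log2(TPM+1) <= 0.5: treated as not expressed
        if statistics.pvariance(logged) <= min_var:
            continue  # essentially constant across samples
        kept[gene] = logged
    return kept

# Toy input: one variable gene, one barely expressed gene, one constant gene.
toy = {
    "geneA": [10.0, 20.0, 40.0, 80.0],
    "geneB": [0.0, 0.1, 0.0, 0.2],
    "geneC": [5.0, 5.0, 5.0, 5.0],
}
filtered = filter_genes(toy)
print(sorted(filtered))
```

Only the gene that is both expressed and variable survives; the returned values are already on the log2(TPM+1) scale.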

Computed the estimated soft power (signed network) on the remaining genes and plotted as usual:
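
To make the quantities in that plot concrete, here is a rough sketch in plain Python of a signed scale-free fit index and the connectivity, as far as I understand what tools such as WGCNA's `pickSoftThreshold` report. It is not the WGCNA implementation. The function names, the 10 equal-width bins, and the random toy data are assumptions for illustration only.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate input: no linear relationship to measure
    return sxy / math.sqrt(sxx * syy)

def signed_connectivity(expr, power):
    """Connectivity k_i = sum over j != i of ((1 + cor_ij) / 2) ** power."""
    k = [0.0] * len(expr)
    for i in range(len(expr)):
        for j in range(i + 1, len(expr)):
            a = ((1.0 + pearson(expr[i], expr[j])) / 2.0) ** power
            k[i] += a
            k[j] += a
    return k

def signed_scale_free_r2(k, n_bins=10):
    """Signed R^2 of log10 p(k) against log10 k over equal-width bins of k.

    Positive when p(k) falls with k (negative slope), negative otherwise.
    """
    lo, hi = min(k), max(k)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    xs = [math.log10(sum(b) / len(b)) for b in bins if b]  # mean k per bin
    ys = [math.log10(len(b) / len(k)) for b in bins if b]  # fraction of genes
    r = pearson(xs, ys)
    return -math.copysign(r * r, r)

# Toy data: 30 random "genes" x 20 "samples" (no real structure).
random.seed(0)
toy = [[random.gauss(0, 1) for _ in range(20)] for _ in range(30)]
for power in (6, 12, 26):
    k = signed_connectivity(toy, power)
    print(power, round(signed_scale_free_r2(k), 2), round(sum(k) / len(k), 4))
```

Because every signed adjacency lies between 0 and 1, raising the power can only shrink connectivity, which is the trade-off between fit and connectivity discussed in this post.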

At this point you can see that I need a power of 26 to even hit 0.8 on the measure of scale free topology, and the connectivity has dropped off a fair bit by then. So I started wondering what global drivers of gene expression might exist (as discussed in the WGCNA FAQ and elsewhere), and how to deal with them. I plotted the dendrogram along with a trait heatmap for any trait info I thought might be relevant. Sample clustering is by average Euclidean distance after the log2(TPM+1) transform:

As you can see, there are some definite clusters and it looks like they may be related to the terminal phase duration (Hardy score) and the tissue ischemia time, which each overlap quite a bit. The turquoise bands in the Hardy score represent the ventilated subgroup, so it's sadly not surprising that they have the lowest ischemia time. Having said all that, this kind of analysis is new to me, so I'm not sure how to adjust for these factors, which likely(?) are responsible for the high soft power. I tried re-running the soft-power calculations for just the ventilated subgroup, but didn't get significantly different results.

Thanks to anyone who read this far. I'm not averse to creating multiple networks, but I'd like to have confidence in selecting my soft power(s). I am considering a soft power of 12-16, since these are near the recommended power of 12 for a signed network at this sample size, and while they give low signed scale-free topology R^2 values, the mean and median connectivity values look OK. Alternatively, I could use a soft power of 26, which gets the signed scale-free topology R^2 up to ~0.8 but lowers the connectivity considerably.

I'd appreciate any input on a specific power to select, or on other things to explore, such as correcting for covariates.

Thank you!

Hi Peter,

Thanks very much for your advice, it's really helpful! I was thinking that prioritizing mean / median connectivity might make sense but didn't have enough confidence in my theoretical understanding of the method to justify it, so putting some actual numbers to it is great. I will likely go with a power around 16 then, as it appears to be the sweet spot.

By the cluster on the right, I assume you mean the one slightly up and to the right of the large middle cluster, not the small group of a handful of samples in the upper right?

I'm not familiar with adjusting for the leading principal components or SVA so I'll have to do some reading on those. If I proceed with analyzing the entire set at once, I think adjusting via the PCs may be best, if SVA preserves the impact of the Hardy score / ischemic time. I'll have to play around with it and see. I was also thinking that I may try to construct networks separately for the ventilated / non-ventilated subjects and do a consensus analysis, because it would be interesting, biologically, to see if there are differences in the modules and module trait relationships.
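
As a note on what "adjusting for the leading principal components" could look like in practice, here is a minimal sketch in plain Python. It is only one reading of the suggestion: center each gene, estimate the first principal component across samples by power iteration, and subtract each gene's projection onto it. The function names, the single-component choice, and the toy data (one hidden factor standing in for something like ischemic time) are all assumptions. The sketch does not decide how many components to remove or whether doing so also removes biology of interest.

```python
import math
import random

def center_rows(expr):
    """Subtract each gene's mean across samples."""
    out = []
    for row in expr:
        m = sum(row) / len(row)
        out.append([x - m for x in row])
    return out

def leading_sample_pc(centered, n_iter=100, seed=0):
    """Unit vector over samples for the first PC, found by power iteration."""
    rng = random.Random(seed)
    n = len(centered[0])
    v = [rng.gauss(0, 1) for _ in range(n)]  # random start, not constant
    for _ in range(n_iter):
        # scores = X v (one value per gene), then w = X^T scores
        scores = [sum(r * vj for r, vj in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def remove_component(centered, v):
    """Subtract each gene's projection onto the unit vector v."""
    out = []
    for row in centered:
        proj = sum(r * vj for r, vj in zip(row, v))
        out.append([r - proj * vj for r, vj in zip(row, v)])
    return out

# Toy data: 5 genes driven by one hidden sample-level factor plus noise.
rng = random.Random(1)
factor = [rng.gauss(0, 1) for _ in range(12)]
toy = [[load * f + rng.gauss(0, 0.1) for f in factor]
       for load in (2.0, -1.5, 1.0, 0.5, -0.8)]
centered = center_rows(toy)
pc1 = leading_sample_pc(centered)
adjusted = remove_component(centered, pc1)

def total_ss(m):
    return sum(x * x for row in m for x in row)

print(round(total_ss(centered), 3), round(total_ss(adjusted), 3))
```

The adjusted matrix would then be the input to the soft-power calculation. Comparing the total sum of squares before and after gives a rough sense of how much variation the removed component carried.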

Anyway, thanks again for your help; it has given me some direction and things to explore.