Hello,
I need some guidance with finding the soft threshold for WGCNA. I have been following the tutorial and am up to the point where I design a red line that corresponds to using the R^2 cut off. I am using log (counts per million) normalized data using limma/voom for a large RNA seq experiment with many variables (including 3 brain regions, 3 ages, sex, genotype.) Limma voom is was necessary for this data set to account for the many variables. When constructing the tree diagram there are no major outliers in the data. My data does not reach the 0.9 threshold when plotting the data so the line does not fall within the graph. It seems that something is off with the power of the experiment and I am not sure how to proceed? Any help would be great! (code and graph below)
powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5)
sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red")
abline(h=0.90,col="red")
The output provides the following data
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.941 2.050 0.977 6530.0 6870.0 9170
2 2 0.442 0.415 0.735 3270.0 3440.0 5940
3 3 0.130 -0.196 0.559 1940.0 1970.0 4320
4 4 0.470 -0.535 0.736 1270.0 1220.0 3350
5 5 0.592 -0.753 0.811 884.0 795.0 2710
6 6 0.655 -0.900 0.855 644.0 539.0 2250
7 7 0.679 -1.020 0.875 486.0 377.0 1910
8 8 0.701 -1.110 0.891 376.0 270.0 1650
9 9 0.715 -1.180 0.904 297.0 196.0 1440
10 10 0.725 -1.250 0.910 239.0 145.0 1260
11 12 0.753 -1.350 0.930 162.0 83.1 1010
12 14 0.752 -1.450 0.927 114.0 50.1 820
13 16 0.763 -1.520 0.934 83.4 31.3 681
14 18 0.774 -1.570 0.943 62.5 19.8 574
15 20 0.781 -1.610 0.948 47.9 13.0 490
Thank you so much for the helpful advice! Do you the relatively low power could be due to me using previously normalized data? I used limma/voom to analyze my RNA seq and am using the weighted normalized data here.
I have also seen other posts that mention the fact that lower power can be due to some sort of driving variable that skews the data away from being a scale free network.
this is not your case. In those posts, they reach a "scale free topology" at relatively low power
Read the WGCNA faq: I can't get a good scale-free topology index no matter how high I set the soft-thresholding power.
Yes! I have read almost all of the FAQ and tutorial explanations from the Horvath lab regarding soft thresholding. I was under the assumption that not reaching at least an h of 0.9 suggested low power
huh. Andres - that is an awesome link nice find. JMS - i will go back through old studies and try to see whther the pipeline I had going did the thresholding before or after normalization ... being realistic i might not get to till weekend. sorry if thats too long :-(
It looks liek there are a lot of posts highly relevant to your first comment about normalized data both here and on the bioconductor fora. please see:
https://support.bioconductor.org/p/84835/#84842 WGCNA soft thresholding problem WGCNA: scale-free topology fit indexes are negative
its been so long since i did it that all i remember is that there is a very specific flow between coexpression matrices, adjacency, similarity, TOM, etc. but I will have to go back and verify to be certain. i will do this within a few days. please see later post also. good luck!
great post and great job so far. youd be an asset to this community i hope you stay!!!