Question

Problem getting soft threshold power using WGCNA for RNA-seq data

1

Entering edit mode

2.4 years ago

jms2520 ▴ 30

Hello,

I need some guidance with finding the soft threshold for WGCNA. I have been following the tutorial and am up to the point where I design a red line that corresponds to using the R^2 cut off. I am using log (counts per million) normalized data using limma/voom for a large RNA seq experiment with many variables (including 3 brain regions, 3 ages, sex, genotype.) Limma voom is was necessary for this data set to account for the many variables. When constructing the tree diagram there are no major outliers in the data. My data does not reach the 0.9 threshold when plotting the data so the line does not fall within the graph. It seems that something is off with the power of the experiment and I am not sure how to proceed? Any help would be great! (code and graph below)

powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5)
sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
    xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
    main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red")
abline(h=0.90,col="red")

The output provides the following data

Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
1      1    0.941  2.050          0.977  6530.0    6870.0   9170
2      2    0.442  0.415          0.735  3270.0    3440.0   5940
3      3    0.130 -0.196          0.559  1940.0    1970.0   4320
4      4    0.470 -0.535          0.736  1270.0    1220.0   3350
5      5    0.592 -0.753          0.811   884.0     795.0   2710
6      6    0.655 -0.900          0.855   644.0     539.0   2250
7      7    0.679 -1.020          0.875   486.0     377.0   1910
8      8    0.701 -1.110          0.891   376.0     270.0   1650
9      9    0.715 -1.180          0.904   297.0     196.0   1440
10    10    0.725 -1.250          0.910   239.0     145.0   1260
11    12    0.753 -1.350          0.930   162.0      83.1   1010
12    14    0.752 -1.450          0.927   114.0      50.1    820
13    16    0.763 -1.520          0.934    83.4      31.3    681
14    18    0.774 -1.570          0.943    62.5      19.8    574
15    20    0.781 -1.610          0.948    47.9      13.0    490

enter image description here

Soft RNAseq WGCNA Thresholding • 2.3k views

ADD COMMENT • link updated 2.4 years ago by LauferVA 4.2k • written 2.4 years ago by jms2520 ▴ 30

score 0 · Answer 1 · 2021-11-28

0

Entering edit mode

2.4 years ago

LauferVA 4.2k

Hi jms - soft thresholding (and hard thresholding for that matter) are based on the assumption that use of such thresholds will cut out noise in correlation matrices, thereby "accentuating" the "true" networks in the data. I believe that dozens of empiric experiments substantiate this assumption. Nevertheless, there is still a lot of "art" here (though perhaps some science, too).

When I've faced the problem of threshold selection, I've used a method like this:

1) Establish a decision criterion for threshold selection first. Ideally, this decision criterion is unrelated to your question of interest.

2) Come up with a way to decide which threshold selection has performed the best according to your criterion.

3) Test several of the results you've generated against your decision criterion.

4) Based on 2., choose the threshold you think is performing the best.

Example, suppose a large amount of data from your and other labs suggest a certain battery of genes should be co-expressed. You could make a decision criterion around this observation, and then observe at what power that battery of genes is most clearly discernible. At a high enough threshold, the module disappear even if it is a true positive gene module in your data ... that would be too high. At too low a threshold, it might be that each module contains what are likely to be many modules, or extraneous genes, etc.

You could further nuance the basic methodology above based on knowledge you have about the samples in your data. For instance, suppose you have reason to believe that i.e., gene set X should be coexpressed in WT, but not in KO. Now you can make a more focused decision rule, which should help you control type I and II error if properly applied.

Other authors have published other ways of selecting in their work. Can read papers from your target journal to see how they selected a threshold, too.

This link (or other posts here) may be of help, as well: https://support.bioconductor.org/p/87024/

ADD COMMENT • link 2.4 years ago by LauferVA 4.2k

0

Entering edit mode

Thank you so much for the helpful advice! Do you the relatively low power could be due to me using previously normalized data? I used limma/voom to analyze my RNA seq and am using the weighted normalized data here.

I have also seen other posts that mention the fact that lower power can be due to some sort of driving variable that skews the data away from being a scale free network.

ADD REPLY • link 2.4 years ago by jms2520 ▴ 30

1

Entering edit mode

I have also seen other posts that mention the fact that lower power can be due to some sort of driving variable that skews the data away from being a scale free network.

this is not your case. In those posts, they reach a "scale free topology" at relatively low power

Read the WGCNA faq: I can't get a good scale-free topology index no matter how high I set the soft-thresholding power.

ADD REPLY • link 2.4 years ago by andres.firrincieli 3.6k

1

Entering edit mode

Yes! I have read almost all of the FAQ and tutorial explanations from the Horvath lab regarding soft thresholding. I was under the assumption that not reaching at least an h of 0.9 suggested low power

ADD REPLY • link 2.4 years ago by jms2520 ▴ 30

0

Entering edit mode

huh. Andres - that is an awesome link nice find. JMS - i will go back through old studies and try to see whther the pipeline I had going did the thresholding before or after normalization ... being realistic i might not get to till weekend. sorry if thats too long :-(

ADD REPLY • link 2.4 years ago by LauferVA 4.2k

0

Entering edit mode

It looks liek there are a lot of posts highly relevant to your first comment about normalized data both here and on the bioconductor fora. please see:

https://support.bioconductor.org/p/84835/#84842 WGCNA soft thresholding problem WGCNA: scale-free topology fit indexes are negative

its been so long since i did it that all i remember is that there is a very specific flow between coexpression matrices, adjacency, similarity, TOM, etc. but I will have to go back and verify to be certain. i will do this within a few days. please see later post also. good luck!

ADD REPLY • link 2.4 years ago by LauferVA 4.2k

0

Entering edit mode

great post and great job so far. youd be an asset to this community i hope you stay!!!

ADD REPLY • link 2.4 years ago by LauferVA 4.2k