Question: How to find p-value of the modules in WGCNA?
2
gravatar for fernardo
2.9 years ago by
fernardo 150
Italy
fernardo 150 wrote:

Hi,

I did run WGCNA package successfully but I couldn't find a way or line of code to get p-value of the modules. I have seen it in a paper that there p-value were given to the modules. Can you help please?

Thanks a lot

R rna-seq microarray wgcna • 2.7k views
ADD COMMENTlink modified 12 months ago by 149abhi10 • written 2.9 years ago by fernardo 150

How to find the results for each modules (summary) in WGCNA? can you help me with code?

ADD REPLYlink written 12 months ago by 149abhi10

'Cross'-posted to here: A: WGCNA analysis - use only linkage coefficients for results analysis

ADD REPLYlink written 12 months ago by Kevin Blighe70k
6
gravatar for Kevin Blighe
2.9 years ago by
Kevin Blighe70k
Republic of Ireland
Kevin Blighe70k wrote:

Buonasera Fernardo,

The idea is that you, first, derive your modules via the standard WGCNA functions and, second, obtain the module 'scores' for each sample to each module. This will give you something like:

         Module1   Module2   ...   ModuleX
Sample1  0.66      0.43      ...   0.33
Sample2  0.45      0.34      ...   0.2
...      ...       ...       ...   ...
SampleX  0.9       0.7       ...   0.45

Poi (then), you can use these module scores in regression analysis or in correlation, i.e., regressing the module scores to a clinical phenotype:

Any of the modules related to weight?

summary(lm(weight ~ Module1))
summary(lm(weight ~ Module2))

Any module relate to case / control status (binary phenotype)?

summary(glm(CaseControl ~ Module1, family=binomial()))
summary(glm(CaseControl ~ Module2, family=binomial()))

----------------------------------

You can also build a simple correlation plot, like I have done here: CorLevelPlot - Visualise correlation results, e.g., clinical parameter correlations Cor_Level_Plot1

Nota bene - WGCNA can also generate similar plot to this

Ci vediamo dopo,

Kevin

ADD COMMENTlink modified 2.3 years ago • written 2.9 years ago by Kevin Blighe70k

Thank you very much Kevin.

I could manage to find the matrix you mentioned between the samples and the modules. But what could be the "Weight" or "CaseControl"?

Thanks

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by fernardo 150
1

They would be 'phenotype' parameters, with 1 value per sample.

For example:

head(MyData)

         Weight.Kg CaseControl Module1   Module2   ...   ModuleX
Sample1  45        Case        0.66      0.43      ...   0.33
Sample2  47        Case        0.45      0.34      ...   0.2
...      ...       ...         ...       ...       ...   ...
SampleX  74        Control     0.9       0.7       ...   0.45


summary(lm(weight ~ Module1, data=MyData))
summary(lm(weight ~ Module2, data=MyData))

summary(glm(CaseControl ~ Module1, , data=MyData, family=binomial()))
summary(glm(CaseControl ~ Module2, , data=MyData, family=binomial()))
ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Kevin Blighe70k

Thanks. Understood. Even, unfortunately, it gave an error while I was running CaseControl;y (summary(glm(y ~ Module1, , data=MyData, family=binomial()))):

Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
ADD REPLYlink written 2.9 years ago by fernardo 150

You will have to encode MyData$y as a categorical variable via the factor() command. For example:

MyData$y <- factor(MyData$y, levels=c("Control", "Case"))
ADD REPLYlink written 2.9 years ago by Kevin Blighe70k

I changed Y(CaseControl) column to 0 and 1 and worked. But gave a warning message but maybe I can ignore it :)

Call:

glm(formula = y ~ MEblue, family = binomial(), data = xx)

Deviance Residuals: Min 1Q Median 3Q Max
-8.106e-05 -2.100e-08 2.100e-08 2.100e-08 8.197e-05

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 190.1 72757.4 0.003 0.998

MEblue 1732.4 660316.6 0.003 0.998

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 3.4162e+01 on 28 degrees of freedom

Residual deviance: 1.3289e-08 on 27 degrees of freedom

AIC: 4

Number of Fisher Scoring iterations: 25

Warning messages:

1: glm.fit: algorithm did not converge

2: glm.fit: fitted probabilities numerically 0 or 1 occurred

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by fernardo 150
1

Well, we now know that MEblue has nothing in relation to CaseControl! - p-value=0.998.

You should now check each module independently and see which are statistically significant.

The warning message most likely appeared as a result of low sample numbers.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Kevin Blighe70k

Thank you very much, now I have all the answers. But if I may ask about one more interesting point, the blue module has more significant enriched pathways and ontology annotations with even more connection in PPI compared to other modules with low p-value but no enrichment or very less? I can't see it fair to go with other modules but not the Blue.

What do you say? your experience? Thanks a lot.

ADD REPLYlink written 2.9 years ago by fernardo 150

Is the blue module also the module with the highest number of genes?

You also did the correlation analysis of the modules to your phenotypes? - were the results the same as the regression?

ADD REPLYlink written 2.9 years ago by Kevin Blighe70k

Not actually. The blue module has not the lowest or the highest number of genes (DEGs).

What correlation do you mean? Do you mean using "summary(lm(CaseControl ~ MEblue))"? Yes I tried this and were highly significant for almost all the modules.

ADD REPLYlink written 2.9 years ago by fernardo 150

But, if the blue module is not related to CaseControl, then what is your interpretation of that? It could represent the 'common' pathways across all of your samples. The interesting modules should be the ones that are statisatically significant to CaseControl.

ADD REPLYlink written 2.7 years ago by Kevin Blighe70k

well I'd say that I received different clusters (modules) from WGCNA and one of them seems to be more significant based on pathways and so on, not based on the correlation we discussed above. And the genes are all differentially expressed between case/control, so the common pathways across e.g. the cases are something to be considered significant, right? why common pathways across the Case samples are shouldn't be relevant? What do you think?

ADD REPLYlink written 2.6 years ago by fernardo 150

Interestingly, in my case I observed the module that was the only significant module in correlation test (also the largest), failed in the glm fit test. Another module came out as significant which I was not expecting. Also the test with the largest module showed some warning as mentioned by @fernardo. Am I missing something in this?

! Screenshot of terminal

ADD REPLYlink modified 7 months ago • written 7 months ago by Arindam Ghosh340

Yeh, but, even looking at the values for turquoise, there are no great differences, and the sample n is too small. The correlation test worked how?

ADD REPLYlink written 7 months ago by Kevin Blighe70k

the sample n is too small.

n=45, it that still small? What would be the minimal number of samples that I should target? WGCNA tutorial mentions at least 20 samples. [In image, only top samples displayed by head()]

For correlation test, I used the default Pearson's correlation:

moduleTraitCor <- cor(MEs, trait, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
ADD REPLYlink modified 6 months ago • written 6 months ago by Arindam Ghosh340
1

I thought that you had 6 samples, but I was mis-reading your terminal output. You could check the distribution of the values in that module, which may reveal more information.

ADD REPLYlink written 6 months ago by Kevin Blighe70k

The most observable thing that I saw for the MEs was that in turquoise module, the type1 samples had positive MEs (~ 0.14 or 0.06) while the type0 samples had negative MEs (~ -0.16 or -0.07 ). For other modules, no such uniformity were found.

Also, for the turquoise module the correlation was high in the scatter plot of gene module membership and gene trait significance. But as mentioned earlier this module did not qualify the glm() module trait association.

ADD REPLYlink modified 6 months ago • written 6 months ago by Arindam Ghosh340
1

The directionality of those units is not too important, I think, and, for all intents and purposes, they can be regarded as 'unitless'. Not sure where you can go from here other than re-running the analysis a few times.

ADD REPLYlink written 6 months ago by Kevin Blighe70k

How do you combine all those modules ,as if i understand once you get modules each modules contains sample and gene expression ,but here in your Mydata what i see is each sample trait+values in modules. Can you explain ?

ADD REPLYlink written 17 months ago by krushnach80890

Dear Dr. Blighe I construct my co-express network but I don't know how can I calculate module 'scores' for each sample to each module. In other words, I can't manage the below table that you suggested in the last comment:

Module1 Module2 ... ModuleX Sample1 0.66 0.43 ... 0.33 Sample2 0.45 0.34 ... 0.2 ... ... ... ... ... SampleX 0.9 0.7 ... 0.45

I appreciate if you share your comment with me and guide me on how to get that table in WGCNA?

ADD REPLYlink written 20 months ago by modarzi140

Dear Dr. Blighe I construct my co-express network but I don't know how can I calculate module 'scores' for each sample to each module. In other words, I can't manage the below table that you suggested in the last comment:

Module1   Module2   ...   ModuleX
Sample1  0.66      0.43      ...   0.33
Sample2  0.45      0.34      ...   0.2
...      ...       ...       ...   ...
SampleX  0.9       0.7       ...   0.45

I appreciate if you share your comment with me and guide me on how to get that table in WGCNA?

ADD REPLYlink written 20 months ago by modarzi140

You are searching for 'module eigengenes'. There is an example of what you need, here: https://www.polarmicrobes.org/weighted-gene-correlation-network-analysis-wgcna-applied-to-microbial-communities/

Search for moduleEigengenes(datExpr0, moduleColors)$eigengenes

ADD REPLYlink modified 20 months ago • written 20 months ago by Kevin Blighe70k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1145 users visited in the last hour
_