How to increase the number of modules in WGCNA
2.7 years ago
Maryam • 0

Hi,

I ran a WGCNA analysis for my project using 336 samples, but in the end it gave me only 3 modules from about 16,000 genes: 12,000 genes in one module, 379 in another, and 212 in the third. Having only 3 modules with this distribution seems very unusual to me.

Based on the output below, I selected a soft-thresholding power of 10.

1- Is there a way for me to increase the number of modules?

2- I got the following error when trying to draw one of the diagrams.

3- Do you think this error may be related to the number of modules?

4- How can I fix this error?

[Image 1: output used to select the soft-thresholding power]

[Image 2: error message from the plotting step]

Thanks.


See Peter Langfelder's answer regarding having very few modules with a large number of genes: link

4- How can I fix this error?

These kinds of errors are usually triggered by a matrix that is not properly formatted. I would carefully check the structure of each matrix passed to labeledHeatmap().
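As a point of comparison, a well-formed call in the usual module-trait heatmap step looks roughly like the sketch below. The object names (MEs, datTraits, nSamples) are assumptions taken from the standard WGCNA tutorial layout, so adapt them to your own objects; the main thing to verify is that every argument is a proper matrix and that the text matrix has the same dimensions as the correlation matrix.

    # Minimal sketch, assuming MEs (module eigengenes), datTraits (trait data frame)
    # and nSamples already exist as in the WGCNA tutorials.
    library(WGCNA)

    moduleTraitCor    <- cor(MEs, datTraits, use = "p")
    moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)

    # Text shown in the heatmap cells; forgetting to set its dim() to match
    # the correlation matrix is a common cause of labeledHeatmap() errors.
    textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
                        signif(moduleTraitPvalue, 1), ")", sep = "")
    dim(textMatrix) <- dim(moduleTraitCor)

    str(moduleTraitCor)                     # should be a numeric matrix, not a data frame
    dim(moduleTraitCor); dim(textMatrix)    # dimensions must agree

    labeledHeatmap(Matrix = moduleTraitCor,
                   xLabels = names(datTraits),
                   yLabels = names(MEs),
                   ySymbols = names(MEs),
                   colorLabels = FALSE,
                   colors = blueWhiteRed(50),
                   textMatrix = textMatrix,
                   setStdMargins = FALSE,
                   zlim = c(-1, 1))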


Thank you very much for your help. I will definitely apply your suggestion.


Also, I had already checked the link you sent and applied its suggestions, but it did not work: my large module did not change; it only shrank the grey (unassigned) module and turned it into one additional module. The result was not good.


I'm sorry, I wasn't clear enough. The whole point is:

check the sample clustering tree for large drivers (strong branches); large modules are often the result of having very strong global drivers of expression.

Of 16,000 genes, you have 12,000 genes in one single module. What does this mean? You have a very strong driver of variation causing 12,000 genes to cluster together in one single module. Increasing or reducing the power is not going to fix the problem, because this kind of behavior is caused by intrinsic factors (either technical or biological) affecting your 336 samples.
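As a first check, a minimal sketch along these lines (assuming datExpr is the samples x genes matrix you fed into the network construction) will show whether a few dominant branches are driving the clustering:

    # Cluster the 336 samples and look for a small number of very strong branches,
    # which would point to a strong global driver (batch, tissue, sex, ...).
    library(WGCNA)

    sampleTree <- hclust(dist(datExpr), method = "average")
    plot(sampleTree,
         main = "Sample clustering: check for strong drivers / outliers",
         sub = "", xlab = "", cex = 0.6)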

From the WGCNA FAQ:

My data are heterogeneous. Can I still use WGCNA?

Data heterogeneity may affect any statistical analysis, and even more so an unsupervised one such as WGCNA. What, if any, modifications should be made to the analysis depends crucially on whether the heterogeneity (or its underlying driver) is considered "interesting" for the question the analyst is trying to answer, or not. If one is lucky, the main driver of sample differences is the treatment/condition one studies, in which case WGCNA can be applied to the data as is. Unfortunately, often the heterogeneity drivers are uninteresting and should be adjusted for. Such factors can be technical (batch effects, technical variables such as post-mortem interval etc.) or biological (e.g., sex, tissue, or species differences).

If one has a categorical source of variation (e.g., sex or tissue differences) and the number of samples in each category is large enough (at least 30, say) to construct a network in each category separately, it may be worthwhile to carry out a consensus module analysis (Tutorial II, see WGCNA Tutorials). Because this analysis constructs a network in each category separately, the between-category variation does not affect the analysis.

If it is desired to construct a single network for all samples, the unwanted or uninteresting sources of large variation in the data should be adjusted for. For categorical (ordinal) factors we recommend using the function ComBat (from the package sva). Users who have never used ComBat before should read the help file for ComBat and work through the sva vignette (type vignette("sva") at the R prompt) to make sure they use ComBat correctly.

For continuous sources of variation (e.g., postmortem interval), one can use simple linear regression to adjust the data. There may be more advanced methods out there that also allow the use of covariates and protect from over-correction.

Whichever method is used, we caution the user that removal of unwanted sources of variation is never perfect and it can, in some cases, lead to removal of true interesting signal, and in rare cases it may introduce spurious association signal. Thus, only sources of relatively large variation should be removed.
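If a categorical driver (e.g. batch) turns out to be the culprit, a minimal sketch of the ComBat adjustment mentioned above could look like this; exprMat (a normalized, log-scale expression matrix with genes in rows and samples in columns) and batch are assumed names:

    # Hedged sketch: adjust for a known categorical batch before WGCNA.
    library(sva)

    exprCombat <- ComBat(dat = exprMat, batch = batch)

    # WGCNA expects samples in rows and genes in columns.
    datExpr <- t(exprCombat)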


Hi.

Thank you for your guidance.

In your opinion, to do this, is it necessary to apply batch-effect correction with DESeq2?


In your opinion, to do this, is it necessary to apply batch-effect correction with DESeq2?

You should use DESeq2 to normalize your expression data via the variance-stabilizing transformation (VST) or rlog, and then use limma::removeBatchEffect() or follow this tutorial to remove the batch effect.

Sorry if I am wrong, but I have the feeling that you did not check for the presence of batch effects in your dataset. For example, use DESeq2 to normalize the expression data and then run a PCA on the normalized values. Look at the PCA plot and check how your 336 samples separate along PC1.
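A minimal sketch of that check, assuming a raw count matrix counts and a sample table coldata with batch and condition columns (these names are placeholders, adjust them to your metadata):

    library(DESeq2)
    library(limma)

    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData   = coldata,
                                  design    = ~ condition)

    # Variance-stabilizing transformation, then look for batch structure on PC1/PC2.
    vsd <- vst(dds, blind = TRUE)
    plotPCA(vsd, intgroup = "batch")

    # If the PCA shows a clear batch effect, remove it before running WGCNA.
    exprCorrected <- removeBatchEffect(assay(vsd), batch = vsd$batch)

    # WGCNA expects samples in rows and genes in columns.
    datExpr <- t(exprCorrected)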


Thanks a lot, your guidance was very helpful.
