Question

WGCNA gene clusters vs eigengenes

4.6 years ago

bioming ▴ 30

Hello, I'm trying to go through the WGCNA tutorial on mouse liver data from https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/.

I understood the concepts from expression matrix -> soft thresholding -> adjacency matrix -> dissTOM -> hclust. This is where I start to get confused: after hclust, I generate the dendrogram, use "dynamic tree cut" to detect a set of modules, and color the modules with "dynamicColors". But then the tutorial uses moduleEigengenes() to generate another set of modules (albeit fewer modules than from hclust()).

My question is:

Does moduleEigengenes() use any information from hclust() generated modules? Or is it just another way to generate modules, which you then compare to the modules generated by hclust()? But then I read in a presentation slide (https://edu.isb-sib.ch/pluginfile.php/158/course/section/65/01SIB2016_wgcna.pdf) that moduleEigengenes() merges similar modules. So does moduleEigengenes() merge similar modules generated by hclust()? But from the code

MEList = moduleEigengenes(datExpr, colors = dynamicColors)

the only thing moduleEigengenes() takes as input that remotely comes from hclust() is dynamicColors; it doesn't seem to be using the modules generated by hclust() at all. Am I missing something?

But after moduleEigengenes(), the tutorial runs hclust() again, using "as.dist(MEDiss)" instead of "dissTOM" as in the first hclust()...

Very confused, any insight would be much appreciated, thanks!

Ming

wgcna network analysis eigengenes hclust • 5.8k views

ADD COMMENT • link updated 4.6 years ago by finswimmer 16k • written 4.6 years ago by bioming ▴ 30

Cross-posted on Bioconductor: https://support.bioconductor.org/p/124477/

ADD REPLY • link 4.6 years ago by Kevin Blighe 87k

score 4 · Accepted Answer · 2019-09-07

4.6 years ago

andres.firrincieli 3.6k

does moduleEigengenes() use any information from hclust() generated modules? or is it just another way to generate modules?

The function moduleEigengenes() doesn't generate or merge any modules. It simply calculates the 1st Principal Component (PC), i.e. the module eigengene (ME), of each module you pass in through the colors argument. Because the 1st PC summarizes the module expression profile, section 2.b.5 of the tutorial shows you how to merge modules whose expression profiles are similar; that merging step is what reduces the number of modules.

Sections 2.b.4 and 2.b.5 are very similar: both start from a matrix of correlation-derived values. What changes is the scale: in 2.b.4, gene-gene cor values are used to cluster genes into modules; in 2.b.5, module-module cor values are used to cluster modules that share very similar expression profiles.
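To make the "1st PC of each module" idea concrete, here is a minimal base-R sketch on simulated data. It is not the WGCNA implementation: the objects `expr`, `colors`, `MEs_sketch` and the helper `first_pc()` are hypothetical names used only for illustration, and the data are simulated. The point is that each already-defined module gets one summary profile, and no module is created or merged along the way.

```r
set.seed(1)
n_samples <- 20

# Assumption: two simulated modules, each driven by its own latent profile.
latent_a <- rnorm(n_samples)
latent_b <- rnorm(n_samples)
expr <- cbind(
  sapply(1:10, function(i) latent_a + rnorm(n_samples, sd = 0.3)),
  sapply(1:10, function(i) latent_b + rnorm(n_samples, sd = 0.3))
)
colnames(expr) <- paste0("gene", 1:20)

# Module label per gene (plays the role of dynamicColors).
colors <- rep(c("blue", "turquoise"), each = 10)

# Hypothetical helper: first principal component of one module
# (samples in rows, genes in columns).
first_pc <- function(x) {
  x <- scale(x)            # centre and scale each gene
  pc <- svd(x)$u[, 1]      # first left singular vector, one value per sample
  # Orient the PC so it correlates positively with the mean expression.
  if (cor(pc, rowMeans(x)) < 0) pc <- -pc
  pc
}

# One eigengene-like column per module, one row per sample.
MEs_sketch <- sapply(split(seq_len(ncol(expr)), colors),
                     function(idx) first_pc(expr[, idx, drop = FALSE]))
dim(MEs_sketch)
```

The number of columns in `MEs_sketch` equals the number of distinct labels in `colors`, which is why the eigengene step alone cannot change the module count.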

ADD COMMENT • link 4.6 years ago by andres.firrincieli 3.6k

Thank-you so much for your answer, just to double check if I understood correctly:

geneTree = hclust(as.dist(dissTOM), method = "average")

is clustering genes into modules - using gene-gene cor values

vs.

METree = hclust(as.dist(MEDiss), method = "average")

which is clustering modules - using module-module cor values from the 1st PC identified by moduleEigengenes()?

I've seen people usually go on and find associations between ME and trait. What is the purpose of hclust(as.dist(dissTOM)) if it's not used to associate with trait? Is it just a QC step to see if clusters in METree correspond to clusters in geneTree?

ADD REPLY • link 4.6 years ago by bioming ▴ 30

From the start (section 2.b.4):

adjacency = adjacency(datExpr, power = softPower) # gene-gene adjacency: absolute correlation raised to softPower (default unsigned network)
TOM = TOMsimilarity(adjacency) # transform the adjacency into Topological Overlap Matrix (TOM)
dissTOM = 1-TOM # calculate the corresponding dissimilarity (dissTOM)

At this point, the only way to see how your genes cluster together is to generate a hierarchical clustering tree (geneTree) of genes with hclust(), using the TOM-based dissimilarity matrix (dissTOM).

geneTree = hclust(as.dist(dissTOM), method = "average")

The object geneTree can be visually inspected using the function plot()

plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04);

By visually inspecting the geneTree you might get an idea of how your genes cluster, and perhaps adjust some of the arguments of the cutreeDynamic() function, e.g. cutHeight and minClusterSize.

Modules are predicted with the function cutreeDynamic(), not with hclust(). The way you set the cutreeDynamic() arguments affects the final number of modules. For example, small modules are lost if you use a minClusterSize of 100 instead of 30.

dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);

The steps in section 2.b.5 are basically the same. The only thing you are missing is an object that summarizes the module expression values, i.e. the module eigengenes (MEs).

Calculate module eigengenes

MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes

Calculate the dissimilarity of the module eigengenes from their pairwise correlation values (cor(MEs))

MEDiss = 1-cor(MEs);

Generate a module tree dendrogram

METree = hclust(as.dist(MEDiss), method = "average");

Visually inspect the METree with the plot() function and choose the height cut. As reported in the example, a cut of 0.25 corresponds to a correlation of 0.75, because MEDiss = 1 - cor(MEs).

merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = .25, verbose = 3)

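Then finish the tutorial to complete the merging. In the end, hclust() only builds the gene and module tree dendrograms, which you can inspect visually; the modules themselves come from cutreeDynamic() (which takes geneTree as input) and mergeCloseModules().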
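The link between the cut height and the eigengene correlation can be sketched in base R on simulated eigengenes. This is only an illustration of the threshold, not what mergeCloseModules() does internally; the objects `MEs_toy`, `MEDiss_toy`, `METree_toy` and `groups` are hypothetical names, and the three eigengenes are simulated so that two of them are near-duplicates.

```r
set.seed(2)
n <- 50
base <- rnorm(n)

# Assumption: three simulated module eigengenes.
MEs_toy <- data.frame(
  MEblue      = base,
  MEturquoise = base + rnorm(n, sd = 0.2),  # near-duplicate of MEblue
  MEbrown     = rnorm(n)                    # unrelated profile
)

# Same recipe as in the tutorial: dissimilarity = 1 - correlation.
MEDiss_toy <- 1 - cor(MEs_toy)
METree_toy <- hclust(as.dist(MEDiss_toy), method = "average")

# Cutting at height 0.25 groups eigengenes whose correlation is above 0.75.
groups <- cutree(METree_toy, h = 0.25)
groups
```

Eigengenes that end up with the same group number are the ones a 0.25 cut would treat as candidates for merging; the unrelated profile stays in its own group.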

I've seen people usually go on and find associations btw ME and trait, what is the purpose of hclust(as.dist(dissTOM)) if it's not used to associate with trait?

The module-trait association is used to find which modules correlate with a specific trait, which can be continuous (e.g. weight, metabolite levels, hormone levels) or categorical (e.g. treatment, sex, time point). You simply calculate it by correlating the MEs with the trait. See this part of the tutorial

For QC, you could plot the MEs or generate a heatmap of the modules. The expression profiles of the modules should agree with your experimental design.
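The module-trait correlation mentioned above can be sketched in base R. The data here are simulated and the names `weight`, `MEs_demo`, `module_trait_cor` and `module_trait_p` are hypothetical; in a real analysis the MEs would come from moduleEigengenes() and the trait from your sample annotation, with categorical traits coded numerically first.

```r
set.seed(3)
n <- 30

# Assumption: one simulated continuous trait, measured on the same samples.
weight <- rnorm(n, mean = 40, sd = 5)

# Assumption: two simulated eigengenes, one tracking the trait, one not.
MEs_demo <- data.frame(
  MEblue  = as.numeric(scale(weight)) + rnorm(n, sd = 0.5),
  MEbrown = rnorm(n)
)

# Pearson correlation of every eigengene with the trait (modules x 1 matrix).
module_trait_cor <- cor(MEs_demo, weight, use = "pairwise.complete.obs")

# Matching p-values, one correlation test per eigengene.
module_trait_p <- apply(MEs_demo, 2,
                        function(me) cor.test(me, weight)$p.value)

module_trait_cor
module_trait_p
```

Modules with a large absolute correlation and a small p-value are the ones worth following up for that trait.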

ADD REPLY • link 4.6 years ago by andres.firrincieli 3.6k

Thank-you for the thorough explanation, it makes sense now.

ADD REPLY • link 4.6 years ago by bioming ▴ 30

Ohhhh sorry, I can't believe I missed that. So you get the 1st PC from the modules identified from hclust(as.dist(dissTOM))?

ADD REPLY • link 4.6 years ago by bioming ▴ 30