The modules are not clustered together after WGCNA
0
0
Entering edit mode
3 months ago
cwwong13 ▴ 10

I am running WGCNA and trying to visualize the gene network as TOM plot. Rather than using the native function in the package, I am looking for plotting it out by ggplot. The main reason is I would like to also annotate the TOM plot by placing rectangles on each module found by the WGCNA.

However, I found a very hard time reproducing the plot of the native function, one of the main issues is when I trying to look for the order of genes by using the net$dendrograms[[1]]$order and also the module labels by net$color, I find that the color (or module) are not clustered together. Rather they are very scattered. Therefore, I am not sure if I can place a rectangle to bracket the modules. However, when I look at the dendrograms with module labels plotted by the native WGCNA function, the module (at least at the color level) seems clustered together. I did not use the pam options and the tree deep is set to 2, which I believe is quite conservative in defining modules. I am confused and not sure whether I have missed something. cluster WGCNA r dendrograms • 648 views ADD COMMENT 0 Entering edit mode I find that the color (or module) are not clustered together. Rather they are very scattered. the module labels in net$color follows the gene order in the expression matrix:

gene1 blue
gene2 red
gene3 blue
gene4 turquoise
...


I am running WGCNA and trying to visualize the gene network as TOM plot. Rather than using the native function in the package, I am looking for plotting it out by ggplot.

Why?

0
Entering edit mode

I acknowledged that the net$color is following the order in the expression matrix. Thus, I have reordered the net$color by the net$dendrograms[[1]]$order. However, even after the rearrangement, the color labels are still very scattered. I can hardly find a stretch of consecutive genes that are grouped into a cluster that is more than 7/8 genes (I have already dropped cluster 0, but the modules are still interspersed).

Indeed, I am looking to do two modification that seems cannot be achieved easily by the native function:

1. I would like to merge two TOMs from two different datasets (one is put in the upper triangle, another in the lower);
2. As mentioned, I would like to annotate the cluster on the heatmap by adding "rectangles" to highlight each module.
0
Entering edit mode

What is the output of:

head(net$color) and head(net$dendrograms[[1]]$order) I would like to merge two TOMs from two different datasets (one is put in the upper triangle, another in the lower); Something like Fig2 panels B-D-F but using the entire network TOMs from two different datasets? link ADD REPLY 0 Entering edit mode Yes, I would like to have a similar plot as B-D-F but for the whole TOM network. Indeed, I also like their color palette, thus would be great if I can know how to reproduce their figures. head(net$color)

ENSG00000000003 ENSG00000000005 ENSG00000000419 ENSG00000000457 ENSG00000000460
0               0              18               1              12
ENSG00000000938
5

head(net$dendrograms[[1]]$order)
[1] 20343  6961 17173 16234 16803 18315


I am not sure if head() would really give you enought information. I use TOM[net$dendrograms[[1]]$order, net$dendrograms[[1]]$order] to rearrange the matrix. And below is an excerpt of the module annotation (after merging them), the geneorder is from dendrograms:

as_tibble(net$color, rownames = "Ensembl") %>% dplyr::rename(module = value) %>% mutate(geneorder = !!(net$dendrograms[[1]]$order)) %>% arrange(geneorder) %>% slice_head(n = 17173) %>% slice_tail(n = 371) %>% head(20) # A tibble: 20 × 3 Ensembl module geneorder <chr> <dbl> <int> 1 ENSG00000000460 12 16803 2 ENSG00000266777 2 16804 3 ENSG00000122778 3 16805 4 ENSG00000203814 0 16806 5 ENSG00000278921 0 16807 6 ENSG00000141540 0 16808 7 ENSG00000114648 0 16809 8 ENSG00000078808 0 16810 9 ENSG00000067191 0 16811 10 ENSG00000092853 7 16812 11 ENSG00000151176 0 16813 12 ENSG00000147454 0 16814 13 ENSG00000115827 14 16815 14 ENSG00000275234 0 16816 15 ENSG00000070718 0 16817 16 ENSG00000152137 0 16818 17 ENSG00000117519 0 16819 18 ENSG00000160613 0 16820 19 ENSG00000180488 12 16821 20 ENSG00000169085 4 16822  I can hardly find a region that is not interspersed by module 0. similar results were found for the tail: as_tibble(net$color, rownames = "Ensembl") %>%
dplyr::rename(module = value) %>%
mutate(geneorder = !!(net$dendrograms[[1]]$order)) %>%
arrange(geneorder) %>%
slice_tail(n = 371) %>%
tail(20)
# A tibble: 20 × 3
Ensembl         module geneorder
<chr>            <dbl>     <int>
1 ENSG00000044574     16     17154
2 ENSG00000172301      0     17155
3 ENSG00000072657      3     17156
4 ENSG00000168758      0     17157
5 ENSG00000107897      1     17158
6 ENSG00000206147     10     17159
7 ENSG00000149346     21     17160
8 ENSG00000148700      0     17161
9 ENSG00000091656      4     17162
10 ENSG00000087263      0     17163
11 ENSG00000158402      7     17164
12 ENSG00000114520      1     17165
13 ENSG00000196604      0     17166
14 ENSG00000089327      0     17167
15 ENSG00000132275      0     17168
16 ENSG00000260001      0     17169
17 ENSG00000255561      0     17170
18 ENSG00000233673      0     17171
19 ENSG00000183018      0     17172
20 ENSG00000000419     18     17173

1
Entering edit mode

Yes, I would like to have a similar plot as B-D-F but for the whole TOM network.

I have a solution for that but with TOMplot.

Note that datExp1 and datExpr2 must have the same gene order

   dissTOM1 = 1-TOMsimilarityFromExpr(datExpr1, power = 6)  #Calculate topological overlap matrix for datExpr1
dissTOM2 = 1-TOMsimilarityFromExpr(datExpr2, power = 6)  #Calculate topological overlap matrix for datExpr2
final_TOM=dissTOM1*lower.tri(dissTOM1)+dissTOM2*upper.tri(dissTOM2)  # create a new TOM: lower triangle is the TOM from the datExpr1 while the upper triangle is the TOM from the datExpr2
diag(final_TOM)=NA
library(gplots)
myheatcol = colorpanel(250,'red',"yellow",'black')
TOMplot(final_TOM, geneTree1, moduleColors1, main = "Network heatmap plot, all genes", col=myheatcol) # in this heatmap we impose for upper triangle the gene clustering of the network 1 (geneTree1 and moduleColors1)


Since you used the blockwiseModules function you will need to calculate again the gene tree dendrogram (geneTree) and the modules (moduleColors) for the dataset datExp1'.

0
Entering edit mode

Thanks for your suggestion, but I am afraid that does not fully solve my questions: I would like to add a rectangle on top of the heatmap to highlight the module. Another minor problem would be (although I did not mention it in my original question) I was planning to add the dendrogram for each dataset by patchwork or similar package, hence, I probably need photoshop the output after using TOMplot.

More importantly, I think the core issue is unsolved: why the module labels are not clustered as the dendrogram.

0
Entering edit mode

More importantly, I think the core issue is unsolved: why the module labels are not clustered as the dendrogram.

This issue is automatically solved once you plot the heatmap of each TOM along with their respective gene dendrogram. Any good tool designed for heatmap will allow you to reaorder rows and columns using an external dendrogram.

Another minor problem would be (although I did not mention it in my original question) I was planning to add the dendrogram for each dataset by patchwork or similar package, hence, I probably need photoshop the output after using TOMplot.

So you basically want two independent TOMplot (upper and lower triangle) in a single panel. With some tweak maybe you can use the chunk of code at the end of this tutorial . I am also interested in a plot like that, so I will keep you updated.

A final note, If you can avoid the blockwiseModules approach

0
Entering edit mode

Sorry for the late reply. I will definately try! Thanks.

A final note, If you can avoid the blockwiseModules approach What do you mean by this? Are you referring to using other clustering methods?

0
Entering edit mode

Is explained here why you should avoid the blockwiseModules approach.

0
Entering edit mode

I went back to the matrix and tried to plot the TOM using the native function. I think the gene (and the colors representing the modules) clustered properly. Noted that I used the same geneTree and moduleColors to get the values inputs for TOMplot. That's why I am so confused if I am wrong if any steps, so that it is wrong when I plot them out with ggplot.

I have increased the block size for blockwiseModules to 250000 which includes all the input genes/ features (220000). And I have also checked the output net`, all the genes were analyzed in a single block.