Regarding finding hub genes using WGCNA
8 weeks ago
seta ★ 1.5k

Dear all,

I have a gene expression microarray dataset (about 17000 genes) of about 400 cancer samples from different cancer subtypes. I treated the subtypes as binary traits and used WGCNA to find modules associated with the traits and to identify hub genes. As input for WGCNA I used the 50% of genes with the highest variance and selected the signed network type. Could you please help me with the following issues?

1. The green module was one of the modules associated with one of the cancer subtypes (traits), so I tried to find hub genes in this module using the criteria GS > 0.2 & MM.green > 0.8. This returned 6 genes; however, when I checked them further, I found that two of them belonged to another module, not the green module. The same thing happened when I looked for hub genes in another associated module. Could you please tell me why this happened? What went wrong?
2. For modules with a negative association with the binary trait, how should we interpret them, especially in terms of the expression of the genes in those modules? Is the choice between a signed and an unsigned network important for the interpretation?

gene-expression hub-genes WGCNA • 603 views
8 weeks ago

Hi, I have just published a paper using WGCNA in which I improved WGCNA a little. You can see it here. I believe that once you have read the paper, you will be able to answer both questions yourself. The R code for the improved version of WGCNA is on our GitHub (https://github.com/huynguyen250896/drivergene). If you still have any concerns, do not hesitate to post your question here.


Thank you for your response. I took a look at your paper and I'll try it. However, I still haven't found my answers. It's my first time using WGCNA; could you please tell me specifically what the answers to my questions are?

8 weeks ago

The green module was one of the modules associated with one of the cancer subtypes (traits), so I tried to find hub genes in this module using the criteria GS > 0.2 & MM.green > 0.8. This returned 6 genes; however, when I checked them further, I found that two of them belonged to another module, not the green module. The same thing happened when I looked for hub genes in another associated module. Could you please tell me why this happened? What went wrong?

This should not be possible, so I guess something went wrong during the selection. The output of the signedKME function should look like the data.frame below.

For example, gene00011 belongs to the blue module (MM = 0.94) but still has an MM of 0.82 for the cyan module.

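# Module membership (MM) of every gene with respect to every module eigengene
datKME_WT <- signedKME(multiExpr$WT$data, mergedMEs_WT, outputColumnName = "MM.")
# Label the module colour vector with the gene names
names(mergedColors_WT) <- names(datExpr)
matrix_mergedColors_WT <- as.matrix(mergedColors_WT)
# Join the module assignment (column V1) with the MM values, matching on gene name
datKME_moduleColor_WT <- merge(matrix_mergedColors_WT, datKME_WT, by = "row.names")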

Row.names           V1     MM.cyan MM.lightcyan MM.lightgreen   MM.salmon  MM.blue     MM.greenyellow  MM.black
1  gene00001   lightgreen  0.18921980  0.034292334    0.63531697  0.13602407  0.305031489    0.099648423 -0.07497855
2  gene00002         blue  0.64468087  0.334272733   -0.03860808  0.63641185  0.753479431    0.574225848 -0.18239362
3  gene00003         blue  0.07705021 -0.322657990    0.03485713  0.31640875  0.391720482    0.517038670  0.27108623
4  gene00004         blue  0.58244822  0.032159210    0.36066706  0.68577835  0.854280546    0.671092702 -0.12259797
5  gene00005  greenyellow  0.15336767 -0.332123183    0.11236861  0.70933620  0.687739541    0.881175376  0.39838955
6  gene00006    turquoise  0.09267411  0.493756319    0.06919422 -0.32990762 -0.396841629   -0.731454202 -0.54741366
7  gene00007    turquoise  0.16573510  0.500944581    0.06799996 -0.53739752 -0.412083015   -0.811240414 -0.64712763
8  gene00008    turquoise  0.29941876  0.489617894    0.16386992 -0.13675715 -0.082734470   -0.476539603 -0.57997252
9  gene00009  greenyellow  0.18869704 -0.163737021    0.04222775  0.76679584  0.674572313    0.852283904  0.38104120
10 gene00010         blue  0.53615218  0.058295002    0.55403638  0.72056126  0.866062467    0.663952993 -0.07039581
11 gene00011         blue  0.82303113  0.288671439    0.48615401  0.60648940  0.938877068    0.544199035 -0.38204120
12 gene00012         blue  0.77350270  0.185758239    0.40904636  0.64260957  0.978811408    0.672022879 -0.27076743
13 gene00013         blue  0.79738706  0.260010275    0.37892031  0.65266143  0.967657338    0.645249951 -0.30468609
14 gene00014         blue  0.75616479  0.179855221    0.39774042  0.66900697  0.981235158    0.699922205 -0.24122598
15 gene00015         blue  0.81332463  0.260093653    0.37202946  0.63195789  0.959731945    0.625734130 -0.33064164
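To make the MM columns above less of a black box: module membership is the correlation between a gene's expression profile and a module eigengene, which is why a gene has one MM value per module and not only for the module it was assigned to. Below is a minimal base-R sketch of that idea on simulated data; the names `MEs`, `datExpr_toy` and `geneA` to `geneC` are made up for illustration, and this is not the WGCNA implementation itself.

```r
set.seed(42)
n_samples <- 20

# Two simulated module eigengenes (hypothetical stand-ins for real MEs)
me_blue <- rnorm(n_samples)
me_cyan <- rnorm(n_samples)
MEs <- data.frame(MEblue = me_blue, MEcyan = me_cyan)

# Three simulated genes: A follows blue, B follows cyan, C follows both
datExpr_toy <- data.frame(
  geneA = me_blue + rnorm(n_samples, sd = 0.3),
  geneB = me_cyan + rnorm(n_samples, sd = 0.3),
  geneC = me_blue + me_cyan + rnorm(n_samples, sd = 0.3)
)

# Module membership = correlation of each gene with each eigengene
MM <- as.data.frame(cor(datExpr_toy, MEs, use = "pairwise.complete.obs"))
names(MM) <- sub("^ME", "MM.", names(MM))
print(round(MM, 2))
```

geneC is built to track both eigengenes, so it ends up with a sizeable MM for both modules, which is the same situation as gene00011 in the table.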


Let's say that cyan is the module of interest and you are looking for genes in this module with an MM > 0.8. If you subset the MM data.frame based only on the values in MM.cyan:

Cyan_08 <- subset(datKME_moduleColor_WT, datKME_moduleColor_WT[, "MM.cyan"] > 0.8)
Row.names   V1   MM.cyan MM.lightcyan MM.lightgreen MM.salmon   MM.blue MM.greenyellow   MM.black MM.lightyellow
11 gene00011 blue 0.8230311    0.2886714     0.4861540 0.6064894 0.9388771    0.544199035 -0.3820412  -0.4310391298
15 gene00015 blue 0.8133246    0.2600937     0.3720295 0.6319579 0.9597319    0.625734130 -0.3306416  -0.3246696645
26 gene00028 blue 0.8065564    0.2741694     0.5267171 0.4999075 0.8731536    0.443267201 -0.4634928  -0.3310028561
28 gene00030 cyan 0.8477074    0.3489527     0.3331260 0.1508366 0.6455064    0.127876728 -0.6717292   0.0902075371
60 gene00065 cyan 0.9444121    0.5703349     0.4319337 0.1575432 0.6425243   -0.004577362 -0.8415948   0.0008084272
61 gene00066 cyan 0.9018271    0.4143577     0.6287749 0.1240565 0.6861586    0.042573472 -0.7723868  -0.0902572846


Genes of the blue module will also be included in Cyan_08, because the filter looks only at the MM.cyan column and ignores the module assignment in V1.
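A small sketch of the difference, using a handful of rows copied from the tables above (only the V1 and MM.cyan columns are kept; the data.frame name `mm_toy` is made up). Filtering on MM.cyan alone returns blue genes too, while adding a condition on V1 keeps only genes actually assigned to cyan.

```r
# Rows taken from the tables above; only the columns needed for the filter
mm_toy <- data.frame(
  Row.names = c("gene00001", "gene00012", "gene00011", "gene00015",
                "gene00028", "gene00030", "gene00065", "gene00066"),
  V1        = c("lightgreen", "blue", "blue", "blue",
                "blue", "cyan", "cyan", "cyan"),
  MM.cyan   = c(0.18921980, 0.77350270, 0.8230311, 0.8133246,
                0.8065564, 0.8477074, 0.9444121, 0.9018271),
  stringsAsFactors = FALSE
)

# Filter on MM only: picks up blue genes with a high MM.cyan
only_mm <- subset(mm_toy, MM.cyan > 0.8)

# Filter on module assignment AND MM: cyan genes only
cyan_hubs <- subset(mm_toy, V1 == "cyan" & MM.cyan > 0.8)

print(only_mm$Row.names)
print(cyan_hubs$Row.names)
```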

For modules with a negative association with the binary trait, how should we interpret them, especially in terms of the expression of the genes in those modules? Is the choice between a signed and an unsigned network important for the interpretation?

If you have a signed network, a negative association with a binary trait (subtypeA) means that the expression of the genes contributing to the first principal component (the module eigengene) of that module is lower in the subtypeA samples. I always build 'signed' networks because they are much easier to interpret.
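As a rough illustration of this interpretation, the sketch below simulates a toy module whose genes are lower in subtypeA, takes the first principal component as the eigengene, and correlates it with the binary trait. Everything here is simulated and the variable names are made up; the step that orients the component along the average expression reflects my understanding of the default behaviour of moduleEigengenes and should be treated as an assumption.

```r
set.seed(1)
n_samples <- 40
trait <- rep(c(1, 0), each = n_samples / 2)  # 1 = subtypeA, 0 = other samples

# Ten toy genes sharing a pattern that is lower in subtypeA
shared <- -2 * trait + rnorm(n_samples, sd = 0.5)
expr <- sapply(1:10, function(i) shared + rnorm(n_samples, sd = 0.5))
colnames(expr) <- paste0("gene", 1:10)  # rows = samples, columns = genes

# Module eigengene: first principal component of the scaled expression
pc1 <- prcomp(expr, scale. = TRUE)$x[, 1]
# The sign of a PC is arbitrary, so orient it along the average expression
if (cor(pc1, rowMeans(expr)) < 0) pc1 <- -pc1

me_trait_cor <- cor(pc1, trait)
mean_by_group <- tapply(rowMeans(expr), trait, mean)

print(me_trait_cor)   # negative module-trait correlation
print(mean_by_group)  # average module expression per group
```

With the eigengene oriented this way, a negative module-trait correlation goes together with lower average expression of the module genes in the subtypeA group.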


Thank you very much for your nice explanation.

You’re right that genes with MM > 0.8 for one module can belong to a different module. I used your command to further investigate those genes, here FOXA1 and MLPH. The output of your command for these 2 genes is

.

Here, the green module was significantly associated with my trait. I used FilterGenes = abs(GS1) > .2 & abs(datKME$MM.green) > .8 to get the hub genes of this module. It gave me 6 genes, two of which belonged to the turquoise module. In the module-trait relationship heatmap, the green module had a significant positive correlation (0.86) and the turquoise module a significant negative correlation (-0.68) with the same trait. If I used the FilterGenes code without abs, I didn't get these two genes. However, other genes may still have MM > 0.8, as you explained. Could you please tell me what I should do? Eventually, we would like to use the hub genes as signals (biomarkers) for downstream steps, so which module they belong to does not matter, am I right? Please kindly correct me if I'm wrong.

However, any other genes may have MM > 0.8 as you explained. Could you please tell me what shall I do?

To keep things simple, I would focus only on the genes belonging to the modules of interest by subsetting the data frame using the column V1. Then you can safely apply FilterGenes = abs(GS1) > .2 & abs(datKME$MM.brown) > .8 without worrying too much about this kind of problem.
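The role of abs() can be seen on a toy example. In a signed network, genes from a module that runs opposite to green (such as turquoise here) can have a strongly negative MM.green, and abs() lets them pass the filter. The values below are invented purely for illustration; only the names GS1 and MM.green follow the thread, and `moduleColor` is a hypothetical column holding the module assignment.

```r
# Hypothetical toy values, not taken from the real dataset
genes <- data.frame(
  gene        = c("g1", "g2", "g3", "g4"),
  moduleColor = c("green", "green", "turquoise", "turquoise"),
  GS1         = c(0.75, 0.60, -0.70, -0.10),
  MM.green    = c(0.91, 0.85, -0.88, -0.30),
  stringsAsFactors = FALSE
)

# With abs(): the anti-correlated turquoise gene g3 passes the filter
with_abs <- genes$gene[abs(genes$GS1) > 0.2 & abs(genes$MM.green) > 0.8]

# Without abs(): only positively associated genes pass
without_abs <- genes$gene[genes$GS1 > 0.2 & genes$MM.green > 0.8]

# Restricting to the module first makes abs() safe to use
in_module <- genes$gene[genes$moduleColor == "green" &
                        abs(genes$GS1) > 0.2 & abs(genes$MM.green) > 0.8]

print(with_abs)
print(without_abs)
print(in_module)
```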

Eventually, we would like to use hub genes as signals (biomarkers) for downstream steps. So, which modules they are belonged to is not matter, am I right? please kindly correct me if I'm wrong.

If the module does not matter then the whole clustering analysis is pointless. If you pick WGCNA you must embrace the modules.

If you are looking for the hub genes with the highest global connectivity, then you can use the function intramodularConnectivity, which calculates for each gene the whole-network connectivity (kTotal) as well as the connectivity within its own module (kWithin).
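For intuition about the two quantities, here is a base-R sketch on a tiny invented adjacency matrix: kTotal is the sum of a gene's connection strengths to all other genes, and kWithin is the same sum restricted to genes in its own module. This is a simplified re-implementation for illustration under that definition, not the WGCNA function itself; the gene names and adjacency values are made up.

```r
gene_ids <- c("g1", "g2", "g3", "g4")
colors   <- c("green", "green", "turquoise", "turquoise")

# Invented symmetric adjacency matrix with a zero diagonal
adj <- matrix(0, 4, 4, dimnames = list(gene_ids, gene_ids))
adj["g1", "g2"] <- 0.8
adj["g1", "g3"] <- 0.1
adj["g1", "g4"] <- 0.2
adj["g2", "g3"] <- 0.3
adj["g2", "g4"] <- 0.1
adj["g3", "g4"] <- 0.9
adj <- adj + t(adj)

# Whole-network connectivity: sum over all other genes
kTotal <- rowSums(adj)

# Intramodular connectivity: sum over genes of the same module only
kWithin <- sapply(seq_along(colors), function(i) {
  sum(adj[i, colors == colors[i]])
})

print(data.frame(gene = gene_ids, module = colors,
                 kTotal = kTotal, kWithin = kWithin))
```

In this toy case g3 has the highest kTotal, while within the green module g1 and g2 are tied on kWithin, which shows that the two rankings answer different questions.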


Many thanks, andres,

Yes, I tried the intramodularConnectivity function to select genes with high kWithin as hub genes. Is there a threshold for what counts as "high" kWithin?

When selecting the hub genes with the highest global connectivity, it does not matter which gene belongs to which module, right? I am a bit confused about when the "module" is really important.


Hi seta,

To be honest, I have never seen studies that select interesting genes based only on the global connectivity (kTotal) calculated with intramodularConnectivity. WGCNA studies tend to focus on the hub genes of the interesting modules (e.g. modules significantly correlated with the traits of interest). Since the turquoise module (which I think is the largest module in your network) is significantly correlated with one of your traits, there is a good chance that its hub genes will include most of the genes with the highest kTotal.