Question

Tutorial:Bulk RNA-seq: WGCNA (Weighted gene co-expression network analysis) analysis

2

Entering edit mode

3 months ago

Julia Ma ▴ 120

This content has been copy pasted from: https://omicverse.readthedocs.io/en/latest/Tutorials-bulk/t_wgcna/

Weighted gene co-expression network analysis (WGCNA) is a systems biology approach to characterize gene association patterns between different samples and can be used to identify highly synergistic gene sets and identify candidate biomarker genes or therapeutic targets based on the endogeneity of the gene sets and the association between the gene sets and the phenotype.

Paper: WGCNA: an R package for weighted correlation network analysis

Code: Reproduce by Python. Raw is http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA

Colab_Reproducibility: https://colab.research.google.com/drive/1EbP-Tq1IwYO9y1_-zzw23XlPbzrxP0og?usp=sharing

Here, you will be briefly guided through the basics of how to use omicverse to perform wgcna anlysis. Once you are set

import omicverse as ov
ov.utils.ov_plot_set()

/Users/fernandozeng/miniforge3/envs/scbasset/lib/python3.8/site-packages/phate/__init__.py

Load the data

The analysis is based on the in-built WGCNA tutorial data.

import pandas as pd
data=ov.utils.read_csv(filepath_or_buffer='https://raw.githubusercontent.com/Starlitnightly/ov/master/sample/LiverFemale3600.csv',
                           index_col=0)
data.head()

	F2_2	F2_3	F2_14	F2_15	F2_19	F2_20	F2_23	F2_24	F2_26	F2_37	...	F2_324	F2_325	F2_326	F2_327	F2_328	F2_329	F2_330	F2_332	F2_355	F2_357
substanceBXH
---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---	---
MMT00000044	-0.01810	0.0642	0.000064	-0.0580	0.04830	-0.151974	-0.00129	-0.23600	-0.0307	-0.02610	...	0.047700	-0.0488	0.0168	-0.0309	0.02740	-0.031	0.0660	-0.0199	-0.0146	0.065000
MMT00000046	-0.07730	-0.0297	0.112000	-0.0589	0.04430	-0.093800	0.09340	0.02690	-0.1330	0.07570	...	-0.049200	-0.0350	-0.0738	-0.1730	-0.07380	-0.201	-0.0820	-0.0939	0.0192	-0.049900
MMT00000051	-0.02260	0.0617	-0.129000	0.0871	-0.11500	-0.065026	0.00249	-0.10200	0.1420	-0.10200	...	0.000612	0.1210	0.0996	0.1090	0.02730	0.120	-0.0629	-0.0395	0.1090	0.000253
MMT00000076	-0.00924	-0.1450	0.028700	-0.0439	0.00425	-0.236100	-0.06900	0.01440	0.0363	-0.01820	...	-0.270000	0.0803	0.0424	0.1610	0.05120	0.241	0.3890	0.0251	-0.0348	0.114000
MMT00000080	-0.04870	0.0582	-0.048300	-0.0371	0.02510	0.085043	0.04450	0.00167	-0.0680	0.00567	...	0.113000	-0.0859	-0.1340	0.0639	0.00731	0.124	-0.0212	0.0870	0.0512	0.024300

5 rows × 135 columns

Correlation matrix calculate

We can calculate the direct correlation matrix by each gene

gene_wgcna=ov.bulk.pyWGCNA(data,save_path='result')
gene_wgcna.calculate_correlation_direct(method='pearson',save=False)

...correlation coefficient matrix is being calculated

In pyWGCNA module, we need to trans the direct correlation matrix to indirect correlation matrix to calculate the soft threshold, Soft thresholds can help us convert the original correlation network into a scale-free network

gene_wgcna.calculate_correlation_indirect(save=False)

...indirect correlation matrix is being calculated


gene_wgcna.calculate_soft_threshold(save=False)

...soft_threshold is being calculated
...appropriate soft_thresholds: 5

	beta	r2	meank
1	1	0.001434	866.510550
2	2	0.223522	373.883499
3	3	0.434748	217.589775
4	4	0.655388	153.295645
5	5	0.889141	123.977608
6	6	0.916398	111.535289
7	7	0.897792	109.220186
8	8	0.852653	114.496535
9	9	0.859802	126.781878
10	10	0.838656	146.668526
11	11	0.838382	175.697242

enter image description here

The left vertical coordinate is the evaluation metric r2 for a scale-free network. the closer r2 is to 1, the closer the network is to a scale-free network, usually requiring >0.8 or 0.9. the right vertical coordinate is the average connectivity, which decreases as beta increases. Combining the two graphs, the beta value is usually chosen when r^2 first reaches 0.8 or 0.9 or more. With the beta value one can convert the correlation matrix into an adjacency matrix according to Eq.

Then we can construct the Topological Overlap Matrix

gene_wgcna.calculate_corr_matrix()

Building a network of co-expressions

We use the dynamicTree to build the co-expressions module

gene_wgcna.calculate_distance()
gene_wgcna.calculate_geneTree()
gene_wgcna.calculate_dynamicMods()
module=gene_wgcna.calculate_gene_module()

...distance have being calculated
...geneTree have being calculated
...dynamicMods have being calculated
..cutHeight not given, setting it to 604.9174547493581  ===>  99% of the (truncated) height range in dendro.
..done.
...total: 15

enter image description here

Here, we successfully calculated each gene's module. There are 15 module, and their color were present in the figures.

module.head()

	ivl	module	name	color
0	2124	8	MMT00049213	#E0DFED
1	1879	8	MMT00043907	#E0DFED
2	2720	8	MMT00063091	#E0DFED
3	2702	8	MMT00062691	#E0DFED
4	879	8	MMT00020288	#E0DFED

gene_wgcna.plot_matrix()

<seaborn.matrix.ClusterGrid at 0x2a6660850>

enter image description here

Sub co-expression module

Sometimes we are interested in a gene, or a module of a pathway, and we need to extract the sub-modules of the gene for analysis and mapping. For example, we have selected two modules, 6 and 12, as sub-modules for analysis

gene_wgcna.get_sub_module([6,12]).shape

(151, 4)

We found a total of 151 genes for 6 and 12. Next, we used the scale-free network constructed earlier, with the threshold set to 0.95, to construct a gene correlation network graph for modules 6 and 12

gene_wgcna.get_sub_network([6,12],correlation_threshold=0.95)

<networkx.classes.graph.Graph at 0x2c1c7bd90>

pyWGCNA provides a simple visualisation function plot_sub_network to visualise the gene-free network of our interest.

gene_wgcna.plot_sub_network([6,12],pos_type='kamada_kawai',pos_scale=3,pos_dim=2,
                         figsize=(8,8),node_size=20,label_fontsize=8,
                        label_bbox={"ec": "white", "fc": "white", "alpha": 0.6})

(<Figure size 640x640 with 1 Axes>, <AxesSubplot: >)

enter image description here

Module and trait associations

In addition to being able to select modules based on target genes, we can also select modules based on specific sample traits. We can calculate the correlation between traits and modules for each sample, and thus find modules with the traits we are interested in.

We read the trait matrix firstly from github. The trait matrix shape must be index is sample and columns is trait. The sample name must be consistent with the sample name of our original data earlier.

meta=ov.utils.read_csv(filepath_or_buffer='https://raw.githubusercontent.com/Starlitnightly/ov/master/sample/character.csv',
                          index_col=0)
meta.head()

	weight_g	length_cm	ab_fat	other_fat
Mice
---	---	---	---	---
F2_2	38.0	10.5	3.81	2.78
F2_3	33.5	10.8	1.70	2.05
F2_14	33.9	10.0	1.29	1.67
F2_15	44.3	10.3	3.62	3.34
F2_19	32.9	9.7	2.08	1.85

Then we use analysis_meta_correlation to calculate the correlation between module and the trait

cor_matrix=gene_wgcna.analysis_meta_correlation(meta)

...PCA analysis have being done
...co-analysis have being done

ax=gene_wgcna.plot_meta_correlation(cor_matrix)

enter image description here

WGCNA RNA-seq • 405 views

ADD COMMENT • link updated 3 months ago by Ram 43k • written 3 months ago by Julia Ma ▴ 120