How does one infer a gene-gene network from a partial correlation matrix?
7.1 years ago
moxu ▴ 510

First, I don't understand what is meant by a partial correlation matrix. I do know what a correlation matrix is, though.

Second, assuming a partial correlation matrix is similar to a correlation matrix, how can you infer the connections (edges) between the dots (genes), and how do you resolve the directionality?

Third, I am using the R package "parcor". I have read the R manual and some reference papers, but I still don't know how to infer a network with the package. Can somebody give a simple tutorial?

Thanks much in advance.

R RNA-Seq gene • 3.0k views
7.1 years ago

The partial correlation between X and Y is the correlation between the residuals of X and Y after linear regression has removed the effect of confounding factors. For example, assume you have two independent variables W and X and a dependent variable Y. To compute the partial correlation between X and Y, you regress X on W to get the residual rX and regress Y on W to get the residual rY. The partial correlation is then the correlation between rX and rY. It lets you answer questions like how much X contributes to Y when the effect of W is removed.

As the partial correlation can be viewed as a similarity measure, a partial correlation matrix can be seen as the adjacency matrix of a graph in which the partial correlations are the edge weights. Like any correlation matrix, it is symmetric. A symmetric adjacency matrix implies an undirected graph, so you need additional information to build a directed graph out of it.

As for the practical aspect, once you have your matrix, you can use the graph_from_adjacency_matrix() function from the igraph package to get the corresponding graph structure.
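To make the residual definition concrete, here is a minimal sketch in base R. The data are simulated and the coefficients are arbitrary assumptions chosen only for illustration. It computes the partial correlation of X and Y given W from the two regressions, then checks it against the equivalent formula based on the inverse of the correlation matrix.

```r
# Simulated data: W influences both X and Y (coefficients are made up).
set.seed(1)
n <- 200
W <- rnorm(n)
X <- 0.8 * W + rnorm(n)
Y <- 0.5 * W + 0.3 * X + rnorm(n)

# Residuals of X and Y after regressing each on W.
rX <- resid(lm(X ~ W))
rY <- resid(lm(Y ~ W))

# Partial correlation of X and Y given W.
pcor_resid <- cor(rX, rY)

# Same quantity from the inverse correlation (precision) matrix P:
# pcor_ij = -P_ij / sqrt(P_ii * P_jj)
P <- solve(cor(cbind(X, Y, W)))
pcor_inv <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])

print(c(residuals = pcor_resid, inverse = pcor_inv))
```

The two values agree up to floating-point error, which is why a whole partial correlation matrix can be obtained in one go from the inverse correlation matrix when that inverse exists.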


I am working on the network right now. I have a gene expression file like the following:

gene_id sample1 ... sample9
gene1 123 ... 987
.
.
.
geneN 234 ... 432

Then I use the following R code:

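library(parcor)
library(igraph)
# Genes are rows and samples are columns; gene_id becomes the row names.
d <- read.table("mydata.txt", header = TRUE, sep = "\t", row.names = "gene_id")
# Transpose so that samples are rows and genes are columns.
rn <- ridge.net(t(as.matrix(d)), k = 5)
rng <- graph_from_adjacency_matrix(rn$pcor)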

The problem is that every gene points to itself, i.e. the edges are like the following:

IGRAPH D--- 899 899 -- 
+ edges:
 [1]  1-> 1  2-> 2  3-> 3  4-> 4  5-> 5  6-> 6  7-> 7  8-> 8  9-> 9 10->10
[11] 11->11 12->12 13->13 14->14 15->15 16->16 17->17 18->18 19->19 20->20
[21] 21->21 22->22 23->23 24->24 25->25 26->26 27->27 28->28 29->29 30->30
[31] 31->31 32->32 33->33 34->34 35->35 36->36 37->37 38->38 39->39 40->40
[41] 41->41 42->42 43->43 44->44 45->45 46->46 47->47 48->48 49->49 50->50
[51] 51->51 52->52 53->53 54->54 55->55 56->56 57->57 58->58 59->59 60->60
[61] 61->61 62->62 63->63 64->64 65->65 66->66 67->67 68->68 69->69 70->70
[71] 71->71 72->72 73->73 74->74 75->75 76->76 77->77 78->78 79->79 80->80
[81] 81->81 82->82 83->83 84->84 85->85 86->86 87->87 88->88 89->89 90->90
+ ... omitted several edges

Any ideas?

Thanks a lot!


You forgot to declare the graph as weighted and undirected:

rng <- graph_from_adjacency_matrix(rn$pcor, weighted = TRUE, mode = "undirected")
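As a small check of what these arguments do, here is a sketch on a made-up 4 x 4 matrix (the values and gene names are hypothetical). As far as I understand igraph's behaviour, without weighted = TRUE the entries are read as edge counts, which would explain why only the diagonal 1s produced edges above. Adding diag = FALSE also drops the self-loops that the diagonal would otherwise create.

```r
library(igraph)

# Toy symmetric "partial correlation" matrix (values are made up).
genes <- paste0("g", 1:4)
pc <- matrix(c( 1.0,  0.6,  0.0, -0.4,
                0.6,  1.0,  0.2,  0.0,
                0.0,  0.2,  1.0,  0.0,
               -0.4,  0.0,  0.0,  1.0),
             nrow = 4, byrow = TRUE,
             dimnames = list(genes, genes))

# weighted = TRUE keeps the values as an edge attribute called "weight";
# mode = "undirected" uses each symmetric pair once;
# diag = FALSE ignores the diagonal, so no gene is linked to itself.
g <- graph_from_adjacency_matrix(pc, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

print(g)
print(E(g)$weight)
```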

Did as you suggested, and got the following:

> rng <- graph_from_adjacency_matrix(rn$pcor, weighted = TRUE, mode = "undirected")
> rng
IGRAPH U-W- 899 366955 -- 
+ attr: weight (e/n)
+ edges:
 [1] 1-- 1 1-- 2 1-- 3 1-- 4 1-- 6 1-- 7 1-- 8 1-- 9 1--10 1--11 1--12 1--13
[13] 1--14 1--15 1--16 1--17 1--18 1--19 1--20 1--21 1--22 1--23 1--24 1--25
[25] 1--26 1--27 1--28 1--29 1--30 1--31 1--32 1--33 1--34 1--35 1--36 1--37
[37] 1--38 1--39 1--40 1--41 1--42 1--43 1--44 1--45 1--46 1--47 1--48 1--49
[49] 1--50 1--51 1--52 1--53 1--54 1--55 1--56 1--57 1--58 1--59 1--60 1--61
[61] 1--62 1--63 1--64 1--65 1--66 1--69 1--70 1--71 1--72 1--73 1--74 1--75
[73] 1--76 1--77 1--78 1--79 1--80 1--81 1--83 1--85 1--86 1--87 1--88 1--89
[85] 1--90 1--91 1--92 1--93 1--95 1--96 1--97 1--98 1--99
+ ... omitted several edges

There are too many connections -- looks like every gene is connected to every gene. Is there a way to threshold (or validate) a connection?


An edge is created for every non-zero value in the matrix, so unless the matrix is sparse, you get a graph in which each node is connected to most of the others. How to deal with this depends on what you want to do with the graph.
You could threshold your similarity values, but finding the right threshold may not be easy. The context may suggest a value, you could filter edges based on their p-values, or you could use the elbow criterion: plot the values in decreasing order and find the value at which the curve levels off.
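A rough sketch of the thresholding and elbow ideas in base R follows. The matrix is random and the cutoff of 0.3 is an arbitrary assumption for illustration, not a recommended value; with real data you would use rn$pcor and pick the cutoff from the sorted curve or from the context.

```r
# Hypothetical symmetric matrix standing in for a partial correlation matrix.
set.seed(42)
p <- 6
A <- matrix(runif(p * p, min = -1, max = 1), nrow = p)
pc <- (A + t(A)) / 2
diag(pc) <- 1

# Elbow criterion: absolute off-diagonal values in decreasing order.
vals <- sort(abs(pc[upper.tri(pc)]), decreasing = TRUE)
print(round(vals, 3))
# In an interactive session, look for where the curve levels off:
# plot(vals, type = "b", xlab = "rank", ylab = "|partial correlation|")

# Hard threshold (0.3 is an arbitrary illustrative cutoff).
threshold <- 0.3
pc_thr <- pc
pc_thr[abs(pc_thr) < threshold] <- 0
diag(pc_thr) <- 0   # no self-connections

# Number of edges that survive the cutoff.
print(sum(pc_thr[upper.tri(pc_thr)] != 0))
```

Thresholding on the absolute value keeps strong negative partial correlations as well as strong positive ones.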


How do I trim the network, please? I don't see anything related to trimming in "ridge.net", nor in "igraph".

Thanks.


By trimming, do you mean removing edges? You can do it by setting the corresponding weights to 0 in the adjacency/similarity matrix before building the graph, or by using the delete_edges() function in igraph on an existing graph.
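For the delete_edges() route, a minimal sketch on a made-up graph could look like this. The matrix values, gene names and the 0.3 cutoff are assumptions for illustration only.

```r
library(igraph)

# Toy weighted, undirected graph built from a made-up symmetric matrix.
genes <- paste0("g", 1:4)
pc <- matrix(c( 1.0,  0.6,  0.0, -0.4,
                0.6,  1.0,  0.2,  0.0,
                0.0,  0.2,  1.0,  0.0,
               -0.4,  0.0,  0.0,  1.0),
             nrow = 4, byrow = TRUE,
             dimnames = list(genes, genes))
g <- graph_from_adjacency_matrix(pc, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Ids of the edges whose absolute weight is below the (arbitrary) cutoff.
threshold <- 0.3
weak <- which(abs(E(g)$weight) < threshold)

# Remove those edges; the vertices themselves stay in the graph.
g_trim <- delete_edges(g, weak)

print(g_trim)
```

Note that delete_edges() leaves the vertex set untouched, so a gene that loses all its edges remains as an isolated vertex.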


Yes, removing the edges. I set the weights of most edges to 0 by zeroing every weight below a threshold value. All vertices still show up on the plot, so I guess I need to remove the vertices that have no connections to make the plot clear. Detecting such vertices is conceptually easy but tedious. I am surprised that the packages "parcor" and "igraph" have such a limited number of functions.

Also, the partial correlation matrix can contain both positive and negative numbers. Is there a way to reflect the sign on the plot (e.g. using different colors)?

Thanks a lot.
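For what it's worth, both points can probably be handled with a few igraph calls. The sketch below uses a made-up, already thresholded matrix; the colors and the width scaling are arbitrary choices, not anything prescribed by parcor or igraph.

```r
library(igraph)

# Made-up thresholded matrix: g3 has no remaining connections.
genes <- paste0("g", 1:4)
pc_thr <- matrix(c( 0.0,  0.6,  0.0, -0.4,
                    0.6,  0.0,  0.0,  0.0,
                    0.0,  0.0,  0.0,  0.0,
                   -0.4,  0.0,  0.0,  0.0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(genes, genes))
g <- graph_from_adjacency_matrix(pc_thr, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Drop vertices with degree 0 (no edges left after thresholding).
isolated <- V(g)[degree(g) == 0]
g2 <- delete_vertices(g, isolated)

# Color edges by the sign of the partial correlation and
# scale the line width by its absolute value.
E(g2)$color <- ifelse(E(g2)$weight > 0, "red", "blue")
E(g2)$width <- 1 + 4 * abs(E(g2)$weight)

print(V(g2)$name)
# In an interactive session:
# plot(g2)
```

plot() picks up the color and width edge attributes automatically, so no extra plotting arguments are needed.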

7.1 years ago
moxu ▴ 510

Excellent explanation! It makes a lot of sense to me now. Thanks much!

I am reading the paper "A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers". The paper categorizes partial-correlation-based methods such as ridgenet and lassonet as "regularized methods", in contrast to mutual information (MI) based methods, which are not able to infer direction. This gave me the impression that regularized methods can infer direction.

Thanks again for the detailed explanation. At least now I should be able to get a network.


The partial correlation matrix is derived from the inverse of the correlation matrix. However, the correlation matrix is not always invertible; this typically happens when p >> n (many more variables than samples). In such cases, regularization methods are used to estimate the inverse. This doesn't change the fact that the resulting matrix is symmetric, so regularization by itself does not provide direction.
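To illustrate the point, here is a simple sketch in base R with simulated data. It adds a ridge penalty to the diagonal of the correlation matrix before inverting it. This is only one basic form of regularization and is not necessarily what ridge.net does internally; the penalty lambda = 0.1 is an arbitrary assumption.

```r
# Simulated data with more variables (p) than samples (n).
set.seed(7)
n <- 5
p <- 8
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# The sample correlation matrix has rank at most n - 1, so it is singular.
R <- cor(X)
print(qr(R)$rank)

# Ridge-type regularization: add lambda to the diagonal, then invert.
lambda <- 0.1   # arbitrary illustrative value
P <- solve(R + lambda * diag(p))

# Scale the regularized inverse into partial correlations:
# pcor_ij = -P_ij / sqrt(P_ii * P_jj)
pcor <- -P / sqrt(outer(diag(P), diag(P)))
diag(pcor) <- 1

# The estimate is still symmetric, hence an undirected graph.
print(all.equal(pcor, t(pcor)))
```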


For a gene-gene network, we usually have far fewer samples (dozens at most) than variables (22k genes).

So because the matrix is symmetric, one cannot get directionality? Is there a reasonable way to get directionality, then?

Thanks.


First, it depends on what you want the direction to mean (e.g. causation, regulation...). Second, it depends on what kind of data you have. For example, it is difficult to infer anything without information about the dynamics of the system (i.e. time series measurements). So you need additional information that is relevant to the type of directionality you want. There are plenty of papers about gene regulatory network inference.
