Question: How does one infer a gene-gene network from a partial correlation matrix?
1
3.0 years ago by
moxu450
moxu450 wrote:

First, I don't understand what is meant by a partial correlation matrix. I do know what a correlation matrix is, though.

Second, assuming a partial correlation matrix is similar to a correlation matrix, how can you infer the connections (edges) between the nodes (genes), and how do you resolve the directionality?

Third, I am using the R package "parcor". I have read the R manual and some reference papers, but I still don't know how to infer a network with the package. Can somebody give a simple tutorial?

rna-seq R gene • 1.4k views
modified 3.0 years ago • written 3.0 years ago by moxu450
3
3.0 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche21k wrote:

The partial correlation between X and Y is the correlation between the residuals of X and Y after linear regression has removed the effect of confounding factors. For example, assume you have two independent variables W and X and a dependent variable Y. To compute the partial correlation between X and Y, you regress X on W to get the residual rX and regress Y on W to get the residual rY. The partial correlation is then the correlation between rX and rY. It lets you answer questions like how much X contributes to Y when the effect of W is removed. As the partial correlation can be viewed as a similarity measure, a partial correlation matrix can be seen as the adjacency matrix of a graph in which the partial correlations are the edge weights. Since it's a correlation matrix, it is symmetric. A symmetric adjacency matrix implies an undirected graph, so you need additional information to build a directed graph out of it.
As for the practical aspect, once you have your matrix, you can use the graph_from_adjacency_matrix() function from the igraph package to get the corresponding graph structure.
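To make the residual-based definition concrete, here is a minimal base R sketch on simulated data. The variables W, X and Y and the coefficients used to generate them are made up for illustration: X and Y both depend on W but not on each other, so the plain correlation should be sizeable while the partial correlation should be close to zero.

```r
set.seed(1)
n <- 1000
W <- rnorm(n)          # confounder (simulated)
X <- W + rnorm(n)      # X depends on W
Y <- W + rnorm(n)      # Y depends on W, but not directly on X

# Plain correlation: inflated by the shared dependence on W
cor_xy <- cor(X, Y)

# Regress each variable on W and keep the residuals
rX <- resid(lm(X ~ W))
rY <- resid(lm(Y ~ W))

# Partial correlation of X and Y given W
pcor_xy <- cor(rX, rY)

cat("correlation:", round(cor_xy, 3),
    " partial correlation:", round(pcor_xy, 3), "\n")
```

With more than one confounder, the same idea applies with all of them on the right-hand side of the two regressions.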

I am working on the network stuff right now. I have a gene expression file like the following

``````
gene_id sample1 ... sample9
gene1 123 ... 987
.
.
.
geneN 234 ... 432
``````

Then I use the following R code:

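``````
library("parcor")
library("igraph")
# d: expression table with genes in rows and samples in columns.
# ridge.net() expects observations (samples) in rows and variables (genes) in columns, hence t().
rn <- ridge.net(t(as.matrix(d)), k = 5)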
``````

The problem is that every gene points to itself, i.e. the edges are like the following:

``````
IGRAPH D--- 899 899 --
+ edges:
[1]  1-> 1  2-> 2  3-> 3  4-> 4  5-> 5  6-> 6  7-> 7  8-> 8  9-> 9 10->10
[11] 11->11 12->12 13->13 14->14 15->15 16->16 17->17 18->18 19->19 20->20
[21] 21->21 22->22 23->23 24->24 25->25 26->26 27->27 28->28 29->29 30->30
[31] 31->31 32->32 33->33 34->34 35->35 36->36 37->37 38->38 39->39 40->40
[41] 41->41 42->42 43->43 44->44 45->45 46->46 47->47 48->48 49->49 50->50
[51] 51->51 52->52 53->53 54->54 55->55 56->56 57->57 58->58 59->59 60->60
[61] 61->61 62->62 63->63 64->64 65->65 66->66 67->67 68->68 69->69 70->70
[71] 71->71 72->72 73->73 74->74 75->75 76->76 77->77 78->78 79->79 80->80
[81] 81->81 82->82 83->83 84->84 85->85 86->86 87->87 88->88 89->89 90->90
+ ... omitted several edges
``````

Any ideas?

Thanks a lot!

You forgot to declare the graph as weighted and undirected:

``````
rng <- graph_from_adjacency_matrix(rn$pcor, weighted = TRUE, mode = "undirected")
``````
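Here is a small sketch of that call on a toy matrix, assuming the igraph package is installed. The 3 x 3 matrix stands in for rn$pcor and its values are invented. The diag = FALSE argument is an addition to the suggestion above: the diagonal of a partial correlation matrix is 1, so without it every gene would also get a self-loop.

```r
library(igraph)

# Toy symmetric partial correlation matrix (hypothetical values)
genes <- c("g1", "g2", "g3")
pc <- matrix(c(1.0,  0.6,  0.0,
               0.6,  1.0, -0.4,
               0.0, -0.4,  1.0),
             nrow = 3, byrow = TRUE, dimnames = list(genes, genes))

# Weighted, undirected graph; diag = FALSE ignores the diagonal (no self-loops)
g <- graph_from_adjacency_matrix(pc, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

print(g)
print(E(g)$weight)   # edge weights are the non-zero off-diagonal values
```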

Did as you suggested, and got the following:

``````
> rng <- graph_from_adjacency_matrix(rn$pcor, weighted = TRUE, mode = "undirected")
> rng
IGRAPH U-W- 899 366955 --
+ attr: weight (e/n)
+ edges:
[1] 1-- 1 1-- 2 1-- 3 1-- 4 1-- 6 1-- 7 1-- 8 1-- 9 1--10 1--11 1--12 1--13
[13] 1--14 1--15 1--16 1--17 1--18 1--19 1--20 1--21 1--22 1--23 1--24 1--25
[25] 1--26 1--27 1--28 1--29 1--30 1--31 1--32 1--33 1--34 1--35 1--36 1--37
[37] 1--38 1--39 1--40 1--41 1--42 1--43 1--44 1--45 1--46 1--47 1--48 1--49
[49] 1--50 1--51 1--52 1--53 1--54 1--55 1--56 1--57 1--58 1--59 1--60 1--61
[61] 1--62 1--63 1--64 1--65 1--66 1--69 1--70 1--71 1--72 1--73 1--74 1--75
[73] 1--76 1--77 1--78 1--79 1--80 1--81 1--83 1--85 1--86 1--87 1--88 1--89
[85] 1--90 1--91 1--92 1--93 1--95 1--96 1--97 1--98 1--99
+ ... omitted several edges
``````

There are too many connections -- looks like every gene is connected to every gene. Is there a way to threshold (or validate) a connection?

An edge is created for every non-zero value in the matrix so unless you have a sparse matrix, you get a graph in which each node is connected to most of the others. How to deal with this situation depends on what you want to do with the graph.
You could threshold your similarity values but finding the right threshold may not be easy. The context may suggest a value or you could filter edges based on the p-value or you could use the elbow criterion: plot the values in decreasing order and find the value at which the curve levels off.
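As a rough base R sketch of the thresholding and elbow ideas: the random matrix below is only a stand-in for rn$pcor, and the cutoff of 0.3 is an arbitrary value chosen for illustration, not a recommendation. The threshold is applied to absolute values so that strong negative partial correlations are kept.

```r
set.seed(42)
p <- 6

# Hypothetical symmetric matrix standing in for rn$pcor
m <- matrix(runif(p * p, min = -1, max = 1), nrow = p)
pc <- (m + t(m)) / 2
diag(pc) <- 1

# Elbow criterion: absolute off-diagonal values in decreasing order
vals <- sort(abs(pc[upper.tri(pc)]), decreasing = TRUE)
if (interactive()) {
  plot(vals, type = "b", xlab = "rank", ylab = "|partial correlation|")
}

# Zero out weak entries (by absolute value) and the diagonal
threshold <- 0.3          # assumed cutoff, pick one from the plot or context
pc_thr <- pc
pc_thr[abs(pc_thr) < threshold] <- 0
diag(pc_thr) <- 0
```

The thresholded matrix pc_thr stays symmetric, so it can be passed to graph_from_adjacency_matrix() in the same way as the full matrix.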

How to trim the network, please? I don't see anything in "ridge.net" that is related to trimming, nor in "igraph".

Thanks.

By trimming, do you mean removing edges? You can do that by setting the corresponding weights to 0 in the adjacency/similarity matrix or by using the delete_edges() function in igraph.
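A minimal sketch of the delete_edges() route, assuming igraph is installed. The toy matrix and the 0.3 cutoff are invented for illustration.

```r
library(igraph)

# Toy symmetric partial correlation matrix (hypothetical values)
genes <- paste0("g", 1:4)
pc <- matrix(c(1.00,  0.70,  0.10, 0.00,
               0.70,  1.00, -0.50, 0.00,
               0.10, -0.50,  1.00, 0.05,
               0.00,  0.00,  0.05, 1.00),
             nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

g <- graph_from_adjacency_matrix(pc, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Ids of edges whose absolute weight falls below the cutoff
threshold <- 0.3          # assumed cutoff
weak <- which(abs(E(g)$weight) < threshold)

# Remove those edges; the vertices themselves are kept
g_trim <- delete_edges(g, weak)
print(g_trim)
```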

Yes, removing the edges. I set the weights of most edges to 0 by zeroing every weight below a threshold value. All vertices still show up on the plot, though. I guess I need to remove the vertices that have no connections to make the plot clearer. Detecting such vertices is conceptually easy but tedious. I am surprised that the "parcor" and "igraph" packages have such a limited number of functions.

Also, the partial correlation matrix can contain both positive and negative numbers. Is there a way to reflect the sign on the plot (e.g. using different colors)?

Thanks a lot.
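One possible way to handle both points with igraph functions, sketched on an invented, already thresholded toy matrix: degree() finds the vertices without connections, delete_vertices() drops them, and an edge attribute named color is picked up by plot(). The color names and the width scaling are arbitrary choices.

```r
library(igraph)

# Toy thresholded matrix (hypothetical values); g4 has no remaining edges
genes <- paste0("g", 1:4)
pc_thr <- matrix(c(0.0,  0.7,  0.0, 0.0,
                   0.7,  0.0, -0.5, 0.0,
                   0.0, -0.5,  0.0, 0.0,
                   0.0,  0.0,  0.0, 0.0),
                 nrow = 4, byrow = TRUE, dimnames = list(genes, genes))

g <- graph_from_adjacency_matrix(pc_thr, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Drop vertices that have no edges left
isolated <- which(degree(g) == 0)
g_conn <- delete_vertices(g, isolated)

# Encode the sign as edge color and the magnitude as edge width
E(g_conn)$color <- ifelse(E(g_conn)$weight > 0, "firebrick", "steelblue")
E(g_conn)$width <- 1 + 4 * abs(E(g_conn)$weight)

# A fixed layout is used here because some force-directed layouts
# do not accept negative edge weights
if (interactive()) plot(g_conn, layout = layout_in_circle(g_conn))
```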

0
3.0 years ago by
moxu450
moxu450 wrote:

Excellent explanation! It makes a lot of sense to me now. Thanks much!

I am reading the paper "A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers". The paper categorizes the partial-correlation-based methods (ridgenet, lassonet, etc.) as "regularized methods" and distinguishes them from mutual information (MI) based methods, which are not able to infer direction. This somehow gives me the impression that regularized methods can infer direction.

Thanks again for the detailed explanation. At least now I should be able to get a network.

1

The partial correlation matrix is derived from the inverse of the correlation matrix. However, the correlation matrix is not always invertible. This usually happens when p >> n (more variables than samples). In such cases, regularization methods are used to estimate the inverse. This doesn't change the fact that the resulting partial correlation matrix is symmetric.
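A base R sketch of that relationship, on simulated data: the partial correlation of variables i and j given all the others is minus the (i, j) entry of the inverse correlation matrix, scaled by the square roots of the two matching diagonal entries. The helper name pcor_from_cor and the ridge-style shrinkage (adding lambda to the diagonal before inverting) are illustrative assumptions and are not meant to reproduce what ridge.net() does internally.

```r
set.seed(1)

# Hypothetical helper: partial correlations from an invertible correlation matrix
pcor_from_cor <- function(S) {
  omega <- solve(S)            # inverse (precision) matrix
  pc <- -cov2cor(omega)        # -omega_ij / sqrt(omega_ii * omega_jj)
  diag(pc) <- 1
  pc
}

# Case 1: more samples than variables, the inverse exists
n <- 500
W <- rnorm(n)
X <- W + rnorm(n)
Y <- W + rnorm(n)
pc <- pcor_from_cor(cor(cbind(W, X, Y)))

# Case 2: 5 samples, 10 variables; the correlation matrix is singular,
# so a small value is added to the diagonal before inverting (assumed lambda)
small <- matrix(rnorm(5 * 10), nrow = 5)
lambda <- 0.1
pc_small <- pcor_from_cor(cor(small) + lambda * diag(10))

print(round(pc, 3))
```

In the first case the X-Y entry agrees with the residual-based definition (correlating the residuals of X and Y after regressing each on W), and in both cases the result is symmetric.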

For gene-gene networks, we usually have far fewer samples (dozens at most) than variables (22k genes).

So because the matrix is symmetric, one cannot get directionality? Is there a reasonable way to get directionality, then?

Thanks.