Question

Why use partial correlation for gene-gene network construction?

6.9 years ago

moxu ▴ 510

Some gene-gene networks are constructed using correlation coefficients, mutual information, etc., while others are constructed using partial correlation coefficients.

What are the pros and cons of a partial correlation approach versus a correlation-based one? Biologically, do they have different underlying implications?

Thanks!

Tags: R, gene, rna-seq

score 2 · Accepted Answer · 2017-06-14

6.9 years ago

Jean-Karim Heriche 27k

Correlation and other association measures are sensitive to indirect effects, i.e. a high correlation doesn't imply a direct interaction: if gene A regulates both B and C, then B and C can be strongly correlated even though they do not interact. By removing the influence of the other genes, partial correlation tries to measure the direct relationship between the two genes under consideration. Note that the effects of the other genes are assumed to be linear (the partial correlation is the correlation between the residuals of two linear regressions, each gene being regressed on all the others) and that the inferred interactions are undirected.
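A minimal NumPy sketch (not from the thread) of this point: it computes a partial correlation both as the correlation of regression residuals and from the inverse covariance (precision) matrix P via rho_ij = -P_ij / sqrt(P_ii * P_jj), on simulated data where a hypothetical gene A drives genes B and C. The helper names are illustrative. Packages such as parcor use regularized estimators instead of a plain matrix inverse, which only works when there are more samples than genes.

```python
import numpy as np

def partial_corr_residuals(X, i, j):
    """Partial correlation of columns i and j of X given all other columns,
    computed as the correlation of residuals from two least-squares fits."""
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    # Design matrix: intercept plus the conditioning genes
    Z = np.column_stack([np.ones(X.shape[0]), X[:, others]])
    beta_i, *_ = np.linalg.lstsq(Z, X[:, i], rcond=None)
    beta_j, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
    r_i = X[:, i] - Z @ beta_i  # part of gene i not explained by the others
    r_j = X[:, j] - Z @ beta_j  # part of gene j not explained by the others
    return np.corrcoef(r_i, r_j)[0, 1]

def partial_corr_matrix(X):
    """All pairwise partial correlations from the precision matrix.
    Requires more samples (rows) than genes (columns)."""
    P = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Simulated example: gene A drives both B and C; B and C do not interact directly
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = 2.0 * a + rng.normal(size=n)
c = -1.5 * a + rng.normal(size=n)
X = np.column_stack([a, b, c])

print("corr(B, C)         =", round(np.corrcoef(b, c)[0, 1], 2))
print("partial corr(B, C) =", round(partial_corr_residuals(X, 1, 2), 2))
print("precision-based    =", round(partial_corr_matrix(X)[1, 2], 2))
```

B and C come out strongly negatively correlated (about -0.74 in theory for this simulation), yet their partial correlation given A is close to zero, and the two computations agree up to floating-point error.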

Your explanation regarding partial correlation makes sense.

Related questions:

Suppose you have two cell lines (normal vs. tumor), and you treat each cell line with varying concentrations of a compound. Your gene count file looks like the following:

Gene control11 control12 treat11 treat12 control21 control22 treat21 treat22
Gene1 ...
Gene2 ...
...
GeneN ...

The first column is the gene name and each remaining column is a sample; each row holds the counts (expression levels) of one gene.

When doing differential expression (DEG) analysis, we can use more complex models such as expression(GeneI) = cell_line + compound_concentration + cell_line * compound_concentration. That works without problems, and it is powerful.
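To make the model concrete, here is a hedged sketch of the design matrix it implies for the eight samples in the table (intercept, both main effects and the interaction), fitted by ordinary least squares on log2 counts for a single made-up gene. The cell-line coding, the concentrations (0 for control, 1.0 for treated) and the counts are all hypothetical, and real DEG tools such as DESeq2 or edgeR fit negative binomial GLMs rather than this simplification.

```python
import numpy as np

# Sample layout from the count table; the codings below are hypothetical
samples = ["control11", "control12", "treat11", "treat12",
           "control21", "control22", "treat21", "treat22"]
cell_line = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # 0 = normal, 1 = tumor
conc = np.array([0, 0, 1.0, 1.0, 0, 0, 1.0, 1.0])             # compound concentration

# Design matrix: intercept, cell_line, concentration, cell_line x concentration
design = np.column_stack([np.ones(len(samples)), cell_line, conc, cell_line * conc])

def fit_gene(counts, design):
    """Ordinary least squares on log2(count + 1) for one gene (a simplification)."""
    y = np.log2(np.asarray(counts, dtype=float) + 1.0)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(["intercept", "cell_line", "conc", "cell_line:conc"], coef))

# Made-up counts for a gene that responds to the compound only in the tumor line
gene_counts = [100, 110, 105, 95, 100, 90, 400, 420]
print(fit_gene(gene_counts, design))
```

For these counts the interaction coefficient is large (roughly 2.2 on the log2 scale) while the main effect of the compound is near zero; this per-sample information is exactly what a plain expression matrix does not carry.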

Now, when we use parcor to derive partial correlation coefficients, we simply use the expression matrix above and ignore cell_line and compound_concentration. Is this a problem? Don't we lose the information about the samples?
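A small simulation (not from the thread) illustrates the concern: when the sample groups shift expression, pooling all samples can produce correlations driven by the groups rather than by gene-gene relationships. One generic adjustment, shown here as an assumption and not as something parcor does, is to regress the design covariates (here only a simulated tumor indicator) out of each gene before computing correlations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 200
group = np.repeat([0.0, 1.0], n_per_group)  # 0 = normal, 1 = tumor (simulated)

# Two genes that are independent within each group but both shifted up in tumor
g1 = 3.0 * group + rng.normal(size=group.size)
g2 = 3.0 * group + rng.normal(size=group.size)

def residualize(y, covariates):
    """Residuals of y after least-squares regression on an intercept plus covariates."""
    Z = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta

pooled = np.corrcoef(g1, g2)[0, 1]
adjusted = np.corrcoef(residualize(g1, group), residualize(g2, group))[0, 1]
print(f"pooled correlation: {pooled:.2f}, adjusted for group: {adjusted:.2f}")
```

The pooled correlation is about 0.69 in theory, while the group-adjusted one is near zero. Adding the covariates as extra conditioning variables in a partial correlation gives the same adjustment for this gene pair.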

Even more complicated: what if we want to find differentially regulated gene-gene networks, i.e. gene-gene sub-networks that only occur in the tumor samples? How can we find them? A naive way is to divide the samples into two groups, normal and tumor, derive a network for each group, and compare them. But this does not seem like a decent approach.
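The naive per-group comparison can be sketched as follows, under stated assumptions: partial correlations are estimated in each group from a covariance matrix shrunk toward its diagonal (a simple choice made here for illustration, needed because with few samples per group the plain covariance matrix is singular), and gene pairs whose partial correlation differs by more than a hypothetical threshold are reported. The shrinkage weight, threshold and simulated data are arbitrary.

```python
import numpy as np

def shrunk_partial_corr(X, lam=0.1):
    """Partial correlations from a covariance matrix shrunk toward its diagonal.
    lam = 0.1 is an arbitrary illustrative value, not a tuned one."""
    S = np.cov(X, rowvar=False)
    S_shrunk = (1.0 - lam) * S + lam * np.diag(np.diag(S))
    P = np.linalg.inv(S_shrunk)
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

def differential_edges(X_normal, X_tumor, threshold=0.5, lam=0.1):
    """Gene pairs (i, j, difference) whose partial correlation differs by more
    than `threshold` between groups (rows = samples, columns = genes)."""
    diff = shrunk_partial_corr(X_tumor, lam) - shrunk_partial_corr(X_normal, lam)
    i, j = np.triu_indices_from(diff, k=1)
    keep = np.abs(diff[i, j]) > threshold
    return list(zip(i[keep].tolist(), j[keep].tolist(),
                    diff[i, j][keep].round(2).tolist()))

# Simulated data: genes 0 and 1 are coupled only in the tumor group
rng = np.random.default_rng(2)
n, p = 100, 5
X_normal = rng.normal(size=(n, p))
X_tumor = rng.normal(size=(n, p))
X_tumor[:, 1] = X_tumor[:, 0] + 0.2 * rng.normal(size=n)
print(differential_edges(X_normal, X_tumor))
```

With the group sizes in the table (four samples each), estimates like these would be very unstable, which is one reason the per-group comparison is fragile.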

Your help would be highly appreciated!

6.9 years ago by moxu ▴ 510

As I understand it, your data is a 3rd-order tensor (i.e. a 3-dimensional array) of genes x cell lines x compound concentrations. One option would be to use a tensor factorization method to identify clusters. This could, for example, identify groups of genes that respond to a specific concentration in particular samples. Another approach would be to derive multiple networks, stack their adjacency matrices into a 3rd-order tensor, and again apply a factorization to identify relevant modules. See for example notes from a workshop I taught.
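One common tensor factorization is the CP (CANDECOMP/PARAFAC) decomposition; the answer does not name a specific method, so treat this as one possible choice. The sketch below fits a rank-R CP model by alternating least squares in plain NumPy to a synthetic genes x cell lines x concentrations tensor. In practice a library such as TensorLy (`tensorly.decomposition.parafac`) would be used. Sizes, rank and data are hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R), (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def cp_als(T, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3rd-order tensor by alternating least
    squares: T[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r]."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    # Mode-n unfoldings whose column order matches the Khatri-Rao products below
    T1 = T.reshape(I, J * K)
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two factors fixed
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Synthetic rank-2 tensor: 50 genes x 2 cell lines x 4 concentrations
rng = np.random.default_rng(3)
A0 = rng.normal(size=(50, 2))
B0 = rng.normal(size=(2, 2))
C0 = rng.normal(size=(4, 2))
T = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

A, B, C = cp_als(T, rank=2, n_iter=500)
T_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Each component r pairs a gene loading vector A[:, r] with a cell-line profile B[:, r] and a concentration profile C[:, r]; genes with large loadings in a component are candidate groups responding to that combination. The same function applies to the second suggestion by stacking adjacency matrices into a genes x genes x networks tensor, e.g. `np.stack(adjacency_list, axis=2)`, where `adjacency_list` is a hypothetical list of per-condition matrices.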

6.9 years ago by Jean-Karim Heriche 27k