Question: Why use partial correlation for gene-gene network construction?
moxu wrote (22 months ago):

Some gene-gene networks are constructed using correlation coefficients, mutual information, etc., and others are constructed using partial correlation coefficients.

What are the pros and cons of partial correlation versus correlation-based approaches? Biologically, do they have different implications?

Thanks!

Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote (22 months ago):

Correlation and similar measures are sensitive to indirect effects, i.e. a high correlation doesn't necessarily mean a direct interaction. By removing the influence of other genes, the partial correlation tries to measure the direct relationship between the two genes under consideration. Note that the effects of the other genes are assumed to be linear (the partial correlation is the correlation between the residuals of two linear regressions, one for each of the two genes regressed on the remaining genes) and that the inferred interactions are undirected.
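
To make the "correlation between residuals" definition concrete, here is a minimal sketch in plain Python. The data are simulated and purely illustrative: a hypothetical gene_b drives both gene_a and gene_c, so gene_a and gene_c are correlated only indirectly. The helper names (pearson, residuals, partial_corr) are made up for this example and are not taken from any package. With only one conditioning gene a simple regression is enough; with many genes one would use multiple regression or a (regularized) inverse covariance matrix instead.

```python
import random

def mean(v):
    return sum(v) / len(v)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, z):
    """Residuals of the least-squares fit y ~ intercept + slope * z."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]

def partial_corr(x, y, z):
    """Partial correlation of x and y given z: correlate the two residual vectors."""
    return pearson(residuals(x, z), residuals(y, z))

# Simulated (made-up) expression values: gene_b influences gene_a and gene_c,
# but gene_a and gene_c do not influence each other.
rng = random.Random(0)
n = 2000
gene_b = [rng.gauss(0, 1) for _ in range(n)]
gene_a = [b + 0.5 * rng.gauss(0, 1) for b in gene_b]
gene_c = [b + 0.5 * rng.gauss(0, 1) for b in gene_b]

r_ac = pearson(gene_a, gene_c)
pr_ac_given_b = partial_corr(gene_a, gene_c, gene_b)
print("correlation(a, c):", round(r_ac, 3))
print("partial correlation(a, c | b):", round(pr_ac_given_b, 3))
```

In this toy setting the plain correlation between gene_a and gene_c is high, while the partial correlation given gene_b is close to zero, which is the indirect effect being removed.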


Your explanation regarding partial correlation makes sense.

Related questions:

Suppose you have two cell lines (normal vs. tumor), and you treat each cell line with varying concentrations of a compound. Your gene count file looks like the following:

Gene control11 control12 treat11 treat12 control21 control22 treat21 treat22
Gene1 ...
Gene2 ...
...
GeneN ...

The first column is the gene name and each remaining column is a sample. Each row holds the counts (expression levels) for one gene.

When doing DEG analysis, we can use fairly complex models such as expression(GeneI) = cell_line + compound_concentration + cell_line * compound_concentration. That works well and is powerful.

Now, when we use parcor to derive partial correlation coefficients, we simply use the expression matrix above and ignore cell_line and compound_concentration. Is there a problem here? Don't we lose the information about the samples?

Even more complicated: what if we want to find differentially regulated gene-gene networks? That is, if we are looking for gene-gene sub-networks that occur only in the tumor samples, how can we find them? A naive way is to divide the samples into two groups, a normal group and a tumor group, and derive a network in each group for comparison. But this does not seem to be a sound approach.
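
For reference, the naive split-and-compare approach described above could be sketched as follows in plain Python. Everything here is an assumption made for illustration: the expression values are simulated, the gene names reuse the placeholders from the table, plain Pearson correlation with an arbitrary 0.5 cutoff stands in for whatever association measure is actually used, and correlation_edges is a hypothetical helper.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_edges(expr, threshold=0.5):
    """expr maps gene name -> list of values across samples.
    Returns the set of gene pairs whose |correlation| exceeds the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) > threshold:
            edges.add((g1, g2))
    return edges

# Simulated (made-up) data: in the normal group all genes are independent;
# in the tumor group Gene2 follows Gene1.
rng = random.Random(1)
n = 200
normal = {g: [rng.gauss(0, 1) for _ in range(n)]
          for g in ("Gene1", "Gene2", "Gene3")}
tumor_gene1 = [rng.gauss(0, 1) for _ in range(n)]
tumor = {
    "Gene1": tumor_gene1,
    "Gene2": [x + 0.3 * rng.gauss(0, 1) for x in tumor_gene1],
    "Gene3": [rng.gauss(0, 1) for _ in range(n)],
}

# Edges present in the tumor network but absent from the normal network.
tumor_only = correlation_edges(tumor) - correlation_edges(normal)
print(sorted(tumor_only))
```

This only compares thresholded edge sets, so it inherits the weaknesses raised in the question: each network is estimated from half the samples, and the difference is sensitive to the chosen cutoff.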

Your help would be highly appreciated!

(comment written 22 months ago by moxu)

As I understand it, your data is a 3rd order tensor (i.e. a 3-dimensional array) of genes x cell lines x compound concentrations. One option would be to use a tensor factorization method to identify clusters. This could, for example, identify groups of genes that respond to a specific concentration in particular samples. Another approach could be to derive multiple networks, stack their adjacency matrices into a 3rd order tensor, and again apply a factorization to identify relevant modules. See, for example, the notes from a workshop I taught.
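
As a rough illustration of the second idea (stacking adjacency matrices and factorizing), here is a minimal plain-Python sketch. It is not the method from the workshop notes: it computes only a rank-1 CP (CANDECOMP/PARAFAC) approximation using alternating power-style updates, and the two toy networks (a hypothetical "normal" and "tumor" condition over five genes) are made up. A real analysis would more likely use a dedicated tensor library, a higher rank, and possibly non-negativity constraints.

```python
def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def rank1_cp(tensor, n_iter=20):
    """Rank-1 CP approximation of a 3rd order tensor given as nested lists
    tensor[i][j][k], using alternating (higher-order power) updates."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, b, c = normalize([1.0] * I), normalize([1.0] * J), normalize([1.0] * K)
    for _ in range(n_iter):
        a = normalize([sum(tensor[i][j][k] * b[j] * c[k]
                           for j in range(J) for k in range(K)) for i in range(I)])
        b = normalize([sum(tensor[i][j][k] * a[i] * c[k]
                           for i in range(I) for k in range(K)) for j in range(J)])
        c = normalize([sum(tensor[i][j][k] * a[i] * b[j]
                           for i in range(I) for j in range(J)) for k in range(K)])
    return a, b, c

n_genes = 5

def adjacency(edges):
    """Build a symmetric weighted adjacency matrix from (i, j, weight) triples."""
    m = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j, w in edges:
        m[i][j] = m[j][i] = w
    return m

# Toy (made-up) networks: a weak edge 3-4 in both conditions,
# plus a module among genes 0, 1, 2 that exists only in the tumor network.
normal_net = adjacency([(3, 4, 0.5)])
tumor_net = adjacency([(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 0.5)])
networks = [normal_net, tumor_net]

# Stack into a genes x genes x conditions tensor.
tensor = [[[networks[k][i][j] for k in range(len(networks))]
           for j in range(n_genes)] for i in range(n_genes)]

gene_loading, _, condition_loading = rank1_cp(tensor)
module = sorted(sorted(range(n_genes), key=lambda g: -gene_loading[g])[:3])
print("genes with the largest loadings:", module)
print("condition loadings (normal, tumor):", [round(x, 3) for x in condition_loading])
```

The gene loadings point to the module members and the condition loadings indicate in which network the module is present, which is the kind of readout a factorization of stacked networks is meant to give.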

(reply written 22 months ago by Jean-Karim Heriche)