Question

How to manage self loops in gene coexpression network constructed from RNA-seq data

1

5.9 years ago

aishu.jp ▴ 10

I have control vs. treated RNA-seq data from a plant, for which I am trying to construct a gene coexpression network. Using the DESeq2 R package, I identified a total of 6000 significantly differentially expressed genes after applying an FDR cutoff of 0.05.

The normalised count matrix for these 6000 genes, obtained after rlog transformation, was passed to the cor() function to compute Pearson correlations. The pairwise correlation analysis gave ~30 million gene pairs, of which 1380285 were selected with a cutoff of r > 0.95 and visualized in Cytoscape.

While visualising the network in Cytoscape, I observed a self-loop for every gene in the network.

Is the presence of a self-loop for every gene biologically correct?
If not, how can I avoid these self-loops and retain only the biologically significant ones?

RNA-Seq R gene cytoscape • 1.8k views

ADD COMMENT • link updated 5.9 years ago by zx8754 11k • written 5.9 years ago by aishu.jp ▴ 10

0

Thank you sir for your help.

Do any clustering techniques reduce the number of gene pairs and remove the self-loops?

ADD REPLY • link 5.9 years ago by aishu.jp ▴ 10

2

Clustering algorithms will just cluster whatever data you provide. To remove self-loops, you can use the NetworkAnalyzer plugin for Cytoscape or just remove them in your correlation matrix after you generate it.

For example, you could set all perfect correlations to NA or some low value, such that they will be filtered:

cormat
             [,1]         [,2]        [,3]        [,4]         [,5]
[1,]  1.000000000 -0.008671749 -0.13205923 -0.12919820  0.005133225
[2,] -0.008671749  1.000000000 -0.04800790  0.16655794 -0.075665340
[3,] -0.132059234 -0.048007902  1.00000000  0.08883567 -0.046017194
[4,] -0.129198197  0.166557935  0.08883567  1.00000000 -0.344062904
[5,]  0.005133225 -0.075665340 -0.04601719 -0.34406290  1.000000000

cormat[cormat==1] <- NA

cormat
             [,1]         [,2]        [,3]        [,4]         [,5]
[1,]           NA -0.008671749 -0.13205923 -0.12919820  0.005133225
[2,] -0.008671749           NA -0.04800790  0.16655794 -0.075665340
[3,] -0.132059234 -0.048007902          NA  0.08883567 -0.046017194
[4,] -0.129198197  0.166557935  0.08883567          NA -0.344062904
[5,]  0.005133225 -0.075665340 -0.04601719 -0.34406290           NA

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k

2

Maybe diag is safer?

diag(cormat) <- NA
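To illustrate why diag may be the safer choice, here is a minimal sketch on made-up data (the gene names and the duplicated column are assumptions for illustration, not from the original poster's dataset). Matching on the value 1 depends on exact floating-point equality and can also blank a genuine pair of perfectly correlated genes, whereas diag touches only the self-pairs:

```r
# Toy data (hypothetical): geneB is an exact copy of geneA.
set.seed(7)
x <- rnorm(6)
expr <- cbind(geneA = x, geneB = x, geneC = rnorm(6))

by_value <- cor(expr)
by_diag  <- by_value

# Value-based masking: may also blank the real geneA-geneB edge,
# depending on whether its correlation is stored as exactly 1.
by_value[by_value == 1] <- NA

# Position-based masking: blanks only the gene-with-itself entries.
diag(by_diag) <- NA

sum(is.na(by_value))  # number of entries removed by value matching
sum(is.na(by_diag))   # number of entries removed by diag
```

With diag, the number of removed entries always equals the number of genes, regardless of how the off-diagonal values happen to round.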

Other useful functions: lower.tri, upper.tri
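As a rough sketch of how upper.tri could be used here (toy data and hypothetical gene names, not the original dataset): keeping only the strict upper triangle drops the self-pairs and the mirrored duplicates (A-B and B-A) in one step, and the result can be written out as an edge table for Cytoscape.

```r
# Toy expression matrix: 10 samples (rows) x 5 hypothetical genes (columns).
set.seed(1)
expr <- matrix(rnorm(50), nrow = 10,
               dimnames = list(NULL, paste0("gene", 1:5)))

cormat <- cor(expr, method = "pearson")

# Strict upper triangle: excludes the diagonal and the lower (mirrored) half.
keep <- upper.tri(cormat, diag = FALSE)
idx  <- which(keep, arr.ind = TRUE)

# One row per unique gene pair; which() and cormat[keep] share the same order.
edges <- data.frame(
  source = rownames(cormat)[idx[, "row"]],
  target = colnames(cormat)[idx[, "col"]],
  r      = cormat[keep]
)

# Apply the correlation cutoff afterwards (0.95 is the cutoff from the question).
strong_edges <- edges[edges$r > 0.95, ]
```

For n genes this yields n * (n - 1) / 2 candidate pairs before thresholding, which also halves the size of the table compared with the full matrix.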

ADD REPLY • link 5.9 years ago by zx8754 11k

0

Good point, zx8754

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k

score 1 · Answer 1 · 2018-06-16

By generating a correlation matrix and filtering on Pearson correlation r > 0.95, you include only positive (not inverse) correlations. Your final list will also contain the correlation of each gene with itself (r = 1, the diagonal of the matrix), and these entries are what create the self-loops. They are an artefact of the filtering, not a biological signal.

You can remove these with the NetworkAnalyzer plugin for Cytoscape.
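If you prefer to handle this in R before export, the following is a minimal sketch on toy data (the gene names and values are assumptions for illustration only). It shows that thresholding the full matrix keeps one self-pair per gene, and that blanking the diagonal first removes them:

```r
# Toy data (hypothetical): geneB closely tracks geneA, geneC is independent.
set.seed(42)
base <- rnorm(8)
expr <- cbind(
  geneA = base,
  geneB = base + rnorm(8, sd = 0.01),
  geneC = rnorm(8)
)
cormat <- cor(expr)

# Thresholding the full matrix keeps the diagonal, since cor(x, x) = 1 > 0.95.
hits <- which(cormat > 0.95, arr.ind = TRUE)
self_pairs <- hits[hits[, "row"] == hits[, "col"], , drop = FALSE]
nrow(self_pairs)  # one self-pair per gene

# Blank the diagonal first, then threshold; which() skips NA entries.
diag(cormat) <- NA
hits_clean <- which(cormat > 0.95, arr.ind = TRUE)
```

After this step, hits_clean contains only pairs of distinct genes, so no self-loops reach Cytoscape.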

Kevin