Question: How to manage self loops in gene coexpression network constructed from RNA-seq data
2.6 years ago by
aishu.jp10 wrote:

I have control vs. treated RNA-seq data from a plant, from which I am trying to construct a gene coexpression network. I identified a total of 6000 significantly differentially expressed genes using the DESeq2 R package after applying an FDR cutoff of 0.05.

The normalised count matrix of these 6000 genes, obtained after rlog transformation, was passed to the cor() function to compute Pearson correlations. The pairwise correlation analysis gave ~30 million gene pairs, of which 1380285 gene pairs were selected with a cutoff of >0.95 and visualized in Cytoscape.

While visualising the network in Cytoscape, I observed a self-loop for every gene in the network.

  1. Is the presence of a self-loop for every gene biologically correct?
  2. If not, how can I avoid self-loops for all genes in the network and retain only the biologically significant ones?
rna-seq cytoscape R gene • 912 views
modified 2.6 years ago by zx8754 9.9k • written 2.6 years ago by aishu.jp10

Thank you sir for your help.

Do any clustering techniques reduce the number of gene pairs and self-loops?

written 2.6 years ago by aishu.jp10

Clustering algorithms will just cluster whatever data you provide. To remove self-loops, you can use the NetworkAnalyzer plugin for Cytoscape or just remove them in your correlation matrix after you generate it.

For example, you could set all perfect correlations to NA or some low value, such that they will be filtered:

             [,1]         [,2]        [,3]        [,4]         [,5]
[1,]  1.000000000 -0.008671749 -0.13205923 -0.12919820  0.005133225
[2,] -0.008671749  1.000000000 -0.04800790  0.16655794 -0.075665340
[3,] -0.132059234 -0.048007902  1.00000000  0.08883567 -0.046017194
[4,] -0.129198197  0.166557935  0.08883567  1.00000000 -0.344062904
[5,]  0.005133225 -0.075665340 -0.04601719 -0.34406290  1.000000000

cormat[cormat==1] <- NA

             [,1]         [,2]        [,3]        [,4]         [,5]
[1,]           NA -0.008671749 -0.13205923 -0.12919820  0.005133225
[2,] -0.008671749           NA -0.04800790  0.16655794 -0.075665340
[3,] -0.132059234 -0.048007902          NA  0.08883567 -0.046017194
[4,] -0.129198197  0.166557935  0.08883567          NA -0.344062904
[5,]  0.005133225 -0.075665340 -0.04601719 -0.34406290           NA
written 2.6 years ago by Kevin Blighe 69k

Maybe diag is safer?

diag(cormat) <- NA

Other useful functions: lower.tri, upper.tri
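
To illustrate how upper.tri could be used here, a minimal sketch on toy data follows. The matrix `expr`, the gene names, and the `edges` data frame are made up for illustration; note that cor() correlates columns, so an rlog matrix with genes as rows would need t() first. Taking only the strict upper triangle drops the diagonal (no self-loops) and keeps each gene pair once instead of twice. For 6000 genes that leaves 6000 * 5999 / 2 = 17,997,000 unique pairs before any cutoff.

```r
# Toy data: 20 samples (rows) x 5 hypothetical genes (columns),
# standing in for the transposed rlog matrix
set.seed(1)
expr <- matrix(rnorm(100), nrow = 20,
               dimnames = list(NULL, paste0("gene", 1:5)))

# Gene-by-gene Pearson correlation matrix
cormat <- cor(expr, method = "pearson")

# Strict upper triangle: excludes the diagonal and the mirrored lower half
idx <- which(upper.tri(cormat, diag = FALSE), arr.ind = TRUE)

# Long-format edge list, one row per unique gene pair
edges <- data.frame(
  source = rownames(cormat)[idx[, "row"]],
  target = colnames(cormat)[idx[, "col"]],
  r      = cormat[idx],
  stringsAsFactors = FALSE
)

# Apply the cutoff from the question
cutoff <- 0.95
edges_filtered <- edges[edges$r > cutoff, ]
```

With random toy data `edges_filtered` will most likely be empty; on real data it could be written out with write.table() and imported into Cytoscape as a network table.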

modified 2.6 years ago • written 2.6 years ago by zx8754 9.9k

Good point, zx8754

written 2.6 years ago by Kevin Blighe 69k
2.6 years ago by
Kevin Blighe 69k (Republic of Ireland) wrote:

By filtering the correlation matrix on Pearson r > 0.95, you include only positive (not inverse) correlations. Your final list will also contain the correlation of each gene with itself, which is trivially r = 1, and these entries create the self-loops.

You can remove these with the NetworkAnalyzer plugin for Cytoscape.
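
If inverse correlations are also of interest, one possible approach is to filter on the absolute value of r instead. This is only a rough sketch with simulated data: geneA to geneD, their values, and the object names `pos` and `both` are hypothetical.

```r
# Simulated genes: geneB tracks geneA, geneC mirrors it, geneD is noise
set.seed(42)
x <- rnorm(30)
expr <- cbind(
  geneA = x,
  geneB = x + rnorm(30, sd = 0.05),
  geneC = -x + rnorm(30, sd = 0.05),
  geneD = rnorm(30)
)

cormat <- cor(expr, method = "pearson")

# Blank out self-correlations so they can never pass a cutoff
diag(cormat) <- NA

# Positive correlations only, each pair counted once
pos <- which(cormat > 0.95 & upper.tri(cormat), arr.ind = TRUE)

# Positive and inverse correlations, each pair counted once
both <- which(abs(cormat) > 0.95 & upper.tri(cormat), arr.ind = TRUE)
```

Here `pos` holds only the geneA/geneB pair, while `both` additionally picks up the two strongly inverse pairs involving geneC. Whether inverse correlations belong in the network is a design decision that depends on the biological question.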


written 2.6 years ago by Kevin Blighe 69k