Question: How to manage self loops in gene coexpression network constructed from RNA-seq data
4 weeks ago by
aishu.jp10 wrote:

I have control vs. treated RNA-seq data from a plant, from which I am trying to construct a gene coexpression network. Using the DESeq2 R package, I identified a total of 6000 significantly differentially expressed genes after applying an FDR cutoff of 0.05.

The normalised count matrix of these 6000 genes, obtained after rlog transformation, was passed to the cor() function to compute Pearson correlations. The pairwise correlation analysis gave ~30 million gene pairs, of which 1380285 gene pairs with a correlation >0.95 were selected and visualised in Cytoscape.

While visualising the network in Cytoscape, I observed a self-loop for every gene in the network.

  1. Is the presence of a self-loop for every gene biologically correct?
  2. If it is not correct, how can I avoid self-loops for all genes in the network and retain only the biologically significant ones?
rna-seq cytoscape R gene • 161 views
modified 4 weeks ago by zx8754 (4.7k) • written 4 weeks ago by aishu.jp10

Thank you, sir, for your help.

Do any clustering techniques reduce the number of gene pairs and the self-loops?

written 4 weeks ago by aishu.jp10

Clustering algorithms will just cluster whatever data you provide. To remove self-loops, you can use the NetworkAnalyzer plugin for Cytoscape or just remove them in your correlation matrix after you generate it.

For example, you could set all perfect correlations to NA or some low value, so that they are filtered out by the cutoff:

             [,1]         [,2]        [,3]        [,4]         [,5]
[1,]  1.000000000 -0.008671749 -0.13205923 -0.12919820  0.005133225
[2,] -0.008671749  1.000000000 -0.04800790  0.16655794 -0.075665340
[3,] -0.132059234 -0.048007902  1.00000000  0.08883567 -0.046017194
[4,] -0.129198197  0.166557935  0.08883567  1.00000000 -0.344062904
[5,]  0.005133225 -0.075665340 -0.04601719 -0.34406290  1.000000000

cormat[cormat==1] <- NA

             [,1]         [,2]        [,3]        [,4]         [,5]
[1,]           NA -0.008671749 -0.13205923 -0.12919820  0.005133225
[2,] -0.008671749           NA -0.04800790  0.16655794 -0.075665340
[3,] -0.132059234 -0.048007902          NA  0.08883567 -0.046017194
[4,] -0.129198197  0.166557935  0.08883567          NA -0.344062904
[5,]  0.005133225 -0.075665340 -0.04601719 -0.34406290           NA
written 4 weeks ago by Kevin Blighe (24k)

Maybe diag is safer?

diag(cormat) <- NA

Other useful functions: lower.tri, upper.tri
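
As a rough illustration of how upper.tri could be used here, the sketch below builds an edge list from the strict upper triangle of the correlation matrix, which drops both the diagonal (the self-loops) and the mirrored duplicate of every pair. The toy matrix `expr`, the gene names, and the `edges` data frame are made up for illustration and are not from the original data.

```r
# Toy data: 20 samples x 5 hypothetical genes (illustrative only)
set.seed(1)
expr <- matrix(rnorm(100), nrow = 20,
               dimnames = list(NULL, paste0("gene", 1:5)))
# Make gene2 nearly identical to gene1 so that one pair passes the cutoff
expr[, "gene2"] <- expr[, "gene1"] + rnorm(20, sd = 0.01)

cormat <- cor(expr, method = "pearson")

# Strict upper triangle: excludes the diagonal (self-correlations)
# and keeps each gene pair only once
keep <- upper.tri(cormat, diag = FALSE) & cormat > 0.95
idx <- which(keep, arr.ind = TRUE)

# One row per retained gene pair
edges <- data.frame(
  source = rownames(cormat)[idx[, "row"]],
  target = colnames(cormat)[idx[, "col"]],
  r      = cormat[idx]
)
print(edges)
```

A table like `edges` could then be written out with write.table and imported into Cytoscape as a network; since each pair appears once and no gene is paired with itself, no self-loops should be drawn.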

modified 4 weeks ago • written 4 weeks ago by zx8754 (4.7k)

Good point, zx8754

written 4 weeks ago by Kevin Blighe (24k)
4 weeks ago by
Kevin Blighe (24k), Republic of Ireland, wrote:

By generating a correlation matrix and filtering on Pearson correlation r > 0.95, you include only positive (not inverse) correlations. Your final list will also contain the correlation of each gene with itself (r = 1, the diagonal of the matrix), and these entries are what create the self-loops.

You can remove these with the NetworkAnalyzer plugin for Cytoscape.
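
As an alternative done in R before export, the two points above can be sketched on toy data: the diagonal of a correlation matrix is always 1, and a cutoff of r > 0.95 keeps only positive pairs, whereas a cutoff on the absolute value would also keep strongly inverse pairs. The genes `geneA` to `geneD` and both result objects are hypothetical, and whether inverse correlations belong in the network is an analysis choice rather than something this sketch decides.

```r
# Toy data: 30 samples, 4 hypothetical genes (illustrative only)
set.seed(42)
x <- rnorm(30)
expr <- cbind(geneA = x,
              geneB = x + rnorm(30, sd = 0.05),   # strongly positive with geneA
              geneC = -x + rnorm(30, sd = 0.05),  # strongly inverse with geneA
              geneD = rnorm(30))                  # unrelated
cormat <- cor(expr)

# Every gene is perfectly correlated with itself
stopifnot(isTRUE(all.equal(unname(diag(cormat)), rep(1, 4))))

# Drop the self-correlations before applying any cutoff
diag(cormat) <- NA

# Positive correlations only (the filter described in the question)
pos_only <- which(cormat > 0.95 & upper.tri(cormat), arr.ind = TRUE)

# Positive and inverse correlations, using the absolute value
pos_and_neg <- which(abs(cormat) > 0.95 & upper.tri(cormat), arr.ind = TRUE)

print(nrow(pos_only))
print(nrow(pos_and_neg))
```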


written 4 weeks ago by Kevin Blighe (24k)