Question: Community detection in R with the igraph package
11 months ago, vincentpailler wrote:

I have a data.frame of correlations between OTUs and genes, which I want to use to reconstruct genomes. The data.frame has 1,105,854 rows.

      var1                var2  corr
1  OTU3978 UniRef90_A0A010P3Z8 0.846
2  OTU4011 UniRef90_A0A010P3Z8 0.855
3  OTU4929 UniRef90_A0A010P3Z8 0.829
4  OTU4317 UniRef90_A0A011P550 0.850
5  OTU4816 UniRef90_A0A011P550 0.807
6  OTU3902 UniRef90_A0A011QPQ2 0.836
7  OTU3339 UniRef90_A0A011RKI6 0.835
8  OTU1359 UniRef90_A0A011RLA7 0.801
9  OTU2085 UniRef90_A0A011RLA7 0.843
10 OTU3542 UniRef90_A0A011RLA7 0.866
11 OTU0473 UniRef90_A0A011TDE1 0.807

I use the igraph library to build a graph object.
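The post does not show how the graph was built. A minimal sketch, assuming the graph is undirected and built directly from the three-column data.frame with graph_from_data_frame(); the object names edges and g are illustrative, and only a few rows of the table above are copied in:

```r
library(igraph)

# A few rows copied from the table in the question; the full data.frame
# is assumed to have the same three columns.
edges <- data.frame(
  var1 = c("OTU3978", "OTU4011", "OTU4929", "OTU4317", "OTU4816"),
  var2 = c("UniRef90_A0A010P3Z8", "UniRef90_A0A010P3Z8", "UniRef90_A0A010P3Z8",
           "UniRef90_A0A011P550", "UniRef90_A0A011P550"),
  corr = c(0.846, 0.855, 0.829, 0.850, 0.807)
)

# The first two columns are read as edge endpoints; any remaining column
# (here corr) is stored as an edge attribute.
g <- graph_from_data_frame(edges, directed = FALSE)

vcount(g)   # number of vertices (OTUs plus genes)
ecount(g)   # number of edges (one per correlation row)
E(g)$corr   # correlations kept on the edges
```

Built this way, OTUs and genes are both vertices, and each row of the table becomes one edge.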


Then I want to extract the components of this graph in order to reconstruct genomes: one component would correspond to one genome.

I tried this command: genomes <- split(names(V(g)), components(g)$membership)

It returns several components, for example:

> genomes[[4]]
[1] "OTU2417"             "UniRef90_A0A076H0Q4" "UniRef90_A0A2E8T3F8"
[4] "UniRef90_G5ZY43"
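To see what the split() call is grouping on, it can help to look at what components() returns. A small sketch on four rows taken from the table above (the graph construction is an assumption, since the post does not show it):

```r
library(igraph)

edges <- data.frame(
  var1 = c("OTU3978", "OTU4011", "OTU4317", "OTU3902"),
  var2 = c("UniRef90_A0A010P3Z8", "UniRef90_A0A010P3Z8",
           "UniRef90_A0A011P550", "UniRef90_A0A011QPQ2")
)
g <- graph_from_data_frame(edges, directed = FALSE)

comp <- components(g)
comp$no          # number of connected components
comp$csize       # number of vertices in each component
comp$membership  # component id of each vertex

# Same grouping as in the question: vertex names split by component id.
genomes <- split(names(V(g)), comp$membership)
genomes
```

Each element of genomes is the set of OTUs and genes that are linked to one another, directly or through intermediate vertices.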

I check the OTU and the genes of each component against my OTU table and, for the genes, against the EMBL-EBI database. That lets me judge whether each reconstructed genome is meaningful.

I also checked the documentation and found many other community detection methods: edge-betweenness, Louvain, multi-level, ... What is the main difference between the command I used (which gives me fairly meaningful components) and these algorithms (which also give me components)?


11 months ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

The igraph components() function gives you the connected components of the graph, i.e. all the subgraphs that are not connected to each other. The other methods are clustering algorithms. They partition the graph by removing edges according to a criterion specific to each algorithm. They are normally better applied to each connected component separately (otherwise, many will just output the connected components).
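The distinction can be illustrated on a toy graph. This is only a sketch: the graph (two 5-vertex cliques joined by one bridge edge) is made up for illustration, and cluster_louvain() stands in for any of the clustering algorithms mentioned in the question.

```r
library(igraph)

# Toy graph: two 5-vertex cliques (vertices 1-5 and 6-10) joined by one edge.
g <- disjoint_union(make_full_graph(5), make_full_graph(5))
g <- add_edges(g, c(1, 6))

# Every vertex can reach every other one, so this is a single component.
components(g)$no

# A clustering algorithm can still split the graph where connections are sparse.
comm <- cluster_louvain(g)
length(comm)       # number of communities found
membership(comm)   # community id of each vertex
```

components() only asks whether a path exists between two vertices, while the clustering algorithm looks for densely connected groups inside what is, by that definition, one connected piece.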


Could the connected components (subgraphs) of the graph give me reconstructed genomes? So, in your opinion, I should first extract each subgraph and then apply clustering algorithms to each of them in order to "improve" the accuracy of the reconstructed genomes?

Reply written 11 months ago by vincentpailler

In your context, connected components represent groups whose members have no correlation in your data with members of any of the other groups. Whether that meets your requirements for calling a group a genome is for you to decide. However, if you want to further partition the connected components (for example, because you think they represent more than one genome), then you can apply a clustering algorithm to try to reveal further structure. My point was that applying almost any clustering algorithm to the whole graph is pointless because this will return the connected components.
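A sketch of the component-by-component approach described above, on a made-up graph. The size threshold min_size is a hypothetical choice, and cluster_louvain() is just one possible algorithm:

```r
library(igraph)

# Toy graph with two components: two bridged 5-cliques, plus a separate triangle.
big <- add_edges(disjoint_union(make_full_graph(5), make_full_graph(5)), c(1, 6))
g <- disjoint_union(big, make_full_graph(3))

# One igraph object per connected component, skipping components with
# fewer than min_size vertices (hypothetical threshold).
min_size <- 3
parts <- decompose(g, min.elements = min_size)

# Cluster each component on its own.
clusterings <- lapply(parts, cluster_louvain)
sapply(clusterings, length)   # number of communities found in each component
```

On a graph built from the correlation table, the correlations would only influence the clustering if they were passed explicitly, for example through the weights argument of cluster_louvain(); whether that is appropriate depends on the data.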

Reply written 11 months ago by Jean-Karim Heriche

Thanks for your reply. The biggest component I get is always the first one (whatever dataset I import), so I am going to apply clustering to that one.

I use genomes <- split(names(V(g)), components(g)$membership) and extract the first component with big_one <- genomes[[1]].

Is there a way to get back an igraph object only for this component?

Reply written 11 months ago by vincentpailler

Check the decompose() function: it returns a list of igraph objects, one per connected component.
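A sketch of two ways to get the largest component back as an igraph object, on a made-up graph whose largest component is deliberately not the first one. Picking it by size with which.max() avoids relying on component order:

```r
library(igraph)

# Toy graph with three components of 3, 6 and 4 vertices.
g <- disjoint_union(make_ring(3), make_ring(6), make_ring(4))

# Route 1: find the largest component by size and take the induced subgraph.
comp <- components(g)
biggest <- which.max(comp$csize)
big_one <- induced_subgraph(g, which(comp$membership == biggest))

# Route 2: decompose() into a list of graphs and keep the largest.
parts <- decompose(g)
sizes <- sapply(parts, vcount)
big_two <- parts[[which.max(sizes)]]

vcount(big_one)
vcount(big_two)
```

Either object can then be passed to a clustering function such as cluster_louvain().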

Reply written 11 months ago by Jean-Karim Heriche