Question: Community detection in R with the igraph package
3 days ago, vincentpailler wrote:

I have a data.frame of correlations between OTUs and genes, which I want to use to reconstruct genomes. This data.frame has 1105854 rows.

      var1                var2  corr
1  OTU3978 UniRef90_A0A010P3Z8 0.846
2  OTU4011 UniRef90_A0A010P3Z8 0.855
3  OTU4929 UniRef90_A0A010P3Z8 0.829
4  OTU4317 UniRef90_A0A011P550 0.850
5  OTU4816 UniRef90_A0A011P550 0.807
6  OTU3902 UniRef90_A0A011QPQ2 0.836
7  OTU3339 UniRef90_A0A011RKI6 0.835
8  OTU1359 UniRef90_A0A011RLA7 0.801
9  OTU2085 UniRef90_A0A011RLA7 0.843
10 OTU3542 UniRef90_A0A011RLA7 0.866
11 OTU0473 UniRef90_A0A011TDE1 0.807

I use the igraph library to build a graph object.

g<-graph.data.frame(df)
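For readers following along, here is a minimal sketch of the same step on a toy table. It assumes the current function name graph_from_data_frame() (graph.data.frame() is the older name for it) and builds an undirected graph, since a correlation has no direction. The five rows are copied from the excerpt above; everything else is illustrative.

```r
library(igraph)

# Toy stand-in for the full table (rows copied from the excerpt above)
df <- data.frame(
  var1 = c("OTU3978", "OTU4011", "OTU4929", "OTU4317", "OTU4816"),
  var2 = c("UniRef90_A0A010P3Z8", "UniRef90_A0A010P3Z8", "UniRef90_A0A010P3Z8",
           "UniRef90_A0A011P550", "UniRef90_A0A011P550"),
  corr = c(0.846, 0.855, 0.829, 0.850, 0.807),
  stringsAsFactors = FALSE
)

# The first two columns become the edge list; remaining columns
# (here corr) are stored as edge attributes
g <- graph_from_data_frame(df, directed = FALSE)

vcount(g)   # number of vertices (OTUs plus genes)
ecount(g)   # number of edges (one per row of df)
E(g)$corr   # correlations kept as an edge attribute
```

Building the graph as undirected does not change the result of components(), but some clustering functions such as cluster_louvain() only accept undirected graphs.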

Then I want to extract the components of this graph in order to reconstruct genomes: one component would correspond to one genome.

I tried this command: genomes <- split(names(V(g)), components(g)$membership)

It returns several components, for example:

> genomes[[4]]
[1] "OTU2417"             "UniRef90_A0A076H0Q4" "UniRef90_A0A2E8T3F8"
[4] "UniRef90_G5ZY43"

I check the OTU and the genes of each component against my OTU table and, for the genes, the EMBL-EBI database. This lets me determine whether each reconstructed genome is meaningful.

I also checked the documentation and found many other community detection methods: edge-betweenness, Louvain, multi-level... What is the main difference between the command I used (which gives me fairly meaningful components) and these algorithms (which also give me components)?

Thanks

3 days ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

The igraph components() function gives you the connected components of the graph, i.e. all the subgraphs that are not connected to each other. The other methods are clustering algorithms. They partition the graph by removing edges according to a criterion specific to each algorithm. They are normally better applied to each connected component separately (otherwise, many will just output the connected components).
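As a rough illustration of what components() reports, here is a small sketch on a made-up edge list (the vertex names are hypothetical, not taken from the real data). It shows the three fields of the result and checks that vertices in different components cannot reach each other.

```r
library(igraph)

# Hypothetical toy edge list: two groups with no edge between them
edges <- data.frame(
  from = c("OTU_A", "OTU_A", "OTU_B", "OTU_C"),
  to   = c("gene1", "gene2", "gene2", "gene3")
)
g <- graph_from_data_frame(edges, directed = FALSE)

cc <- components(g)
cc$no          # number of connected components
cc$csize       # number of vertices in each component
cc$membership  # component id of each vertex

# Vertices in different components are unreachable from each other
distances(g, v = "OTU_A", to = "OTU_C")  # Inf
```

No edge is removed here: the grouping follows purely from which vertices are linked by some path.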


Could the connected components (subgraphs) of the graph correspond to reconstructed genomes? And in your opinion, should I first extract each subgraph and then apply clustering algorithms to each of them in order to improve the accuracy of the reconstructed genomes?

written 3 days ago by vincentpailler

In your context, connected components are groups whose members have no correlation, in your table, with members of any other group. Whether that meets your requirements for calling a group a genome is for you to decide. However, if you want to partition the connected components further (for example, because you think they represent more than one genome), you can apply a clustering algorithm to try to reveal additional structure. My point was that applying almost any clustering algorithm to the whole graph is pointless because this will return the connected components.
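A minimal sketch of clustering inside a single component could look like the following. The data are invented for illustration, cluster_louvain() is just one of the algorithms mentioned in the question, and using the correlations as edge weights is an assumption rather than a recommendation.

```r
library(igraph)

# Hypothetical toy data: two tightly linked OTU/gene groups joined by one
# weaker edge (OTU2 -- gene3), plus a separate small component (OTU9 -- gene9)
df <- data.frame(
  var1 = c("OTU1", "OTU1", "OTU2", "OTU2", "OTU3", "OTU3", "OTU4", "OTU4",
           "OTU2", "OTU9"),
  var2 = c("gene1", "gene2", "gene1", "gene2", "gene3", "gene4", "gene3", "gene4",
           "gene3", "gene9"),
  corr = c(0.90, 0.88, 0.91, 0.87, 0.89, 0.92, 0.90, 0.86, 0.80, 0.85)
)
g <- graph_from_data_frame(df, directed = FALSE)  # Louvain needs an undirected graph

# Pick the largest connected component by size rather than by position
cc  <- components(g)
big <- induced_subgraph(g, which(cc$membership == which.max(cc$csize)))

# Cluster inside that component only, using the correlations as edge weights
set.seed(1)  # the result can depend on the order in which vertices are visited
cl <- cluster_louvain(big, weights = E(big)$corr)

split(V(big)$name, cl$membership)  # candidate sub-groups within the component
modularity(cl)                     # modularity score of this partition
```

Whether the resulting sub-groups are biologically meaningful still has to be checked against the OTU table and gene annotations, as with the components themselves.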

written 3 days ago by Jean-Karim Heriche

Thanks for your reply. The biggest component I get is always the first one (whatever dataset I import), so I am going to apply clustering to this one.

I use genomes <- split(names(V(g)), components(g)$membership) and extract the first component with big_one <- genomes[[1]].

Is there a way to get back an igraph object only for this component?

written 3 days ago by vincentpailler

Check the decompose() function.
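As a sketch of how that might be used (the edge list below is hypothetical): decompose() returns a list of igraph objects, one per connected component, so the largest can be selected by vertex count instead of relying on it being first.

```r
library(igraph)

# Hypothetical toy edge list with two connected components
edges <- data.frame(
  from = c("OTU_A", "OTU_A", "OTU_B", "OTU_C"),
  to   = c("gene1", "gene2", "gene2", "gene3")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# One igraph object per connected component
parts <- decompose(g)

# Select the largest component by vertex count
sizes   <- sapply(parts, vcount)
big_one <- parts[[which.max(sizes)]]

is_igraph(big_one)   # check that the result is a graph, not a character vector
V(big_one)$name      # vertices of the largest component
```

Unlike the character vector from split(), big_one keeps its edges and attributes, so it can be passed directly to a clustering function.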

written 3 days ago by Jean-Karim Heriche