I have a data.frame with the correlations between OTUs and genes; these correlations will allow me to reconstruct genomes. The data.frame has 1,105,854 rows:
       var1                var2  corr
    1  OTU3978 UniRef90_A0A010P3Z8 0.846
    2  OTU4011 UniRef90_A0A010P3Z8 0.855
    3  OTU4929 UniRef90_A0A010P3Z8 0.829
    4  OTU4317 UniRef90_A0A011P550 0.850
    5  OTU4816 UniRef90_A0A011P550 0.807
    6  OTU3902 UniRef90_A0A011QPQ2 0.836
    7  OTU3339 UniRef90_A0A011RKI6 0.835
    8  OTU1359 UniRef90_A0A011RLA7 0.801
    9  OTU2085 UniRef90_A0A011RLA7 0.843
    10 OTU3542 UniRef90_A0A011RLA7 0.866
    11 OTU0473 UniRef90_A0A011TDE1 0.807
I use the igraph library to build a graph object from this data.frame.
Then, I want to extract the connected components of this graph in order to reconstruct genomes: one component will correspond to one genome.
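To spell out what "one component = one genome" means here, below is a minimal stdlib Python sketch (the OTU/gene pairs are a made-up subset of the table above, not real data): OTUs and genes both become vertices of an undirected graph, and a BFS flood-fill yields the maximal connected vertex sets. This is the same result that igraph's `components()` gives on a graph built with `graph_from_data_frame()` in R.

```python
from collections import defaultdict, deque

# Hypothetical subset of the correlation table: (OTU, gene) pairs.
edges = [
    ("OTU3978", "UniRef90_A0A010P3Z8"),
    ("OTU4011", "UniRef90_A0A010P3Z8"),
    ("OTU4317", "UniRef90_A0A011P550"),
    ("OTU3902", "UniRef90_A0A011QPQ2"),
]

# Undirected adjacency list: OTUs and genes are both vertices.
adj = defaultdict(set)
for otu, gene in edges:
    adj[otu].add(gene)
    adj[gene].add(otu)

def connected_components(adj):
    """Return the maximal connected vertex sets (one per putative genome)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

genomes = connected_components(adj)
# Three components: the two OTUs sharing UniRef90_A0A010P3Z8 merge into one.
```

Note that two OTUs end up in the same "genome" as soon as any chain of shared genes links them, which is why a single component can grow large.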
I tried this command:
It gives me back several components, for example:
> genomes[]
"OTU2417" "UniRef90_A0A076H0Q4" "UniRef90_A0A2E8T3F8" "UniRef90_G5ZY43"
I check the OTU and the genes of each component against my OTU table and, for the genes, against the EMBL-EBI database, so I can determine whether each reconstructed genome is meaningful.
I also checked the documentation and found many other community detection methods: edge betweenness, Louvain, multi-level ... I would like to know the main difference between the command I used (which gives me back pretty meaningful components) and these algorithms (which also give me back components).
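To make the comparison I am asking about concrete, here is a stdlib Python sketch of both behaviors on a toy graph (made-up vertex names; the one-path edge-use count below is only a crude stand-in for igraph's exact edge betweenness). Connected components answer a pure reachability question, whereas an edge-betweenness (Girvan-Newman) style step can split a single connected component into densely linked groups:

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy graph (made-up names): two triangles {A,B,C} and {D,E,F}
# joined by a single bridge edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def components(adj):
    """Maximal connected subgraphs: pure reachability."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = set(), deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    q.append(w)
        comps.append(comp)
    return comps

def edge_use_counts(adj):
    """Crude betweenness proxy: trace one BFS shortest path per vertex
    pair and count how often each edge is used."""
    score = defaultdict(int)
    for s, t in combinations(adj, 2):
        pred, q = {s: None}, deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in pred:
                    pred[w] = v
                    q.append(w)
        v = t
        while pred[v] is not None:
            score[frozenset((v, pred[v]))] += 1
            v = pred[v]
    return score

assert len(components(adj)) == 1   # reachability alone: one component

# One Girvan-Newman-style step: remove the most-used edge (the bridge) ...
scores = edge_use_counts(adj)
u, v = tuple(max(scores, key=scores.get))
adj[u].remove(v)
adj[v].remove(u)

# ... and the single component splits into two communities.
communities = components(adj)
```

So on my real data, community detection could cut a large OTU-gene component into several candidate genomes, while the components approach keeps everything that is connected at all in one genome.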