Question: Networks: Fastest Way To Define Cliques And Connectors Between Them For Large Datasets
4
gravatar for User 8257
7.1 years ago by
User 825740
User 825740 wrote:

Hi everyone,

I'm looking for the fastest way to define cliques (communities) and edges that connect between them on large datasets (10^5-10^6 nodes). So far I've tried Cytoscape (it chokes) and R's igraph library (it doesn't choke, but it's quite slow). Would appreciate your advice on the best ways to do this in R - as well as alternative tools.

Many thanks!

R network graph • 2.6k views
ADD COMMENTlink modified 7.1 years ago by Darren J. Fitzpatrick1.1k • written 7.1 years ago by User 825740
7
gravatar for David Quigley
7.1 years ago by
David Quigley11k
San Francisco
David Quigley11k wrote:

Finding maximal cliques is NP-complete, meaning that the time to complete the task increases so quickly that it doesn't take many nodes to make the computation impossible to complete before the heat death of the universe. Graph theory is a big literature and there's a lot of specialist work on approximating this problem, but in practice I don't think exact solutions are really required in bioinformatics analysis. I often compromise by finding all cliques of size three or four, which is tractable even at fairly large network sizes and it's a practical way to enrich for nodes that are part of larger cliques or near-cliques. Another heuristic method is to repeatedly select for highly connected nodes, as that will give you near-cliques.

I like the networkx python library for working with cliques and for working with graphs in general. There's also a graph library in the boost libraries if you speak C++; more work for the programmer than python but very fast.

ADD COMMENTlink written 7.1 years ago by David Quigley11k

Thanks very much, very helpful!

ADD REPLYlink written 7.1 years ago by User 825740
7
gravatar for Darren J. Fitzpatrick
7.1 years ago by
Ireland/ United Kingdom
Darren J. Fitzpatrick1.1k wrote:

Hi,

Cliques and communities are somewhat different. A clique is a maximally connected subgraph whereas communities are groupings/clusters in your network, defined by edge densities or some other metric.

Moses and/or CFinder should help you find community structure. In biological networks, overlapping communities are perhaps more realistic but this depends on your hypothesis. Also, if you are interested in finding particualr motifs or instances of cliques, e.g., all maximally connected 4 node groups, Graphgrep is useful.

D.

ADD COMMENTlink modified 7.1 years ago • written 7.1 years ago by Darren J. Fitzpatrick1.1k

Thanks a lot for your advice on the tools - and for pointing out the difference between cliques and communities!

ADD REPLYlink written 7.1 years ago by User 825740
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1539 users visited in the last hour