7.1 years ago by

San Francisco

Finding maximal cliques is NP-complete, meaning that the time to complete the task increases so quickly that it doesn't take many nodes to make the computation impossible to complete before the heat death of the universe. Graph theory is a big literature and there's a lot of specialist work on approximating this problem, but in practice I don't think exact solutions are really required in bioinformatics analysis. I often compromise by finding all cliques of size three or four, which is tractable even at fairly large network sizes and it's a practical way to enrich for nodes that are part of larger cliques or near-cliques. Another heuristic method is to repeatedly select for highly connected nodes, as that will give you near-cliques.

I like the networkx python library for working with cliques and for working with graphs in general. There's also a graph library in the boost libraries if you speak C++; more work for the programmer than python but very fast.