Question: why spectral clustering show a warning?
1
gravatar for Chaimaa
2.1 years ago by
Chaimaa150
Chaimaa150 wrote:

Dear all,

I'm using spectral clustering method in order to cluster a biological dense graph of around 6021 genes. When i run the code starts working and after that shows this warning:

Calinski-Harabasz Score with gamma= 1 n_clusters= 12 score: 29.6079084919

Calinski-Harabasz Score with gamma= 1 n_clusters= 13 score: 28.0773430337

Calinski-Harabasz Score with gamma= 1 n_clusters= 14 score: 29.2831696856

Calinski-Harabasz Score with gamma= 1 n_clusters= 15 score: 28.8945479231

UserWarning: Graph is not fully connected, spectral embedding may not work as expected. warnings.warn("Graph is not fully connected, spectral embedding"

Note: my input is a symmetric adjacency matrix with 1'0 and 0's, what's this warning mean?

I have read that spectral clustering can work better with a similarity matrix, if so could anyone tell me how to turn this adjacency matrix to a similarity matrix.

I Appreciate any help.

Thanks,

ADD COMMENTlink modified 2.1 years ago by Jean-Karim Heriche19k • written 2.1 years ago by Chaimaa150
2

Apparently your graph is not connected and IMO the warning is because spectral algorithms for layouts are not optimal for not connected graphs.

ADD REPLYlink written 2.1 years ago by Rodrigo140
1

@Rodrigo, what I have to do so? any suggestions!

ADD REPLYlink written 2.1 years ago by Chaimaa150

Well, depends on what format you're using for your graph (edgelist, csv, etc) but a couple of possibilities are, you can download gephi which is a software for displaying graphs in a beautiful way and you can also apply some cluster algorithms (spectral included) and see how the nodes mode while it's clustering (this would be helpful for you to see why not connected graphs don't go well with spectral clustering). Here is the link to gephi's home page.

Another option if you know your way around python at least a bit, would be to use the package networx where you can apply different kinds of layouts and display your graph. Here is networkx home page.

Let me know if you need any more help.

ADD REPLYlink written 2.1 years ago by Rodrigo140
1

Spectral clustering is a graph partitioning algorithm, not a graph layout algorithm.

ADD REPLYlink written 2.1 years ago by Jean-Karim Heriche19k
1
gravatar for Jean-Karim Heriche
2.1 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche19k wrote:

Applying spectral clustering to a graph with multiple connected components will first retrieve these connected components as clusters. Unless this is what you want, it is better to apply spectral clustering to each connected component separately. See this tutorial on spectral clustering.

ADD COMMENTlink written 2.1 years ago by Jean-Karim Heriche19k

@Jean-Karim Heriche, So, I have to retrieve the connected components first? I was thinking because my input is a symmetric adjacency matrix that's why this warning was shown; As I know spectral work with affinity matrix; but, honestly i have no idea in how to translate an adjacency matrix into affinity matrix ???

ADD REPLYlink written 2.1 years ago by Chaimaa150
1

An affinity matrix is a similarity matrix. Your adjacency matrix is an affinity matrix because the edge weights represent similarity between the nodes.

ADD REPLYlink written 2.1 years ago by Jean-Karim Heriche19k

So @Jean-Karim Heriche How to avoid this warning? I have read the tutorial which you share with me. I applied Normalized spectral clustering according to Ng, Jordan, and Weiss (2002) and I used the quality measure of calinski_harabaz but still getting that warning! what's wrong !!!

ADD REPLYlink written 2.1 years ago by Chaimaa150
1

what's wrong

As the message says, your graph is not fully connected or said another way, your graph has several connected components or to put it in plain English, your graph is composed of several subgraphs that are not connected to each other. You should identify what these subgraphs are. It is possible that you have a few singletons (nodes not connected to anything) or a small number of nodes connected to each other but not to the rest of the graph. In those cases, you could simply remove the corresponding nodes, otherwise, you need to split your adjacency matrix into several matrices representing the different subgraphs and then apply clustering to the adjacency matrices of the subgraphs separately.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Jean-Karim Heriche19k

@ Jean-Karim Heriche, I extracted the connected components from my graph, then i run spectral code on each component but again get the same warning? why!!!!

ADD REPLYlink written 2.1 years ago by Chaimaa150

In graph theory, fully connected means that all pairs of nodes are connected by an edge which means in principle no 0 in the adjacency matrix (except on the diagonal). However, this is not required for spectral clustering which is why I interpreted the message as being about connected components. Could it be that it's complaining because there are self-loops, i.e. non-zero values on the diagonal of the adjacency matrix ?

ADD REPLYlink written 2.0 years ago by Jean-Karim Heriche19k
2

In graph theory, what you describe is called _complete graph_. I checked the source code (assuming you are using scikit-learn package, where we can find

if not _graph_is_connected(adjacency):
    warnings.warn("Graph is not fully connected, spectral embedding"
                  " may not work as expected.")

and

def _graph_is_connected(graph):
    """ Return whether the graph is connected (True) or Not (False)

    Parameters
    ----------
    graph : array-like or sparse matrix, shape: (n_samples, n_samples)
        adjacency matrix of the graph, non-zero weight means an edge
        between the nodes

    Returns
    -------
    is_connected : bool
        True means the graph is fully connected and False means not
    """
    if sparse.isspmatrix(graph):
        # sparse graph, find all the connected components
        n_connected_components, _ = connected_components(graph)
        return n_connected_components == 1
    else:
        # dense graph, find all connected components start from node 0
        return _graph_connected_component(graph, 0).sum() == graph.shape[0]

What they call _fully connected_ is simply _connected_. Therefore (if you are using that code) your adjacency matrix has several (at least two) connected components.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Rodrigo140
1

Fully connected graph is often used as synonym for complete graph but my first interpretation of it here as meaning "connected" was correct. So the message indicates that there remains multiple connected components in the graph (or that there's a bug in the software).

ADD REPLYlink written 2.0 years ago by Jean-Karim Heriche19k

@Jean-Karim Heriche, how could i avoid this warning now?? No, i don't have self-loops in my graph

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Chaimaa150

The warning is about your graphs having multiple connected components. I can see three possibilities:
- you still have multiple connected components in what you're processing
- there's a bug in the function that checks for connected components
- there's a bug in your code, e.g. maybe you're not passing what you think to the function.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Jean-Karim Heriche19k

@ Rodrigo, So how could i avoid this warning now? yes I'm using the spectral code on Python by sklearn

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Chaimaa150

How are you extracting the connected components from the matrix? are you then passing each of them as argument to the function?

ADD REPLYlink written 2.0 years ago by Rodrigo140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1395 users visited in the last hour