Question

Assessing Quality Of Clusters

2

11.1 years ago

thanhxle ▴ 20

I am running into a problem assessing the quality of clusters. Imagine that plotting the data after determining the data classes produces the plot below, with the clouds for classes 1, 2 and 3 appearing separately on 2 distinct sides of the plot, like so:

[image not available]

I want to assess the quality of the clusters by examining the closeness of points within and between clusters. Initially I thought I would just randomly assign classes to each data point (repeating the procedure many times) to show that points within a class are much closer than would happen by chance. However, because the plot splits into 2 clouds of totally different shapes, as illustrated, the task becomes much more difficult. How should I go about doing this?

clustering statistics • 1.5k views

updated 11.1 years ago by Michael 54k • written 11.1 years ago by thanhxle ▴ 20

score 1 · Answer 1 · 2013-03-27

11.1 years ago

Alex Reynolds 35k

Perhaps take a look at silhouettes (silhouette analysis).
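As a rough illustration of what a silhouette measures, here is a minimal pure-Python sketch. For each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b) and reports (b - a) / max(a, b), so values near 1 mean tight, well-separated clusters. The function names, the toy coordinates and the use of Euclidean distance are assumptions made for the example, not something taken from the question's data.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points (a single summary number)."""
    s = silhouette_scores(points, labels)
    return sum(s) / len(s)


# Toy data (made up): two compact, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [1, 1, 1, 2, 2, 2]   # matches the visible groups
bad_labels = [1, 2, 1, 2, 1, 2]    # mixes the groups

print(mean_silhouette(points, good_labels))
print(mean_silhouette(points, bad_labels))
```

The labelling that follows the two groups should score close to 1, the mixed one much lower. In practice an existing implementation (for example `silhouette` in the R package `cluster`, or `silhouette_score` in scikit-learn) is probably preferable to hand-rolled code.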

score 0 · Answer 2 · 2013-03-28

A similar question was asked before: Assessing cluster reliability/stability in microarray experiments

If you have real class labels, however, I would use the Rand index or the adjusted Rand index. Assigning class labels randomly is not a valid approach, because it couldn't prove anything except that the class labels have no impact on the clustering, which we already know.
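To make the adjusted Rand index concrete, here is a small sketch that computes it from the contingency table of two labelings, using the usual pair-counting form (observed agreement minus the agreement expected by chance, divided by the maximum minus the expected). The function name and the example label vectors are made up for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")

    # Contingency table cells and its row/column totals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Count pairs of items placed together in both, in one, or in the other.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one big cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical known classes and two candidate clusterings.
truth = [1, 1, 1, 2, 2, 2, 3, 3, 3]
renamed = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]  # same partition
scrambled = [1, 2, 3, 1, 2, 3, 1, 2, 3]                   # unrelated partition

print(adjusted_rand_index(truth, renamed))
print(adjusted_rand_index(truth, scrambled))
```

The index does not depend on how clusters are named, so a clustering that reproduces the classes under different names still reaches 1, while agreement at or below chance level gives values around 0 or negative. Ready-made versions exist, for instance `adjusted_rand_score` in scikit-learn.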

Also, read through the Wikipedia article on cluster evaluation (http://en.wikipedia.org/wiki/Clustering_algorithm#Evaluation_of_clustering_results). Most or all of the cluster indices listed there are implemented in R. Further, I would like to point to the criticism of internal cluster evaluation methods. So, if there is any external 'gold standard' or biological fact behind your data, you should use these external categories instead.