Finding Centroids Of Clusters Created With Hclust In R
12.5 years ago
Hpk ▴ 60

Hi all,

How can I find the centroids of clusters created with hclust in R? Any suggestions, please?

r off • 27k views

What is the relation to bioinformatics? Hint: apply the function mean to all vectors in the cluster. Good luck.

A centroid is always part of the given dataset (vectors). This is not necessarily true for the mean.

Maybe you are actually not aiming for a hierarchical clustering method but for a partitioning method like k-means? In that case the term "centroid of a cluster" would be well defined and the centroids easily accessed.
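
As a rough sketch of what that would look like with base R's kmeans() on the USArrests data used further down this thread (the seed, k = 5, and nstart = 10 are arbitrary choices for illustration, not anything the poster specified):

```r
# k-means starts from random centers, so fix the seed for reproducibility
set.seed(1)

# partition the 50 states into 5 clusters; nstart = 10 tries 10 random starts
km <- kmeans(USArrests, centers = 5, nstart = 10)

km$centers       # centroids: one row per cluster, one column per variable
head(km$cluster) # cluster label assigned to each state
```

Here the centroids come back directly in km$centers, so no extra step is needed.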

@steffi I used the hclust function to do hierarchical clustering. I got 50 clusters using the complete linkage method. My aim is to find the centroid of each cluster. Is there any package or function in R to find the centroid of each cluster?

peri4n, this is wrong. What you mention is a medoid [http://en.wikipedia.org/wiki/Medoids]; a centroid is not necessarily part of the dataset (and in principle is only defined in Euclidean space).

Hmm, looks like a translation problem. German Google says that 'Zentroid' is always part of the dataset. Maybe there is some ambiguity in the community.

I know some German ;) but I don't know what German Google says; you might be more precise about what you found. In German Wikipedia, Zentroid redirects to 'Schwerpunkt', which means 'center of gravity', which is what I think centroid means in the geometrical sense. Given any geometrical figure, the center of gravity is not necessarily part of that figure itself. The term centroid might be a bit misleading; using e.g. arithmetic or geometric mean might be more precise. For this case I refer to http://en.wikipedia.org/wiki/Centroid#Of_a_finite_set_of_points which fits exactly.
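
To make the distinction concrete, here is a small sketch on made-up toy data (the matrix pts is invented purely for illustration). The centroid is the coordinate-wise mean and lands between the points; the medoid is taken here as the data point with the smallest summed Euclidean distance to all the others:

```r
# five made-up points in the plane, one row per point
pts <- cbind(x = c(1, 2, 3, 5, 10), y = c(1, 2, 3, 5, 10))

# centroid: coordinate-wise arithmetic mean, need not be one of the points
centroid <- colMeans(pts)

# medoid: the actual data point with the smallest total distance to the rest
d <- as.matrix(dist(pts))
medoid <- pts[which.min(rowSums(d)), ]

centroid # (4.2, 4.2), not a row of pts
medoid   # (3, 3), the third row of pts
```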

12.5 years ago
Michael 54k

OK, here it goes: given that the original data from cluster one is in a matrix c1, with rows as cases and columns as variables:

mycentroid <- colMeans(c1)

Or, for all 5 clusters, using hclust with the USArrests dataset (this is a bad example because the data is not Euclidean):

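clusters <- cutree(hclust(dist(USArrests)), k = 5) # cut the tree into 5 clusters

# function to find the centroid (column means) of cluster i
clust.centroid <- function(i, dat, clusters) {
    ind <- (clusters == i)
    # drop = FALSE keeps a single-row cluster two-dimensional for colMeans
    colMeans(dat[ind, , drop = FALSE])
}

# result: one column per cluster, one row per variable
sapply(unique(clusters), clust.centroid, USArrests, clusters)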

              [,1]       [,2]   [,3]  [,4]  [,5]
Murder    11.47143   8.214286   5.59  14.2  2.95
Assault  263.50000 173.285714 112.40 336.0 62.70
UrbanPop  69.14286  70.642857  65.60  62.5 53.90
Rape      29.00000  22.842857  17.27  24.0 11.51

For how to compute medoids instead, see here
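
A minimal sketch of one way to do it, assuming the same USArrests clustering and taking the medoid to be the cluster member with the smallest summed Euclidean distance to the other members. The helper clust.medoid is a name made up for this example, not a function from any package:

```r
clusters <- cutree(hclust(dist(USArrests)), k = 5)

# hypothetical helper: name of the member of cluster i that is closest,
# in total distance, to all other members of that cluster
clust.medoid <- function(i, dat, clusters) {
    members <- dat[clusters == i, , drop = FALSE]
    d <- as.matrix(dist(members))
    rownames(members)[which.min(rowSums(d))]
}

medoids <- sapply(sort(unique(clusters)), clust.medoid, USArrests, clusters)
USArrests[medoids, ] # one actual state per cluster
```

Unlike the centroids, every row returned here is an observation from the dataset.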

Thank you very much for your useful answer.

An even easier way to compute the same values as shown above (transposed, with one row per cluster and one column per variable) is simply:

apply(USArrests, 2, function(x) tapply(x, clusters, mean))

Sorry, I'm new to this, but how can you tell which columns are the clusters and which one is the centroid?
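
In the sapply() output in the answer, each column is one cluster (numbered 1 to 5) and each row is a variable, so reading one column top to bottom gives that cluster's centroid. A sketch that makes the labels explicit, assuming the same USArrests clustering and using aggregate() from base R (the column name cluster is just a label chosen here):

```r
clusters <- cutree(hclust(dist(USArrests)), k = 5)

# one row per cluster: the cluster id followed by the mean of each variable
centroids <- aggregate(USArrests, by = list(cluster = clusters), FUN = mean)
centroids

# which states belong to, say, cluster 4
names(clusters)[clusters == 4]
```

Each row of centroids is then the centroid of the cluster named in its first column.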
