Finding Centroids Of Clusters Created With Hclust In R
10.3 years ago
Hpk ▴ 60

Hi all,

How can I find the centroids of clusters created with hclust in R? Any suggestions, please?

What is the relation to bioinformatics? Hint: use the function mean on all vectors in the cluster. Good luck.

A centroid is always part of the given dataset (vectors). This is not necessarily true for the mean.

Maybe you are actually not aiming for a hierarchical clustering method but for a partitioning method like k-means? In that case the term "centroid of clusters" would be well defined and the centroids easily accessed.

@steffi I used the hclust function to do hierarchical clustering. I got 50 clusters using the complete linkage method. My aim is to find the centroid of each cluster. Is there any package or function in R to find the centroid of each cluster?

peri4n, this is wrong. What you mention is a medoid [http://en.wikipedia.org/wiki/Medoids]; a centroid is not necessarily part of the dataset (and in principle is only defined in Euclidean space).

Hmm, looks like a translation problem. German Google says that 'Zentroid' is always part of the dataset. Maybe there is some ambiguity in the community.

I know some German ;) but I don't know what German Google says; you might be more precise about what you found. In the German Wikipedia, 'Zentroid' redirects to 'Schwerpunkt', which means 'center of gravity', and that is what I think centroid means in the geometrical sense. Given any geometrical figure, the center of gravity is not necessarily part of that figure itself. The term centroid might be a bit misleading; using e.g. arithmetic or geometric mean might be more precise. For this case I refer to http://en.wikipedia.org/wiki/Centroid#Of_a_finite_set_of_points which fits exactly.
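
To make the distinction discussed above concrete, here is a minimal sketch in R. The three points are made up for illustration (they are not from the thread), and the medoid is computed here as the observed point with the smallest summed Euclidean distance to the others, which is one common definition.

```r
# Three made-up points in the plane (illustrative data only)
pts <- matrix(c(0, 0,
                2, 0,
                0, 2), ncol = 2, byrow = TRUE)

# Centroid: coordinate-wise arithmetic mean of the points
centroid <- colMeans(pts)

# Medoid: the observed point with the smallest summed distance to the others
d <- as.matrix(dist(pts))
medoid <- pts[which.min(rowSums(d)), ]

# Does the centroid coincide with any observed point?
centroid_in_data <- any(apply(pts, 1, function(p) isTRUE(all.equal(p, centroid))))
```

Here the centroid is (2/3, 2/3), which is not one of the three points, while the medoid is by construction always a row of the data.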

10.3 years ago

Ok, here it goes: given that your original data from cluster one is in a matrix c1, with rows as cases and columns as variables:

mycentroid <- colMeans(c1)


Or, for all 5 clusters, using hclust with the USArrests dataset (this is a bad example because the data is not Euclidean):

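clusters <- cutree(hclust(dist(USArrests)), k = 5)  # cut the tree into 5 clusters

# function to find the centroid (column means) of cluster i
clust.centroid <- function(i, dat, clusters) {
  ind <- (clusters == i)
  # drop = FALSE keeps a one-row cluster as a data frame so colMeans still works
  colMeans(dat[ind, , drop = FALSE])
}

# one column per cluster, one row per variable
sapply(unique(clusters), clust.centroid, USArrests, clusters)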

              [,1]       [,2]   [,3]  [,4]  [,5]
Murder    11.47143   8.214286   5.59  14.2  2.95
Assault  263.50000 173.285714 112.40 336.0 62.70
UrbanPop  69.14286  70.642857  65.60  62.5 53.90
Rape      29.00000  22.842857  17.27  24.0 11.51


For how to compute medoids instead, see here
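
As a rough illustration of the medoid alternative, here is a minimal sketch that reuses the same USArrests clustering. It assumes the medoid is defined as the cluster member with the smallest summed Euclidean distance to the other members; the helper name clust.medoid is made up for this example.

```r
# Same clustering as above: complete linkage on Euclidean distances, 5 clusters
d <- dist(USArrests)
clusters <- cutree(hclust(d), k = 5)
distmat <- as.matrix(d)

# Hypothetical helper: name of the medoid of cluster i, i.e. the member
# with the smallest total distance to all other members of that cluster
clust.medoid <- function(i, distmat, clusters) {
  ind <- which(clusters == i)
  # drop = FALSE keeps a 1x1 matrix (with row names) for singleton clusters
  names(which.min(rowSums(distmat[ind, ind, drop = FALSE])))
}

medoids <- sapply(unique(clusters), clust.medoid, distmat, clusters)

# The medoids are actual rows of the data, unlike the centroids
USArrests[medoids, ]
```

The cluster package also offers pam() if a partitioning method built around medoids is what you are after.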

An even easier way to compute exactly what has been shown above is simply:

apply(USArrests, 2, function(x) tapply(x, clusters, mean))

Sorry, I'm new to this, but how can you tell which columns are the clusters and which values are the centroids?
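
One way to make the layout easier to read is to label the result explicitly. In the sapply() output each column is one cluster, and the four values in that column are the centroid of that cluster (its mean Murder, Assault, UrbanPop and Rape). A small sketch, where the "cluster1", "cluster2", ... labels are names made up for this example:

```r
clusters <- cutree(hclust(dist(USArrests)), k = 5)
ids <- sort(unique(clusters))

# One column per cluster; each column holds that cluster's centroid
centers <- sapply(ids, function(i) {
  colMeans(USArrests[clusters == i, , drop = FALSE])
})
colnames(centers) <- paste0("cluster", ids)  # illustrative labels

# Transposed view: one row per cluster, one column per variable
centers_by_row <- t(centers)
```

The apply()/tapply() one-liner should give the transposed layout, with one row per cluster and one column per variable.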