I'm not quite sure I understand your question. Hierarchical clustering groups items based on the distances between them and then generates a tree (dendrogram) showing those distances. One usually cuts the tree at a given height to define clusters (perhaps this is what you mean by cutoff?). As worded, however, "clustering based on cutoff distance" doesn't make sense to me (perhaps you could explain further?).

If you want to cluster your data and then take the items that cluster together below a distance of 2, you could do the following in R:

```
# Create a sample data set: 10 items (rows g1..g10) by 5 conditions (columns t1..t5).
set.seed(1)  # arbitrary seed, only so the example is reproducible
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# Hierarchical clustering on Euclidean distances with complete linkage;
# for random data like this, a cutoff of 2 typically falls within the range of merge heights.
hr <- hclust(dist(y), method = "complete", members=NULL)
# Examine the dendrogram.
plot(hr)
# Cut the tree at a height (distance) of 2 and get the cluster membership of each item.
myhcl <- cutree(hr, h=2)
# Highlight the resulting clusters on the dendrogram.
rect.hclust(hr, h=2)
```
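
To actually pull out the items that ended up together below the cutoff, one option is to split the item names by their cluster label. This is only a minimal sketch on the same kind of random sample matrix as above; the seed and the variable name `groups` are my own illustrative choices, and with random data it is possible that no items merge below the cutoff at all.

```r
# Rebuild the sample data so this snippet runs on its own.
set.seed(42)  # arbitrary seed (assumption), only for reproducibility
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
hr <- hclust(dist(y), method = "complete")
myhcl <- cutree(hr, h = 2)

# One character vector of item names per cluster label.
groups <- split(names(myhcl), myhcl)

# Keep only clusters with at least two members,
# i.e. items that merged below the cutoff height (may be an empty list).
groups[lengths(groups) > 1]
```

Items left in a cluster of their own are those that only join other items above the cutoff.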

Using some other distance metric (e.g. a correlation-based distance), you could choose a custom cutoff distance after examining the dendrogram.
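
As a rough illustration of the correlation case, the sketch below uses 1 minus the Pearson correlation between rows as the distance, which ranges from 0 to 2. The cutoff of 0.5 is a purely hypothetical value; in practice you would pick it after looking at the dendrogram. The names `d`, `hr_cor`, `cutoff`, and `cl` are mine.

```r
# Sample data, same shape as in the Euclidean example.
set.seed(1)  # arbitrary seed (assumption), only for reproducibility
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))

# cor() correlates columns, so transpose to correlate the rows (g1..g10).
# 1 - correlation turns similarity into a distance between 0 and 2.
d <- as.dist(1 - cor(t(y), method = "pearson"))

hr_cor <- hclust(d, method = "complete")
plot(hr_cor)  # inspect the dendrogram before settling on a cutoff

cutoff <- 0.5  # hypothetical cutoff, chosen only for illustration
cl <- cutree(hr_cor, h = cutoff)
rect.hclust(hr_cor, h = cutoff)

# Number of items in each cluster.
table(cl)
```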