Question: How To Do Clustering Based On Cutoff Distance Value In R
grosy wrote (7.3 years ago):

I have a distance (dissimilarity) matrix, shown below. I want to do hierarchical clustering with a distance cutoff: for example, with a cutoff of 2, the clustering should use that value to group the samples A, B, C and D accordingly.

I tried doing this in R but couldn't work out how to set the threshold. Could anyone please help me solve this in R?

    A  B  C  D
A   0  1  2  3
B   1  0  3  5
C   2  3  0  1
D   3  5  1  0
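
A minimal sketch of the same idea applied directly to the matrix above, assuming its entries are pairwise distances (zeros on the diagonal). Because the distances are already computed, the matrix is converted with as.dist() rather than passed to dist(). Complete linkage and the cutoff h = 2 are illustrative choices, and the object names (m, d, hc, groups) are arbitrary.

```r
# The question's 4 x 4 matrix, assumed to hold pairwise distances
m <- matrix(c(0, 1, 2, 3,
              1, 0, 3, 5,
              2, 3, 0, 1,
              3, 5, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# Use the values as they are: as.dist() keeps the lower triangle as distances
d <- as.dist(m)

# Complete linkage: two groups merge at the largest pairwise distance
# between their members
hc <- hclust(d, method = "complete")

# Cut the tree at height 2 to get a cluster label for each sample
groups <- cutree(hc, h = 2)
print(groups)
```

Here A and B join at height 1, C and D join at height 1, and the two pairs only join at height 5 (the largest cross distance, B to D), so the cut at 2 yields the clusters {A, B} and {C, D}.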

Thanks in advance,

Tags: R, clustering

seidel (United States) wrote (7.3 years ago):

I'm not quite sure I understand your question. Clustering groups things based on the distances between them and produces a tree showing those distances. One usually cuts the tree at a given height to define clusters (perhaps this is what you mean by cutoff?). As worded, however, "clustering based on cutoff distance" doesn't make sense to me; could you explain further?

If you want to cluster your data and then take the groups that join below a distance of 2, you could do the following in R:

# Create a sample data set (10 rows x 5 columns); the seed makes it reproducible
set.seed(1)
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste0("g", 1:10), paste0("t", 1:5)))

# Hierarchical clustering on Euclidean distances; for data like this,
# a cutoff of 2 typically falls within the range of merge heights
hr <- hclust(dist(y), method = "complete")

# examine the dendrogram
plot(hr)

# cut the tree at a height of 2 and get cluster memberships
myhcl <- cutree(hr, h=2)

# highlight the resulting clusters on the dendrogram
rect.hclust(hr, h=2)

With some other distance metric (e.g. correlation), you could choose a custom cutoff after examining the dendrogram.
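
As a rough illustration of that suggestion, the sketch below uses 1 minus the Pearson correlation between rows as the distance, which ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated). The data are random and the cutoff of 0.5 is an arbitrary example; the object names d_cor, hr_cor and groups_cor are illustrative only.

```r
set.seed(1)
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste0("g", 1:10), paste0("t", 1:5)))

# cor() correlates columns, so transpose to correlate the rows
d_cor <- as.dist(1 - cor(t(y)))

hr_cor <- hclust(d_cor, method = "complete")

# Inspect the merge heights before choosing a cutoff
print(range(hr_cor$height))

# Cut at a correlation distance of 0.5; with complete linkage, members of a
# cluster then have pairwise correlations of roughly 0.5 or more
groups_cor <- cutree(hr_cor, h = 0.5)
print(table(groups_cor))
```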


grosy replied:
Yes, you understood it correctly: I want to cut the tree to define clusters at a given height. Thanks, I'll try the commands and let you know.

grosy replied:
Thanks seidel, it works, but I get an error when I set the cutoff to 0.5: "Error in rect.hclust(hr, h = 0.5) : k must be between 2 and 377"

seidel replied:
What is your basis for choosing 0.5? Given that error, if you look at your dendrogram, it's likely that none of the merge heights are below 0.5. You can inspect them directly in hr$height, for example with range(hr$height). You can't cut lower than the smallest merge height, because below it nothing has been clustered yet. What is your range of distances?
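
A small sketch of the check described above, using random data as in the earlier answer (so the numbers will differ from the asker's 378-sample data). It assumes, as the error suggests, that rect.hclust() needs at least one merge below the cutoff; cutting below the smallest height simply leaves every object in its own cluster. The cutoff 0.5 is the asker's value, reused here for illustration.

```r
set.seed(1)
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste0("g", 1:10), paste0("t", 1:5)))
hr <- hclust(dist(y), method = "complete")

# Smallest and largest merge heights: a useful cutoff lies between them
print(range(hr$height))

# Cutting below the smallest height leaves every object in its own cluster
too_low <- min(hr$height) / 2
n_low <- length(unique(cutree(hr, h = too_low)))
print(n_low)

# Only draw boxes when at least one merge happens below the cutoff
h <- 0.5
if (h <= min(hr$height)) {
  message("Cutoff ", h, " is not above the smallest merge height (",
          signif(min(hr$height), 3), "); choose a larger h.")
} else {
  plot(hr)
  rect.hclust(hr, h = h)
}
```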

grosy replied:
Thanks @seidel, but the matrix does contain some values below 0.5.
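
One possible cause, which the thread does not confirm: if a precomputed distance matrix is passed to dist(), R treats each row as a vector of features and computes Euclidean distances between rows, so the tree heights no longer correspond to the values in the matrix. The sketch below uses the 4 x 4 matrix from the question to show the difference; a precomputed matrix should be converted with as.dist() instead.

```r
m <- matrix(c(0, 1, 2, 3,
              1, 0, 3, 5,
              2, 3, 0, 1,
              3, 5, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Wrong for a precomputed matrix: Euclidean distances between the rows of m
hr_rows <- hclust(dist(m), method = "complete")

# Right: use the matrix entries themselves as the distances
hr_vals <- hclust(as.dist(m), method = "complete")

print(min(hr_rows$height))  # sqrt(7), about 2.65: no merge below 2
print(min(hr_vals$height))  # 1, the smallest off-diagonal value
```

With the row-wise distances, even the cutoff of 2 from the original question falls below the first merge, so rect.hclust(hr_rows, h = 2) would run into the same kind of error.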