Question

clustering from the minimum distance


4.9 years ago

s.kyungyong64 ▴ 40

Hi all,

I have the following distance matrix which is a direct output of some pairwise sequence metric.

> as.dist(df)
       0      1      2      3      4      5      6      7      8
1 1.9356                                                        
2 1.6758 2.8880                                                 
3 1.9664 1.0587 2.4737                                          
4 2.1619 1.2724 2.5110 1.1447                                   
5 1.8347 1.0197 2.1482 1.1709 1.2174                            
6 1.9889 1.0422 2.4029 1.0205 0.3976 1.0199                     
7 0.8700 2.3598 1.4906 1.8574 2.8255 2.4992 2.2814              
8 1.6657 0.5076 2.8697 1.1120 1.3185 1.0617 1.1108 1.9752       
9 1.7172 3.7109 1.9279 3.8676 2.3161 2.1345 2.1262 1.6730 2.7601

I would like to cluster these starting from the smallest distance and working up to the largest. For instance, I would expect 4 and 6 to be joined under a node at height 0.3976, 1 and 8 at 0.5076, and 0 and 7 at 0.8700. Then 3, whose smallest distance is to 6, should join the (4,6) cluster.

Hierarchical clustering seems to be the right approach, but I could not find a linkage or distance method whose output matches what I expect. The heights in the output often look inflated and do not properly reflect the metric.

> hc <- hcluster(as.dist(df), link = "single")
> hc$height
[1] 0.916764 1.285251 1.674859 1.707090 1.759598 1.937444 2.617993 3.196162 3.667614

Would you guys have any suggestions?

clustering R • 620 views



I found out I was supposed to use hclust instead of hcluster :(
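
For anyone landing here later, a minimal sketch of the difference, using only base R. The object names (`vals`, `m`, `hc`, `inflated`) are mine and not from the original post, and the matrix is re-typed from the printed output above. The assumption being illustrated is that `hcluster` treated the distance matrix as raw data and computed Euclidean distances between its rows, which `dist(m)` mimics here. That assumption fits the reported heights: the Euclidean distance between rows 4 and 6 of the full matrix is about 0.9168, the first height in the question.

```r
# Lower triangle of the distance matrix, row by row (items 1..9 against 0..8)
vals <- c(
  1.9356,
  1.6758, 2.8880,
  1.9664, 1.0587, 2.4737,
  2.1619, 1.2724, 2.5110, 1.1447,
  1.8347, 1.0197, 2.1482, 1.1709, 1.2174,
  1.9889, 1.0422, 2.4029, 1.0205, 0.3976, 1.0199,
  0.8700, 2.3598, 1.4906, 1.8574, 2.8255, 2.4992, 2.2814,
  1.6657, 0.5076, 2.8697, 1.1120, 1.3185, 1.0617, 1.1108, 1.9752,
  1.7172, 3.7109, 1.9279, 3.8676, 2.3161, 2.1345, 2.1262, 1.6730, 2.7601
)

labs <- as.character(0:9)
m <- matrix(0, 10, 10, dimnames = list(labs, labs))
# Column-wise fill of the upper triangle matches row-wise order of the lower one
m[upper.tri(m)] <- vals
m <- m + t(m)  # make the matrix symmetric

# Single linkage directly on the supplied distances
hc <- hclust(as.dist(m), method = "single")
print(round(hc$height, 4))

# Assumed cause of the inflated heights: distances recomputed between rows
inflated <- hclust(dist(m), method = "single")
print(round(inflated$height, 4))
```

With `hclust` on `as.dist(m)`, the first three merge heights are 0.3976, 0.5076 and 0.8700, i.e. the (4,6), (1,8) and (0,7) pairs expected in the question. `plot(hc)` draws the dendrogram.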

4.9 years ago by s.kyungyong64 ▴ 40


So, all is okay now?

4.9 years ago by Kevin Blighe 87k