Comparing Clusters From The Dendrogram Using R Programming?
12.7 years ago
Mkl ▴ 20

Hi all,

I need to compare the clusters of a dendrogram in R. I did hierarchical clustering of protein sequences using the hclust function and got 27 clusters. Next I would like to know the names of the proteins in each cluster so that I can compare them. Which function should I use for this? I used the cutree function to cut the dendrogram at a particular height, but I don't know how to find the elements of each cluster. I used the following code to do the hierarchical clustering:

fit= hclust(as.dist(seq), method = "single") 
plot(fit)
democut=cutree(fit,h=20)
plot(fit, labels = as.character(democut))
table(democut)

How can I get the elements of each cluster, and is there an R package that solves this problem?

r clustering • 11k views

Add the protein names as rownames and colnames to your matrix seq; the cluster assignments will then carry those names.

12.7 years ago
Michael 54k

This is covered in the documentation: read ?cutree and run example(cutree). In short, run:

hc <- hclust(dist(USArrests))

cutree(hc, k=5) #k = 1 is trivial
# output:
Alabama         Alaska        Arizona       Arkansas     California 
         1              1              1              2              1 
  ...

If you don't get the names, but only numbers, then your input data to hclust lacks names.
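To go from that named vector to the list of members per cluster, one possible sketch is to split the names by cluster number. It uses the built-in USArrests data as a stand-in for the protein distance matrix; the variable names `groups` and `members` are only illustrative.

```r
# Cluster the built-in USArrests data (stand-in for the protein data)
hc <- hclust(dist(USArrests))

# Named integer vector: names are the row labels, values are cluster numbers
groups <- cutree(hc, k = 5)

# Split the labels by cluster number: one character vector per cluster
members <- split(names(groups), groups)

# Members of the cluster that contains Arkansas
members[[as.character(groups[["Arkansas"]])]]

# Cluster sizes, equivalent to table(groups)
lengths(members)
```

The same split() call should work on a vector obtained with cutree(fit, h = 20), provided the input to hclust carried names.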


In this case, Alabama, Alaska, Arizona and so on are the "protein names". The integer is the cluster number.


@Michael, I think this code cuts the dendrogram into five clusters. I already know the number of clusters in my dendrogram. My aim is to find the names of the members (protein names) of each cluster. Is there any solution for this?


I tried this code again on my data, but I only got the number of clusters, not a table like this. My data contains 228 sequences. I cut the dendrogram at h=20 using the cutree function, which gave me 27 clusters.


As I said before, your input data must have rownames for this to work. Type rownames(yourdata): what do you see? Otherwise, show us the result of head(yourdata).
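As a rough sketch of the rownames point, the example below builds a small made-up distance matrix without names, attaches labels, and then repeats the hclust/cutree steps from the question. The matrix values and the names prot1 to prot6 are hypothetical placeholders, not the asker's data.

```r
# Toy symmetric distance matrix standing in for the real matrix (values are made up)
set.seed(1)
n <- 6
m <- matrix(0, n, n)
m[upper.tri(m)] <- runif(n * (n - 1) / 2, min = 1, max = 40)
m <- m + t(m)

# Without dimnames, cutree() can only return an unnamed vector of cluster numbers
is.null(rownames(m))

# Attach the (hypothetical) protein names as row and column names
protein_names <- paste0("prot", seq_len(n))
dimnames(m) <- list(protein_names, protein_names)

# Same steps as in the question: single linkage, cut at height 20
fit <- hclust(as.dist(m), method = "single")
democut <- cutree(fit, h = 20)

# democut is now a named vector, so the members of each cluster can be listed
split(names(democut), democut)
```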

12.7 years ago
Stevelor ▴ 310

A more interactive way is:

hc <- hclust(dist(USArrests))
plot(hc)

# click on subtrees in the plot to select them
x <- identify(hc)

Right-click to stop; all selected subtrees are then printed out. You can use x as a normal R object for further work.

But it's sometimes a little bit stupid to use :D

HTH!
