Question: Comparing Clusters From The Dendrogram Using R Programming?
Mkl20 wrote:

Hi all,

I need to compare the clusters of a dendrogram in R. I did hierarchical clustering of protein sequences with the hclust function and got 27 clusters. Next, I would like to know the names of the proteins in each cluster for comparison. Which function should I use for this? I used the cutree function to cut the dendrogram at a particular height, but I don't know how to find the elements of each cluster. I used the following code for the hierarchical clustering:

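# seq: matrix of pairwise distances between the protein sequences
fit <- hclust(as.dist(seq), method = "single")
plot(fit)
# cut the tree at height 20; gives one cluster number per sequence
democut <- cutree(fit, h = 20)
plot(fit, labels = as.character(democut))
# number of sequences in each cluster
table(democut)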

How can I get the elements of each cluster? Is there an R package that solves this problem?

R clustering • 9.0k views
modified 9.1 years ago by Stevelor310 • written 9.1 years ago by Mkl20

Add the protein names as rownames and colnames to your matrix seq, then you should be fine.
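
A minimal sketch of what that could look like, using a small made-up distance matrix (the names P1 to P4 and the distance values are placeholders, not data from the question):

```r
# Hypothetical 4 x 4 distance matrix standing in for `seq`
d <- matrix(c( 0,  2, 25, 27,
               2,  0, 24, 26,
              25, 24,  0,  3,
              27, 26,  3,  0), nrow = 4, byrow = TRUE)

# Placeholder protein names; use the real identifiers here
prot <- c("P1", "P2", "P3", "P4")
rownames(d) <- prot
colnames(d) <- prot

fit <- hclust(as.dist(d), method = "single")
fit$labels           # the names are carried into the tree
cutree(fit, h = 20)  # named vector: protein name -> cluster number
```

as.dist() picks up the row names as labels, hclust() stores them in fit$labels, and cutree() uses them as the names of its result.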

written 9.1 years ago by Michael Dondrup47k
Michael Dondrup47k wrote:

This is covered in the documentation: read ?cutree and run example(cutree). In short, run:

hc <- hclust(dist(USArrests))

cutree(hc, k=5) #k = 1 is trivial
# output:
Alabama         Alaska        Arizona       Arkansas     California 
         1              1              1              2              1 
  ...

If you don't get the names, but only numbers, then your input data to hclust lacks names.
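
To go from this named vector to the members of each cluster, one option is to split the names by cluster number. A short sketch on the same USArrests example (split() and names() are base R; the variable names groups and members are only illustrative):

```r
hc <- hclust(dist(USArrests))
groups <- cutree(hc, k = 5)   # named integer vector

# One character vector of names per cluster
members <- split(names(groups), groups)

members[["1"]]     # names of everything in cluster 1
lengths(members)   # cluster sizes, same counts as table(groups)
```

The same split() call should work on the result of cutree(fit, h = 20), provided the input to hclust carried names.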

written 9.1 years ago by Michael Dondrup47k

In this case, Alabama, Alaska, Arizona, etc. are the "protein names". The integer is the cluster number.

written 9.1 years ago by Michael Dondrup47k

@Michael, I think this code cuts the dendrogram into five clusters. I already know the number of clusters in my dendrogram. My aim is to find the names of the members (protein names) of each cluster. Is there a solution for this?

written 9.1 years ago by Mkl20

I tried this code again on my data, but I only got the cluster numbers; I didn't get a table like this. My data contains 228 sequences. I cut the dendrogram at h=20 using the cutree function, which gave me 27 clusters.

written 9.1 years ago by Mkl20

As I said before, your input data must have rownames for this to work. Type rownames(yourdata): what do you see? Otherwise, show us the result of head(yourdata).
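
For illustration, this is roughly what the check shows on a toy matrix with and without names (the values and the names P1 to P3 are invented for this sketch):

```r
# Toy distance matrix without any dimnames
m <- matrix(c(0, 1, 9,
              1, 0, 8,
              9, 8, 0), nrow = 3, byrow = TRUE)

rownames(m)                               # NULL: no names available
names(cutree(hclust(as.dist(m)), k = 2))  # NULL: only cluster numbers come back

# Once rows and columns are named, the names appear in the result
dimnames(m) <- list(c("P1", "P2", "P3"), c("P1", "P2", "P3"))
names(cutree(hclust(as.dist(m)), k = 2))
```

If rownames(yourdata) prints NULL, the unnamed output of cutree is expected.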

written 9.1 years ago by Michael Dondrup47k
Stevelor310 wrote:

A more interactive way is:

hc <- hclust(dist(USArrests))
plot(hc)

x <- identify(hc)

Click on the subtrees you want, then right-click to stop. All selected subtrees are printed out, and you can use x as a normal R object for further work.

But it's sometimes a little bit stupid to use :D
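
If clicking is impractical, a non-interactive variant is rect.hclust() from the stats package, which draws boxes around the clusters and also returns their members. A small sketch, with k = 5 chosen only for the USArrests example:

```r
hc <- hclust(dist(USArrests))
plot(hc)

# Draw a box around each of the k clusters; the return value is a list
# with one named integer vector of members per boxed cluster
boxes <- rect.hclust(hc, k = 5)

lapply(boxes, names)   # member names per boxed cluster
```

rect.hclust() also accepts h instead of k, so the same idea applies to a cut at a fixed height.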

HTH!

written 9.1 years ago by Stevelor310