Question: How To Get The Clear Values At The Bottom Of A Dendrogram In Clustering In R?
grosy wrote (8.2 years ago):

Dear Friends,

I have a huge amount of data to cluster in R. But when I cluster it, all the numbers at the bottom of the dendrogram run into each other, which makes the values very difficult to read.

clustered data with merged values at the bottom

Could anyone please help me solve this problem and get a clearer view of the values at the bottom of the dendrogram in R?

The code:

a <- read.csv("C:\\file.csv", header = TRUE)
b <- scale(a)                          # center and scale each column
c <- cor(t(b), method = "spearman")    # Spearman correlation between rows
d <- as.dist(1 - c)                    # correlation-based distance
hr <- hclust(d, method = "complete", members = NULL)
par(mfrow = c(2, 2))
plot(hr, hang = 0.1)
plot(hr, hang = -1)

Could you give us the code you are using and the output it creates so that we can visualise what the problem is?

Reply written 8.2 years ago by Leonor Palmeira

kstamm wrote (8.2 years ago):

With so many values, your options are either to draw an enormous picture (as in Michael Dondrup's answer) or to skip the picture and use some textual output.

The object returned by hclust already contains the tree (its merge, height, order and labels components), and as.dendrogram() converts it into a nested list that you can access from code. Each node of that list has a height attribute and holds its left and right branches, which are themselves nodes of the same kind. You have to traverse the list recursively (or with some kind of loop) to get at the subclusters. labels() returns all leaf nodes of a dendrogram, so you will at least know their order.
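
As a rough sketch of that traversal, assuming the built-in USArrests data as a stand-in for the real input: print_subtrees and its min_height cutoff are hypothetical names introduced here for illustration, not part of any package.

```r
# Built-in USArrests stands in for the real data (an assumption for illustration)
hc <- hclust(dist(USArrests), method = "complete")

# as.dendrogram() turns the hclust result into a nested list:
# every inner node carries "height" and "members" attributes and holds its branches
dend <- as.dendrogram(hc)

# Leaf labels in left-to-right plotting order
leaf_labels <- labels(dend)

# Hypothetical helper: recursively print each subtree at or above a given height
print_subtrees <- function(node, min_height, depth = 0) {
  if (is.leaf(node) || attr(node, "height") < min_height) {
    return(invisible(NULL))
  }
  cat(strrep("  ", depth),
      "height ", round(attr(node, "height"), 1),
      ", ", attr(node, "members"), " leaves\n", sep = "")
  for (i in seq_along(node)) {
    print_subtrees(node[[i]], min_height, depth + 1)
  }
}

print_subtrees(dend, min_height = 100)
```

Raising min_height shortens the printout to the top few merges; lowering it walks further down towards the leaves.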

Given a height cutoff threshold you could separate this into a reasonable number of subtrees and maybe draw those separately.
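
One possible way to do this is cut() on a dendrogram, which returns the truncated upper tree and a list of the subtrees below the cutoff. The sketch below assumes the built-in USArrests data and an arbitrary cutoff of half the root height; both are placeholders to adapt.

```r
# Built-in USArrests stands in for the real data (an assumption for illustration)
hc <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

# Assumed cutoff for illustration: half the height of the root node
h_cut <- attr(dend, "height") / 2

# cut() returns a list: $upper is the tree above the cutoff,
# $lower is a list of the subtrees below it
parts <- cut(dend, h = h_cut)

# Draw each subtree in its own panel, then restore the old layout
op <- par(mfrow = c(1, length(parts$lower)))
for (sub in parts$lower) {
  plot(sub)
}
par(op)
```

With many subtrees it may be more practical to write each one to its own file than to squeeze them all into one device.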

I don't have the code available here, but the principle is straightforward: convert the hclust result to a dendrogram with as.dendrogram() and you can dig through it.

The integrated R help (?hclust) has an example that cuts the tree into ten clusters, like so:

hc <- hclust(dist(USArrests)^2, "cen")  
memb <- cutree(hc, k = 10)
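
To use the result as textual output, it may help to know that memb is an integer vector of cluster numbers named by the row labels. A minimal sketch of inspecting it, where groups is a name introduced here for illustration:

```r
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)

# Number of observations in each of the ten clusters
table(memb)

# Labels grouped by cluster: a list with one character vector per cluster
groups <- split(names(memb), memb)
groups[[1]]
```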

Michael Dondrup (Bergen, Norway) wrote (8.2 years ago):

Try producing an SVG file with the svg() function (requires Cairo) and opening it in your web browser or a graphics program:

plot(hr, hang=-1, cex=0.5)

This gives non-overlapping labels and looks OK-ish for about 500 data points. For more or fewer points, or for a different width, you can experiment with the values.
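
A complete device-open, plot, device-close sequence might look like the sketch below. The built-in USArrests data, the output path and the sizing rule of roughly 0.15 inch per leaf are all assumptions made for illustration, not values from this answer.

```r
# Built-in USArrests stands in for the real data (an assumption for illustration)
hc <- hclust(dist(USArrests), method = "complete")

# Assumed sizing rule: about 0.15 inch of width per leaf, never below 7 inches
n_leaves <- length(hc$order)
svg_file <- file.path(tempdir(), "dendrogram.svg")  # hypothetical output path

svg(svg_file, width = max(7, 0.15 * n_leaves), height = 7)  # open the SVG device
plot(hc, hang = -1, cex = 0.5)                              # draw into the file
dev.off()                                                   # close the device and finish the file
```

At the console, dev.off() reports the device that is current afterwards, so a message such as "null device 1" only means the file has been closed. If svg() is called without a file name, the plot is written to Rplot001.svg in the current working directory.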


Sorry, I'm new to R. Could you please tell me how to generate an SVG file from my file in CSV format?

Reply written 8.2 years ago by grosy

It's showing:

null device 1

Reply written 8.2 years ago by grosy

It has already generated it. The file is named Rplot001.svg and is written to R's current working directory (check it with getwd()), which on Windows is often your Documents folder.

Reply written 4.7 years ago by kanika.15180

Leonor Palmeira (Liège, Belgium) wrote (8.2 years ago):

You are going to have to play with the plot() options to build a larger plot with smaller labels. For instance, have you tried using the 'cex' option in the plot() call?

Also, to help you, the labels on the figure can be retrieved in plotting order this way: hc$labels[hc$order]
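
As a small sketch of using that expression for textual output, assuming the built-in USArrests data and a hypothetical output file name:

```r
# Built-in USArrests stands in for the real data (an assumption for illustration)
hc <- hclust(dist(USArrests), method = "complete")

# Labels in the left-to-right order in which they appear under the dendrogram.
# If the input had no row names, hc$labels is NULL and hc$order alone
# gives the row indices in plotting order.
ordered_labels <- hc$labels[hc$order]
head(ordered_labels)

# Save the ordered labels, one per line, for inspection outside R
out_file <- file.path(tempdir(), "dendrogram_labels.txt")  # hypothetical file name
writeLines(ordered_labels, out_file)
```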
