Color Clustering Results Basing On Sample Classes In R
3
4
Entering edit mode
10.7 years ago
Luca Beltrame ▴ 240

For a while, I've been using the pvclust package to geenrate dendrograms of samples from microarray data. Recently I got a large data set, divided into 4 classes, and since the sample number is relatively high I need to color the various dendrogram leaves (or at least the labels), depending on the original class (e.g., class 1 would get green leaves/labels).

Notice that the original data file is about 19K lines long over more than 40 samples.

I obtain my dendrogram as

require(pvclust)
mydata <- read.delim("mydata.txt", row.names=1)
library(parallel)
cl  <- makeForkCluster(8)
pvclust_result <- parPvclust(cl=cl, data=as.matrix(mydata), nboot=100)
result_dendrogram <- as.dendrogram(pvclust_result$hclust) classes <- factor(additional_data$class, levels=c("Class1", "Class2", "Class3", "Class4"))


(additional_data is a data.frame with the required information)

The problem is, I've been trying some solutions but neither of these seem to work. Heatplus from the Bioconductor repositories tries to cluster the data despite providing a dendrogram (and it's too slow), the example in the R gallery does not work as it colors the dendrogram groups, and not the original classes.

What is the easiest way to color either dendrogram leaves or just labels if needed according to the factors? (I have a mapping of label name - class that I can use)

EDIT:

This is the version that I used, based on David Quigley's answer:

plot.cluster.colors=function( M, D, colors, force.flat=F, show.labels=T, label.cex=1, ... ){
lookup = data.frame( leaf.name = names(M), color=colors, stringsAsFactors=F )
if(force.flat)
hangval = -1
else
hangval = 0.1
dend_colored = dendrapply(D, color.dendrogram.labels, label.cex=label.cex,   show.labels=show.labels, color.lookup=lookup)
plot(dend_colored, ...)
}

color.dendrogram.labels = function(n, label.cex=1, show.labels=TRUE, color.lookup=NULL){

if(is.leaf(n)){
a = attributes(n)
color = color.lookup$color[ which(color.lookup$leaf.name==a$label)[1] ] if( show.labels ){ cex.val = label.cex lab.color = color } else { cex.val=0.01 lab.color="white" } attr(n, "nodePar") = c(a$nodePar, list(lab.col = lab.color, lab.cex=cex.val, col=color, pch=15, cex=1 ) )
}
n
}

r microarray clustering • 12k views
3
Entering edit mode
10.7 years ago

Funny, I just had to solve this problem. color.dendrogram.labels() is a function passed into dendrapply; you don't call it directly. I couldn't be bothered to get the syntax correct for the function pointer, so I just set my color information as a global variable. Improvements welcome. Call plot.cluster.colors() and pass matrix M and a vector of colors of the same length as the names of M. You can mess with the line that decorates the nodes in color.dendrogram.labels() to get dots, triangles, or nothing at all. An improvement would be another parameter to pass a set of arbitrary labels. You could adapt to pass in a dendrogram if you don't want to recalculate it.

color.dendrogram.labels = function(n){
if(is.leaf(n)){
a = attributes(n)
color = foo.lookup.1$color[ which(foo.lookup.1$leaf.name==a$label)[1] ] if( foo.show.labels.1 ){ cex.val = 1 lab.color = color } else{ cex.val=0.01 lab.color="white" } attr(n, "nodePar") = c(a$nodePar, list(lab.col = lab.color, lab.cex=cex.val, col=color, pch=15, cex=1 ) )
}
n
}

plot.cluster.colors=function( M, colors, force.flat=F, show.labels=T ){
# Hack: sets global environment variable
d = (dist(as.matrix(t(M))))
lookup = data.frame( leaf.name = names(M), color=colors, stringsAsFactors=F )
if(force.flat)
hangval = -1
else
hangval = 0.1
D = as.dendrogram(hclust(d), hang=hangval)
assign("foo.lookup.1", lookup, envir=.GlobalEnv)
assign("foo.show.labels.1", show.labels, envir=.GlobalEnv)
dend_colored = dendrapply(D, color.dendrogram.labels)
plot(dend_colored)
}

0
Entering edit mode

Thank you, it works! I'll be adjusting this but this is exactly what I needed.

0
Entering edit mode

Great. I'll modify my own quick-and-dirty code at some point to use the function parameters correctly.

1
Entering edit mode
10.7 years ago
Naga ▴ 450

One easy way:

use library "ape", get 'Newick' format tree and use iTol to color the branches.

library(ape)
result_phylo <- as.phylo(pvclust_result$hclust) write.tree(result_phylo)  ADD COMMENT 0 Entering edit mode 10.7 years ago It's not true that Heatplus "reclusters" your data if you pass in a dendrogram appropriately (I'm looking at Heatplus v2.0). If you peak inside the annHeatmap2 function(on ~ line 658), you will see this code block, which checks if you've passed in a dendrograms for your rows and columns, in which case it doesn't try to cluster: dendrogram$Row = within(dendrogram$Row, if (!inherits(dendro, "dendrogram")) { dendro = clustfun(distfun(x)) dendro = reorder(as.dendrogram(dendro), rowMeans(x, na.rm=TRUE)) } ) dendrogram$Col = within(dendrogram$Col, if (!inherits(dendro, "dendrogram")) { dendro = clustfun(distfun(t(x))) dendro = reorder(as.dendrogram(dendro), colMeans(x, na.rm=TRUE)) } )  I'm guessing maybe you aren't passing in your dendrograms correctly. I believe you would do it somehow like so: dendros <- list(Row=list(dendro=YOUR.ROW.DENDROGRAM), Col=list(dendro=YOUR.COL.DENDROGRAM) ahm <- annHeatmap2(x, dendrogram=dendros)  ADD COMMENT 0 Entering edit mode I tried doing tha passing as.dendrogram(pvclust_result$hclust) and trying to ignore the rows by setting Row=list(status="no"), but apparently it doesn't: at least, it starts pegging the CPU like crazy and it's not interruptible (and running single-core, whatever it does is anyway too slow).