Question: Color Clustering Results Basing On Sample Classes In R
4
gravatar for Luca Beltrame
7.2 years ago by
Luca Beltrame210
IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Milano, Italy
Luca Beltrame210 wrote:

For a while, I've been using the pvclust package to geenrate dendrograms of samples from microarray data. Recently I got a large data set, divided into 4 classes, and since the sample number is relatively high I need to color the various dendrogram leaves (or at least the labels), depending on the original class (e.g., class 1 would get green leaves/labels).

Notice that the original data file is about 19K lines long over more than 40 samples.

I obtain my dendrogram as

require(pvclust)
mydata <- read.delim("mydata.txt", row.names=1)
library(parallel)
cl  <- makeForkCluster(8)
pvclust_result <- parPvclust(cl=cl, data=as.matrix(mydata), nboot=100)
result_dendrogram <- as.dendrogram(pvclust_result$hclust)
classes <- factor(additional_data$class, levels=c("Class1", "Class2", "Class3", "Class4"))

(additional_data is a data.frame with the required information)

The problem is, I've been trying some solutions but neither of these seem to work. Heatplus from the Bioconductor repositories tries to cluster the data despite providing a dendrogram (and it's too slow), the example in the R gallery does not work as it colors the dendrogram groups, and not the original classes.

What is the easiest way to color either dendrogram leaves or just labels if needed according to the factors? (I have a mapping of label name - class that I can use)

Thanks in advance.

EDIT:

This is the version that I used, based on David Quigley's answer:

plot.cluster.colors=function( M, D, colors, force.flat=F, show.labels=T, label.cex=1, ... ){
     lookup = data.frame( leaf.name = names(M), color=colors, stringsAsFactors=F )
     if(force.flat)
        hangval = -1
     else
        hangval = 0.1  
     dend_colored = dendrapply(D, color.dendrogram.labels, label.cex=label.cex,   show.labels=show.labels, color.lookup=lookup) 
     plot(dend_colored, ...)
}


color.dendrogram.labels = function(n, label.cex=1, show.labels=TRUE, color.lookup=NULL){

    if(is.leaf(n)){
        a = attributes(n)
        color = color.lookup$color[ which(color.lookup$leaf.name==a$label)[1] ]
        if( show.labels ){
            cex.val = label.cex
            lab.color = color
        }
        else
        {
            cex.val=0.01
            lab.color="white"   
        }
        attr(n, "nodePar") = c(a$nodePar, list(lab.col = lab.color, lab.cex=cex.val, col=color, pch=15, cex=1 ) ) 
    }
    n
}
R microarray clustering • 9.5k views
ADD COMMENTlink modified 7.2 years ago by Steve Lianoglou5.0k • written 7.2 years ago by Luca Beltrame210
3
gravatar for David Quigley
7.2 years ago by
David Quigley11k
San Francisco
David Quigley11k wrote:

Funny, I just had to solve this problem. color.dendrogram.labels() is a function passed into dendrapply; you don't call it directly. I couldn't be bothered to get the syntax correct for the function pointer, so I just set my color information as a global variable. Improvements welcome. Call plot.cluster.colors() and pass matrix M and a vector of colors of the same length as the names of M. You can mess with the line that decorates the nodes in color.dendrogram.labels() to get dots, triangles, or nothing at all. An improvement would be another parameter to pass a set of arbitrary labels. You could adapt to pass in a dendrogram if you don't want to recalculate it.

color.dendrogram.labels = function(n){
    if(is.leaf(n)){
        a = attributes(n)
        color = foo.lookup.1$color[ which(foo.lookup.1$leaf.name==a$label)[1] ]
        if( foo.show.labels.1 ){
            cex.val = 1
            lab.color = color
        }
        else{
            cex.val=0.01
            lab.color="white"   
        }
        attr(n, "nodePar") = c(a$nodePar, list(lab.col = lab.color, lab.cex=cex.val, col=color, pch=15, cex=1 ) ) 
    }
    n
}

plot.cluster.colors=function( M, colors, force.flat=F, show.labels=T ){
    # Hack: sets global environment variable 
    d = (dist(as.matrix(t(M))))
    lookup = data.frame( leaf.name = names(M), color=colors, stringsAsFactors=F )
    if(force.flat)
        hangval = -1
    else
        hangval = 0.1
    D = as.dendrogram(hclust(d), hang=hangval)
    assign("foo.lookup.1", lookup, envir=.GlobalEnv)
    assign("foo.show.labels.1", show.labels, envir=.GlobalEnv)    
    dend_colored = dendrapply(D, color.dendrogram.labels) 
    plot(dend_colored)
}
ADD COMMENTlink written 7.2 years ago by David Quigley11k

Thank you, it works! I'll be adjusting this but this is exactly what I needed.

ADD REPLYlink written 7.2 years ago by Luca Beltrame210

Great. I'll modify my own quick-and-dirty code at some point to use the function parameters correctly.

ADD REPLYlink written 7.2 years ago by David Quigley11k
1
gravatar for Naga
7.2 years ago by
Naga450
Germany
Naga450 wrote:

One easy way:

use library "ape", get 'Newick' format tree and use iTol to color the branches.

library(ape)
result_phylo <- as.phylo(pvclust_result$hclust)
write.tree(result_phylo)
ADD COMMENTlink modified 7.2 years ago • written 7.2 years ago by Naga450
0
gravatar for Steve Lianoglou
7.2 years ago by
Steve Lianoglou5.0k
US
Steve Lianoglou5.0k wrote:

It's not true that Heatplus "reclusters" your data if you pass in a dendrogram appropriately (I'm looking at Heatplus v2.0).

If you peak inside the annHeatmap2 function(on ~ line 658), you will see this code block, which checks if you've passed in a dendrograms for your rows and columns, in which case it doesn't try to cluster:

dendrogram$Row = within(dendrogram$Row, 
    if (!inherits(dendro, "dendrogram")) {
        dendro = clustfun(distfun(x))
        dendro = reorder(as.dendrogram(dendro), rowMeans(x, na.rm=TRUE))
    }
)
dendrogram$Col = within(dendrogram$Col, 
    if (!inherits(dendro, "dendrogram")) {
        dendro = clustfun(distfun(t(x)))
        dendro = reorder(as.dendrogram(dendro), colMeans(x, na.rm=TRUE))
    }
)

I'm guessing maybe you aren't passing in your dendrograms correctly. I believe you would do it somehow like so:

dendros <- list(Row=list(dendro=YOUR.ROW.DENDROGRAM),
                Col=list(dendro=YOUR.COL.DENDROGRAM)
ahm <- annHeatmap2(x, dendrogram=dendros)
ADD COMMENTlink written 7.2 years ago by Steve Lianoglou5.0k

I tried doing tha passing as.dendrogram(pvclust_result$hclust) and trying to ignore the rows by setting Row=list(status="no"), but apparently it doesn't: at least, it starts pegging the CPU like crazy and it's not interruptible (and running single-core, whatever it does is anyway too slow).

ADD REPLYlink written 7.2 years ago by Luca Beltrame210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 773 users visited in the last hour