Question: Differences Between Eisen Hierarchical Clustering In Bioconductor/R Versus Heatmap.2?
4
user840 wrote:

I'm comparing two ways of creating heatmaps with dendrograms in R, one with `made4`'s `heatplot` (from bioconductor) and one with `gplots` of `heatmap.2`. The appropriate results depend on the analysis but I'm trying to understand why the defaults are so different, and how to get both functions to give the same result (or highly similar result) so that I understand all the 'blackbox' parameters that go into this.

This is the example data and packages:

``````require(gplots)
data(khan)
data <- as.matrix(khan\$train[1:30,])
``````

Clustering the data with heatmap.2 gives:

``````heatmap.2(data, trace="none")
`````` Using `heatplot` gives:

``````heatplot(data)
`````` very different results and scalings initially. `heatplot` results look more reasonable in this case so I'd like to understand what parameters to feed into `heatmap.2` to get it to do the same, since `heatmap.2` has other advantages/features I'd like to use and because I want to understand the missing ingredients.

`heatplot` uses average linkage with correlation distance so we can feed that into `heatmap.2` to ensure similar clusterings are used (based on: https://stat.ethz.ch/pipermail/bioconductor/2010-August/034757.html)

``````dist.pear <- function(x) as.dist(1-cor(t(x)))
hclust.ave <- function(x) hclust(x, method="average")
heatmap.2(data, trace="none", distfun=dist.pear, hclustfun=hclust.ave)
``````

resulting in: this makes the row-side dendrograms look more similar but the columns are still different and so are the scales. It appears that `heatplot` scales the columns somehow by default that `heatmap.2` doesn't do that by default. If I add a row-scaling to heatmap.2, I get:

``````heatmap.2(data, trace="none", distfun=dist.pear, hclustfun=hclust.ave,scale="row")
`````` which still isn't identical but is closer. How can I reproduce `heatplot`'s results with `heatmap.2`? What are the differences?

edit2: it seems like a key difference is that `heatplot` rescales the data with both rows and columns, using:

``````if (dualScale) {
print(paste("Data (original) range: ", round(range(data),
2), round(range(data), 2)), sep = "")
data <- t(scale(t(data)))
print(paste("Data (scale) range: ", round(range(data),
2), round(range(data), 2)), sep = "")
data <- pmin(pmax(data, zlim), zlim)
print(paste("Data scaled to range: ", round(range(data),
2), round(range(data), 2)), sep = "")
}
``````

this is what I'm trying to import to my call to `heatmap.2`. The reason I like it is because it makes the contrasts larger between the low and high values, whereas just passing `zlim` to `heatmap.2` gets simply ignored. How can I use this 'dual scaling' while preserving the clustering along the columns? All I want is the increased contrast you get with:

`heatplot(..., dualScale=TRUE, scale="none")`

compared with the low contrast you get with:

`heatplot(..., dualScale=FALSE, scale="row")`

any ideas on this?

modified 6.5 years ago by Steve Lianoglou5.0k • written 6.5 years ago by user840
4
Steve Lianoglou5.0k wrote:

I think you've identified the biggest differences between the default behaviors: first and foremost it's this "dual scaling" business, and secondly there are slight differences in the default clustering behavior.

To reproduce the diagram using `heatmap.2` alone:

``````library(gplots)
library(RColorBrewer)

data <- as.matrix(khan\$train[1:30,])

cols <- rev(colorRampPalette(brewer.pal(10, "RdBu"))(256))

## Massaging the data
d <- t(scale(t(data)))         ## "dual scaling"
d <- pmin(pmax(d, -3), 3)      ## Compressing data to max/min +/- 3

## Setup the dendrograms
colv <- as.dendrogram(hclust(as.dist(1 - cor(d)), method="ave"))
rowv <- as.dendrogram(hclust(as.dist(1 - cor(t(d))), method="ave"))

heatmap.2(d, trace="none", Rowv=rowv, Colv=colv, scale="none", col=cols, density.info="none")
``````

This dual scaling maneuver is ... interesting. I typically take a different tack -- whether or not I scale (here, the dual scaling forces you to scale the rows), I perform the clustering on the raw (or scaled w/o compressed) data, and then create a the color map using a compressed version of the data so that the colors aren't washed out in the heatmap, but the clustering pattern is unaffected by the data compression.

I also think that the aheatmap function in NMF package creates nice (annotated) heatmaps.

extremely helpful thanks. still confused about a few things: isn't scaling by rows only exactly what you want for the heatmap (so that genes across samples are comparable), so `scale="row"` but without scaling the columns - i.e. without `t(scale(t(data)))`? i'd prefer to avoid any scaling (except rows which seems reasonable) if it's not needed. Clipping the scales seems legitimate for visualization if it doesn't distort the result. Second, why do you need to manually set the column and row dendrograms here as opposed to just letting `heatmap.2` do it? It seems that `heatmap.2` and `heatplot` yield different column dendrograms. for example `EWS.T14` and `EWS.T1` are typically close to `EWS.6` in `heatmap.2` but far apart in `heatplot`. Why is that?

1

• There is no default answer to "should I scale the rows or the columns" .. if you have enough samples (columns), I can imagine a scenario where scaling the rows would be beneficial (so you see relative changes of the gene across samples) -- for that to make the most sense, you might want some type of reassurance that the expression distribution across your samples look similar (are made to be similar)
• `t(scale(t(data)))` doesn't scale the columns, it scales the rows.
• I'm manually setting the `Rowv` and `Colv` dendrograms here, but you could have rather just passed in the appropriate `hclustfun` and `distfun` functions parameterized correctly.
• With the code I use here, I'm not getting any differences in the ordering of the heatmaps between `heatmap.2` and the original `heatplot` call, so I don't know what you are referring to re: the relative positinos of `EWS.T14`, `EWS.T1`, etc.

Note that you probably want to set `symkey=TRUE` and `symbreaks=TRUE` so your colors are set to be white at 0 -- notice how your "white" color is somewhere between 0 and 1.