Question

Clustering differences between heatmap.2 and pheatmap

4

8.8 years ago

igor 13k

I have been using heatmap.2 for a while, but just discovered pheatmap. In heatmap.2, you can specify clustering settings via distfun and hclustfun. In pheatmap, you have clustering_distance_rows and clustering_method. However, if I set those parameters to use the same algorithms, the resulting heatmaps do not look similar. How can that be? Does pheatmap perform additional manipulations that heatmap.2 does not?

My code:

# pheatmap
library(pheatmap)
pheatmap(vals, color = colors, scale = "row", cluster_rows = TRUE, cluster_cols = TRUE,
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean",
         clustering_method = "complete")

# heatmap.2
library(gplots)
hclust_fun <- function(x) hclust(x, method = "complete")
dist_fun <- function(x) dist(x, method = "euclidean")
heatmap.2(as.matrix(vals), scale = "row", trace = "none", dendrogram = "both",
          Rowv = TRUE, Colv = TRUE, distfun = dist_fun, hclustfun = hclust_fun, col = colors)

heatmap heatmap.2 pheatmap R • 16k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.8 years ago by igor 13k

4

8.5 years ago

informatics bot ▴ 760

heatmap.2 reorders the dendrogram in a way that pheatmap does not. Here is an excerpt from the heatmap.2 manual:

If either is a vector (of "weights") then the appropriate dendrogram is reordered according to the supplied values subject to the constraints imposed by the dendrogram, by reorder(dd, Rowv), in the row case. If either is missing, as by default, then the ordering of the corresponding dendrogram is by the mean value of the rows/columns, i.e., in the case of rows, Rowv <- rowMeans(x, na.rm=na.rm). If either is NULL, no reordering will be done for the corresponding side.
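To see what that reordering does, here is a minimal base R sketch. It uses a made-up random matrix `x` (an assumption for illustration, not the data from the question) and applies `reorder()` with row means as weights, as described in the excerpt. The leaf order may change, but the tree itself (which rows merge, and at what height) should stay the same.

```r
# Hypothetical example data: 8 rows ("genes") by 5 columns ("samples").
set.seed(1)
x <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("g", 1:8), paste0("s", 1:5)))

# Cluster rows the same way as in the question.
hc <- hclust(dist(x, method = "euclidean"), method = "complete")
dd <- as.dendrogram(hc)

# Reorder the dendrogram by row means, as the manual excerpt describes.
dd_reordered <- reorder(dd, rowMeans(x))

# Leaf order before and after reordering (row indices into x).
order_plain <- order.dendrogram(dd)
order_reordered <- order.dendrogram(dd_reordered)
print(rownames(x)[order_plain])
print(rownames(x)[order_reordered])

# The cophenetic distances are unchanged, so the tree topology is the same.
m_plain <- as.matrix(cophenetic(dd))[rownames(x), rownames(x)]
m_reordered <- as.matrix(cophenetic(dd_reordered))[rownames(x), rownames(x)]
same_tree <- isTRUE(all.equal(m_plain, m_reordered))
print(same_tree)
```

So reordering alone only flips branches around their nodes; it cannot explain two heatmaps whose clusters have different members.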

I decided to compute the clustering for both rows and columns myself and pass the resulting dendrograms to heatmap.2 as Rowv and Colv, which it then uses as-is, without reordering:

library(gplots)
pdf("heatmap.pdf", width = 10)
# cluster rows and columns up front (on the unscaled values)
distance.row = dist(as.matrix(vals), method = "euclidean")
cluster.row = hclust(distance.row, method = "ward.D")
distance.col = dist(t(as.matrix(vals)), method = "euclidean")
cluster.col = hclust(distance.col, method = "ward.D")
# heatmap.2 needs a numeric matrix, so convert vals here as well
heatmap.2(as.matrix(vals), scale = "row", trace = "none", dendrogram = "both", Rowv = as.dendrogram(cluster.row), Colv = as.dendrogram(cluster.col))
dev.off()

The order now is more similar to pheatmap, but not completely identical...

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.5 years ago by informatics bot ▴ 760

Ram · Accepted Answer · 2016-01-16

11

8.3 years ago

Lerong ▴ 130

Basically, when you show scaled data, heatmap.2 scales the data after clustering, whereas pheatmap scales the data before clustering. So heatmap.2 clusters the raw values and pheatmap clusters the scaled values, which I am guessing is what makes the final output differ.
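A minimal base R sketch of that difference, using a made-up matrix `x` whose rows have very different magnitudes (the data and the helper name `row_scale` are assumptions for illustration). It clusters the rows once on the raw values and once on row-wise z-scores, which is my understanding of what the two functions do internally when scale="row".

```r
# Hypothetical example data: 4 low-magnitude rows and 4 high-magnitude rows.
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 1), nrow = 4),
           matrix(rnorm(20, mean = 100, sd = 20), nrow = 4))
dimnames(x) <- list(paste0("g", 1:8), paste0("s", 1:5))

# Row-wise z-score: subtract each row's mean and divide by its standard deviation.
row_scale <- function(m) t(scale(t(m)))
x_scaled <- row_scale(x)

# Clustering on the raw values (cluster first, scale only for display).
hc_raw <- hclust(dist(x, method = "euclidean"), method = "complete")

# Clustering on the scaled values (scale first, then cluster).
hc_scaled <- hclust(dist(x_scaled, method = "euclidean"), method = "complete")

# Compare the resulting row orders and the top-level split into two clusters.
print(rownames(x)[hc_raw$order])
print(rownames(x)[hc_scaled$order])
print(table(raw = cutree(hc_raw, k = 2), scaled = cutree(hc_scaled, k = 2)))
```

If you want heatmap.2 to follow the scale-then-cluster behaviour, one option that should work is to pass it the pre-scaled matrix with scale="none" and Rowv = as.dendrogram(hc_scaled), and to do the same for columns with a tree built from dist(t(x_scaled)). That way neither the reordering by row means nor the scale-after-clustering step comes into play.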

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by Lerong ▴ 130

0

A related thread about scaling data and these 2 heatmap functions: cannot replicate the pheatmap scale function

ADD REPLY • link 5.4 years ago by Kevin Blighe 87k

0

Sorry to comment on this old post, but I have also just noticed this difference when creating heatmaps. Do you have any suggestion as to which is the better approach: clustering then scaling (like heatmap / heatmap.2) or scaling then clustering (like pheatmap)? The clustering results are quite different.

ADD REPLY • link 4.9 years ago by pbigbig ▴ 250