Question: Pheatmap cluster number labels
gravatar for thomasterTW
24 months ago by
RIMLS, Nijmegen, The Netherlands
thomasterTW20 wrote:

My question is very simple. When I use the clustering option in pheatmap, for example:

res<-pheatmap(matrix, scale="row", cluster_cols=FALSE, show_rownames=FALSE, cutree_rows=8, color=colorRampPalette(c("red","black","green"))(50), cellwidth=30, cellheight=0.4, main="C2C7_Differential_Gene_Expression", fontsize=4,filename="C2C7_Differential_Gene_Expression8clusters.pdf")

I get a cool heatmap image with 8 differentiated clustes. These clusters however are not labeled. After writing the data to a file:

b<-data.frame(cutree(res$tree_row, k=8))
write.table(, file="C2C7_Differential_Gene_Expression8clusters.txt", sep="\t")

I assumed cluster 1 in this file should be the top cluster in the image. This turns out to not be the case. I think the cluster numbers in the file are ordered based on the tree branching.

As I do not want to waste time figuring out which cluster number corresponds to which cluster in the image, I wonder: Is there an option in pheatmap to label the clusters in the image?

clustering rna-seq pheatmap • 3.8k views
ADD COMMENTlink modified 6 months ago by sisterdot20 • written 24 months ago by thomasterTW20
gravatar for sisterdot
6 months ago by
sisterdot20 wrote:

you approach was very close to what i think you need to do (at least using pheatmap_1.0.8). the last step will give you the cluster assignment in the right order.

mat = matrix(rnorm(200), 20, 10)
res<-pheatmap(mat, cutree_rows=k.fix,show_rownames = T)
cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]]

pheatmap seems to use the setting of cutree_rows only for determining gaps

if (! {
     gaps_row = find_gaps(tree_row, cutree_rows)

and the find_gaps function does

    find_gaps = function(tree, cutree_n){
        v = cutree(tree, cutree_n)[tree$order]
        gaps = which((v[-1] - v[-length(v)]) != 0)

it does not appear as if the cluster assignment can be extracted from what the pheatmap function returns.

invisible(list(tree_row = tree_row, tree_col = tree_col,  kmeans = km, gtable = gt))
ADD COMMENTlink written 6 months ago by sisterdot20

and for the complete ordered matrix with cluster info:

mat.order <- cbind(mat[c(res$tree_row[["order"]]),res$tree_col[["order"]]],cluster=cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]])
ADD REPLYlink written 6 months ago by sisterdot20
gravatar for EagleEye
24 months ago by
EagleEye5.3k wrote:

To get the data in the exact order as in heatmap use this,

A: To extract row names of plotted heatmap

ADD COMMENTlink written 24 months ago by EagleEye5.3k

Thanks! That's what I need. However, when I try this method I get an error.

Error in `[.data.frame`(matrix, c(res$tree_row[["order"]])) :
  undefined columns selected
ADD REPLYlink modified 24 months ago • written 24 months ago by thomasterTW20

I already fixed the problem. Should be res2<-matrix[c(res$tree_row[["order"]]),]

ADD REPLYlink written 24 months ago by thomasterTW20

I still can't get it to work :(

This it the output of pheatmap:

> res

hclust(d = d, method = method)

Cluster method   : complete
Distance         : euclidean
Number of objects: 6445

[1] NA

[1] NA

TableGrob (5 x 6) "layout": 5 grobs
  z     cells      name                        grob
1 1 (1-1,3-3)      main         text[GRID.text.179]
2 2 (4-4,1-1)  row_tree polyline[GRID.polyline.180]
3 3 (4-4,3-3)    matrix       gTree[GRID.gTree.182]
4 4 (5-5,3-3) col_names         text[GRID.text.183]
5 5 (4-5,5-5)    legend       gTree[GRID.gTree.186]

When applying my original method ( b <- data.frame(cutree(res$tree_row, k=5))) the output looks like this

> head(b)
ENSMUSG00000000001                           1
ENSMUSG00000000028                           2
ENSMUSG00000000031                           3
ENSMUSG00000000056                           3
ENSMUSG00000000058                           1
ENSMUSG00000000078                           1

With the suggested method (res2<-matrix[c(res$tree_row[["order"]]),]) the output looks like this

> head(res2)
                      FPKM_1b    FPKM_2b   FPKM_3b   FPKM_4b   FPKM_5b   FPKM_6b
ENSMUSG00000043257  0.3025560  0.4250808 0.4468834 0.5655751 0.5719265 0.4285203
ENSMUSG00000020283  0.9988489  1.0104823 1.1025160 1.2239578 1.1445088 1.0986714
ENSMUSG00000028469 -0.1916256 -0.1084386 0.1324678 0.3324304 0.2095445 0.0737440
ENSMUSG00000021939  1.5766039  1.5866526 1.6339802 1.6939370 1.6958720 1.6544634
ENSMUSG00000068040  1.2922228  1.2956221 1.3249447 1.3735446 1.3566950 1.3483341
ENSMUSG00000062611  2.6689941  2.7180206 2.6920966 2.7256585 2.7129930 2.7073197

What do I need to do to get the output in the way of my original method? I only need the gene ID and in which cluster it is located.

ADD REPLYlink written 24 months ago by thomasterTW20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 775 users visited in the last hour