Question: Pheatmap cluster number labels
gravatar for thomasterTW
4.3 years ago by
RIMLS, Nijmegen, The Netherlands
thomasterTW20 wrote:

My question is very simple. When I use the clustering option in pheatmap, for example:

res<-pheatmap(matrix, scale="row", cluster_cols=FALSE, show_rownames=FALSE, cutree_rows=8, color=colorRampPalette(c("red","black","green"))(50), cellwidth=30, cellheight=0.4, main="C2C7_Differential_Gene_Expression", fontsize=4,filename="C2C7_Differential_Gene_Expression8clusters.pdf")

I get a cool heatmap image with 8 differentiated clustes. These clusters however are not labeled. After writing the data to a file:

b<-data.frame(cutree(res$tree_row, k=8))
write.table(, file="C2C7_Differential_Gene_Expression8clusters.txt", sep="\t")

I assumed cluster 1 in this file should be the top cluster in the image. This turns out to not be the case. I think the cluster numbers in the file are ordered based on the tree branching.

As I do not want to waste time figuring out which cluster number corresponds to which cluster in the image, I wonder: Is there an option in pheatmap to label the clusters in the image?

clustering rna-seq pheatmap • 8.4k views
ADD COMMENTlink modified 2.9 years ago by sisterdot40 • written 4.3 years ago by thomasterTW20
gravatar for sisterdot
2.9 years ago by
sisterdot40 wrote:

you approach was very close to what i think you need to do (at least using pheatmap_1.0.8). the last step will give you the cluster assignment in the right order.

mat = matrix(rnorm(200), 20, 10)
res<-pheatmap(mat, cutree_rows=k.fix,show_rownames = T)
cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]]

pheatmap seems to use the setting of cutree_rows only for determining gaps

if (! {
     gaps_row = find_gaps(tree_row, cutree_rows)

and the find_gaps function does

    find_gaps = function(tree, cutree_n){
        v = cutree(tree, cutree_n)[tree$order]
        gaps = which((v[-1] - v[-length(v)]) != 0)

it does not appear as if the cluster assignment can be extracted from what the pheatmap function returns.

invisible(list(tree_row = tree_row, tree_col = tree_col,  kmeans = km, gtable = gt))
ADD COMMENTlink written 2.9 years ago by sisterdot40

and for the complete ordered matrix with cluster info:

mat.order <- cbind(mat[c(res$tree_row[["order"]]),res$tree_col[["order"]]],cluster=cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]])
ADD REPLYlink written 2.9 years ago by sisterdot40
gravatar for EagleEye
4.3 years ago by
EagleEye6.7k wrote:

To get the data in the exact order as in heatmap use this,

A: To extract row names of plotted heatmap

ADD COMMENTlink written 4.3 years ago by EagleEye6.7k

Thanks! That's what I need. However, when I try this method I get an error.

Error in `[.data.frame`(matrix, c(res$tree_row[["order"]])) :
  undefined columns selected
ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by thomasterTW20

I already fixed the problem. Should be res2<-matrix[c(res$tree_row[["order"]]),]

ADD REPLYlink written 4.3 years ago by thomasterTW20

I still can't get it to work :(

This it the output of pheatmap:

> res

hclust(d = d, method = method)

Cluster method   : complete
Distance         : euclidean
Number of objects: 6445

[1] NA

[1] NA

TableGrob (5 x 6) "layout": 5 grobs
  z     cells      name                        grob
1 1 (1-1,3-3)      main         text[GRID.text.179]
2 2 (4-4,1-1)  row_tree polyline[GRID.polyline.180]
3 3 (4-4,3-3)    matrix       gTree[GRID.gTree.182]
4 4 (5-5,3-3) col_names         text[GRID.text.183]
5 5 (4-5,5-5)    legend       gTree[GRID.gTree.186]

When applying my original method ( b <- data.frame(cutree(res$tree_row, k=5))) the output looks like this

> head(b)
ENSMUSG00000000001                           1
ENSMUSG00000000028                           2
ENSMUSG00000000031                           3
ENSMUSG00000000056                           3
ENSMUSG00000000058                           1
ENSMUSG00000000078                           1

With the suggested method (res2<-matrix[c(res$tree_row[["order"]]),]) the output looks like this

> head(res2)
                      FPKM_1b    FPKM_2b   FPKM_3b   FPKM_4b   FPKM_5b   FPKM_6b
ENSMUSG00000043257  0.3025560  0.4250808 0.4468834 0.5655751 0.5719265 0.4285203
ENSMUSG00000020283  0.9988489  1.0104823 1.1025160 1.2239578 1.1445088 1.0986714
ENSMUSG00000028469 -0.1916256 -0.1084386 0.1324678 0.3324304 0.2095445 0.0737440
ENSMUSG00000021939  1.5766039  1.5866526 1.6339802 1.6939370 1.6958720 1.6544634
ENSMUSG00000068040  1.2922228  1.2956221 1.3249447 1.3735446 1.3566950 1.3483341
ENSMUSG00000062611  2.6689941  2.7180206 2.6920966 2.7256585 2.7129930 2.7073197

What do I need to do to get the output in the way of my original method? I only need the gene ID and in which cluster it is located.

ADD REPLYlink written 4.3 years ago by thomasterTW20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 951 users visited in the last hour