Question: extract dendrogram cluster from pheatmap
1
gravatar for krushnach80
12 months ago by
krushnach80440
krushnach80440 wrote:

This is a post from stackoverflow here they show how to extract dedrogram such in form of respective cluster but this is with heatmap.2 function , i m trying to do the same with pheatmap , let's say a group of genes are forming clusters then i want to extract them

Im doing something like this in my code but its giving me all of them togethter not cluster wise as seen in the heatmap

out = pheatmap(data, 
               color=macolor,

               show_rownames = F,cluster_cols=T,cluster_rows=T, scale="row",
               cex=1,clustering_distance_rows = "euclidean", cex=1,
               clustering_distance_cols = "euclidean", clustering_method = "complete",border_color = FALSE)


res <- data[c(out$tree_row[["order"]]),out$tree_col[["order"]]]

any suggestion or help would be highly appreciated.

R • 4.0k views
ADD COMMENTlink modified 12 months ago by Kevin Blighe33k • written 12 months ago by krushnach80440
12
gravatar for Kevin Blighe
12 months ago by
Kevin Blighe33k
Republic of Ireland
Kevin Blighe33k wrote:
#Create random data
data <- replicate(20, rnorm(50))
rownames(data) <- paste("Gene", c(1:nrow(data)))
colnames(data) <- paste("Sample", c(1:ncol(data)))

out <- pheatmap(data, 
      show_rownames=T, cluster_cols=T, cluster_rows=T, scale="row",
      cex=1, clustering_distance_rows="euclidean", cex=1,
      clustering_distance_cols="euclidean", clustering_method="complete", border_color=FALSE)

pheatmap

#Re-order original data (genes) to match ordering in heatmap (top-to-bottom)
rownames(data[out$tree_row[["order"]],])
 [1] "Gene 2"  "Gene 12" "Gene 9"  "Gene 33" "Gene 29" "Gene 43" "Gene 24"
 [8] "Gene 44" "Gene 3"  "Gene 21" "Gene 46" "Gene 26" "Gene 20" "Gene 45"
[15] "Gene 18" "Gene 38" "Gene 22" "Gene 1"  "Gene 17" "Gene 7"  "Gene 6" 
[22] "Gene 41" "Gene 30" "Gene 31" "Gene 13" "Gene 16" "Gene 11" "Gene 50"
[29] "Gene 34" "Gene 37" "Gene 15" "Gene 25" "Gene 27" "Gene 39" "Gene 19"
[36] "Gene 35" "Gene 4"  "Gene 49" "Gene 10" "Gene 28" "Gene 8"  "Gene 14"
[43] "Gene 32" "Gene 42" "Gene 36" "Gene 48" "Gene 23" "Gene 47" "Gene 5" 
[50] "Gene 40"


#Re-order original data (samples) to match ordering in heatmap (left-to-right)
colnames(data[,out$tree_col[["order"]]])
 [1] "Sample 3"  "Sample 6"  "Sample 1"  "Sample 17" "Sample 8"  "Sample 10"
 [7] "Sample 15" "Sample 4"  "Sample 18" "Sample 2"  "Sample 11" "Sample 9" 
[13] "Sample 13" "Sample 12" "Sample 7"  "Sample 19" "Sample 14" "Sample 16"
[19] "Sample 5"  "Sample 20"

-------------------------

If you want something like gene-to-cluster assignment, you can 'cut' your row dendrogram into a pre-selected number of groups as follows:

#2 groups
sort(cutree(out$tree_row, k=2))
 Gene 1  Gene 2  Gene 3  Gene 9 Gene 12 Gene 17 Gene 18 Gene 20 Gene 21 Gene 22 
      1       1       1       1       1       1       1       1       1       1 
Gene 24 Gene 26 Gene 29 Gene 33 Gene 38 Gene 43 Gene 44 Gene 45 Gene 46  Gene 4 
      1       1       1       1       1       1       1       1       1       2 
 Gene 5  Gene 6  Gene 7  Gene 8 Gene 10 Gene 11 Gene 13 Gene 14 Gene 15 Gene 16 
      2       2       2       2       2       2       2       2       2       2 
Gene 19 Gene 23 Gene 25 Gene 27 Gene 28 Gene 30 Gene 31 Gene 32 Gene 34 Gene 35 
      2       2       2       2       2       2       2       2       2       2 
Gene 36 Gene 37 Gene 39 Gene 40 Gene 41 Gene 42 Gene 47 Gene 48 Gene 49 Gene 50 
      2       2       2       2       2       2       2       2       2       2 

#5 groups
sort(cutree(out$tree_row, k=5))
 Gene 1  Gene 3 Gene 17 Gene 18 Gene 20 Gene 21 Gene 22 Gene 24 Gene 26 Gene 38 
      1       1       1       1       1       1       1       1       1       1 
Gene 44 Gene 45 Gene 46  Gene 2  Gene 9 Gene 12 Gene 29 Gene 33 Gene 43  Gene 4 
      1       1       1       2       2       2       2       2       2       3 
Gene 19 Gene 35 Gene 49  Gene 5  Gene 8 Gene 10 Gene 14 Gene 23 Gene 28 Gene 32 
      3       3       3       4       4       4       4       4       4       4 
Gene 36 Gene 40 Gene 42 Gene 47 Gene 48  Gene 6  Gene 7 Gene 11 Gene 13 Gene 15 
      4       4       4       4       4       5       5       5       5       5 
...

You can also cut the tree at a pre-defined tree height, and extract the gene-to-cluster assignments at that height:

plot(out$tree_row)
abline(h=7, col="red", lty=2, lwd=2)

dend
upload pictures

#Cut the row (gene) dendrogram at a Euclidean distance dis-similarity of 8
sort(cutree(out$tree_row, h=7))
 Gene 1  Gene 3 Gene 17 Gene 18 Gene 20 Gene 21 Gene 22 Gene 24 Gene 26 Gene 38 
      1       1       1       1       1       1       1       1       1       1 
Gene 44 Gene 45 Gene 46  Gene 2  Gene 9 Gene 12 Gene 29 Gene 33 Gene 43  Gene 4 
      1       1       1       2       2       2       2       2       2       3 
Gene 19 Gene 35 Gene 49  Gene 5  Gene 8 Gene 10 Gene 14 Gene 23 Gene 28 Gene 32 
      3       3       3       4       4       4       4       4       4       4 
Gene 36 Gene 40 Gene 42 Gene 47 Gene 48  Gene 6  Gene 7 Gene 13 Gene 16 Gene 30 
      4       4       4       4       4       5       5       5       5       5 
Gene 31 Gene 41 Gene 11 Gene 15 Gene 25 Gene 27 Gene 34 Gene 37 Gene 39 Gene 50 
      5       5       6       6       6       6       6       6       6       6
ADD COMMENTlink modified 6 months ago • written 12 months ago by Kevin Blighe33k
1

wonderful let me do and i will post it if i have some issues

ADD REPLYlink written 12 months ago by krushnach80440
1

hi kevin its me again so i have used your complex heatmap code and its working wonderful but i wanted to ask you in context of gplot heatmap.2 function this is my code

out <-heatmap.2dat.tn,Rowv=as.dendrogram(c2),Colv = as.dendrogram(c1),col =macolor 
          ,scale="row", margin=c(6, 4),trace='none',labRow = FALSE,
          symkey=FALSE,symbreaks=FALSE,dendrogram="col", 
          density.info='density', denscol="red",lhei=c(1,3),cexRow = .5,cexCol = 1,
          lwid=c(.9,3), keysize=0.1, key.par = list(cex=0.5))

and this is the summary of the out object which is

summary(out)
              Length Class      Mode     
rowInd          389  -none-     numeric  
colInd           34  -none-     numeric  
call             20  -none-     call     
rowMeans        389  -none-     numeric  
rowSDs          389  -none-     numeric  
carpet        13226  -none-     numeric  
rowDendrogram     2  dendrogram list     
colDendrogram     2  dendrogram list     
breaks          101  -none-     numeric  
col             100  -none-     character
colorTable        3  data.frame list     
layout            3  -none-     list

how do i take it out from this object the gene list in the same manner as depicted in heatmap...

the issue want to take out genes which are strikingly different in a one cluster as compared to other.. heatmap

this rownames(dat[out$rowInd,]) the order of the gene which are in rows & this colnames(dat[out$colInd,]) the column list ,how to i back trace back and find the same what i see it in my list, i could do this with your given complex heat map code which gave me link but since i already did most of them with gplot so im doing with this as of now..

ADD REPLYlink modified 9 months ago • written 9 months ago by krushnach80440
1

Hello friend. I am not sure exactly what you need. If you want the row and column names as per the heatmap, then you can do:

colnames(dat)[out$colInd]

rownames(dat)[out$rowInd]
ADD REPLYlink written 9 months ago by Kevin Blighe33k

hi kevin ,yes that i can get it but lets say the first two sample Gran1 and Gran1 expression is different from the rest of the sample if you see so there are genes in that region as "rownames(dat)[out$rowInd]" gives me the total order of genes but is it possible only to find out genes of that particular region

ADD REPLYlink written 9 months ago by krushnach80440
1

Oh, I see. You can do this with cutree(), as follows:

cutree(as.hclust(out$colDendrogram), 2)

That should give your samples as ordered in dat; however, it will assign to each a cluster membership (here, 1 or 2). You can choose the number of clusters by changing the second argument, i.e., for example, if you want 4 clusters then do:

cutree(as.hclust(out$colDendrogram), 4)

--------------------

For better selection, you can do this:

clusDesignation <- cutree(as.hclust(out$colDendrogram), 2)
clusDesignation[clusDesignation==1]
clusDesignation[clusDesignation==2]
ADD REPLYlink written 9 months ago by Kevin Blighe33k

excellent , the object class is integer how can i pass that into a data frame of course i can copy that from my terminal but is there a way to pass it on to some readable form for downstream purpose clusDesignation[clusDesignation==2]

ADD REPLYlink written 9 months ago by krushnach80440
1

I believe that names(clusDesignation[clusDesignation==2]) should give you the gene / sample names for each cluster ?

ADD REPLYlink written 9 months ago by Kevin Blighe33k

thanks a ton i have to cross these mundane yet little things to make my life easier ..as always thanks for helping me out

ADD REPLYlink written 9 months ago by krushnach80440
1

No problem. Apologies for the delay in replying. I was traveling to the other side of the Globe....!

ADD REPLYlink written 9 months ago by Kevin Blighe33k

how do you decide what should be the number of k is it visual inspection after heatmap or how can i decide ? as for example I have like 741 genes so it too dense then how to decide the optimum number of k and what tree height to cut ...?

ADD REPLYlink modified 12 months ago • written 12 months ago by krushnach80440
2

how do you decide what should be the number of k is it visual inspection after heatmap or how can i decide ?

lol - you possibly do not realise the entire area of clustering research that you've opened up with just this single question. There is no correct or incorrect answer.

You can sometimes justify just a visual inspection of the dendrogram and then decide a point to cut using cutree with k or h, preferably h because then you have a cut-off threshold based on the distance metric that you used (usually Euclidean distance or Pearson correlation distance.

Note that I recently published a study on this topic but on metabolomics data: Vitamin D prenatal programming of childhood metabolomics profiles at age 3 y

Various methods exist, such as:

  • Gap statistic
  • Elbow method
  • Silhouette method
  • Consensus clustering

Also note that the ComplexHeatmap can incorporate PAM or k-means clustering in order to split your heatmap, or you can define the split yourself, like this: C: how to cluster genes in heatmap

To see how to do this, see also here: Split heatmap by rows

It may be worth opening up new question, as some of my colleagues also work on clustering.

ADD REPLYlink modified 12 months ago • written 12 months ago by Kevin Blighe33k

wow looking at your answer it just opened up what are the things are can be done with data ..I am looking into the post first i will apply into my data ..

ADD REPLYlink written 12 months ago by krushnach80440
1

Yes, with data, the possibilities are endless, but always proceed with caution about results that are seemingly too good to be true (usually they are not true).

ADD REPLYlink written 12 months ago by Kevin Blighe33k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1650 users visited in the last hour