Saving individual clusters from heatmaps
6.8 years ago
1769mkc ★ 1.2k

I am doing gene-based clustering, and I see sets of genes that cluster together. How can I extract those clusters from the heatmap for further analysis? Can I do it while creating the heatmap, in other words can I define a function in the heatmap code to extract the clusters, or do I have to do it manually through visual inspection?

Any suggestion or help would be highly appreciated.

R • 10k views
ADD COMMENT

I edited the title of this post so that others can find it easily in the future.

ADD REPLY

What function are you using? There are many ways of making heatmaps in R. Depending on the function, the easiest way may be to use the same clustering function with the same parameters as used in your heatmap function, e.g. heatmap() uses hclust() by default. Note that for hierarchical clustering, you would need to cut the tree to get clusters.
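As a minimal sketch of that idea, assuming the heatmap was drawn with the heatmap() defaults (Euclidean distance, complete linkage), the same row clustering can be rebuilt with dist() and hclust() and then cut with cutree(). The matrix expr and the choice of k = 2 are made up for illustration.

```r
# Toy expression matrix (hypothetical data): 6 genes x 4 samples,
# with two clearly different groups of genes.
expr <- rbind(
  matrix(c(5.0, 5.2, 4.9, 5.1,
           5.1, 4.8, 5.0, 5.3,
           4.9, 5.0, 5.2, 4.8), nrow = 3, byrow = TRUE),
  matrix(c(-5.0, -5.1, -4.9, -5.2,
           -4.8, -5.0, -5.3, -5.1,
           -5.2, -4.9, -5.0, -5.0), nrow = 3, byrow = TRUE)
)
rownames(expr) <- paste0("gene", 1:6)
colnames(expr) <- paste0("sample", 1:4)

# heatmap() clusters rows with hclust(dist(x)) by default,
# i.e. Euclidean distance and complete linkage.
row_tree <- hclust(dist(expr), method = "complete")

# Cut the row dendrogram into k groups (k = 2 is an arbitrary choice here).
row_cluster <- cutree(row_tree, k = 2)

# List the gene names belonging to each cluster.
genes_by_cluster <- split(names(row_cluster), row_cluster)
print(genes_by_cluster)
```

If the heatmap was drawn with a different distance or linkage, the same values would have to be passed to dist() and hclust() for the clusters to match the plot.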

ADD REPLY
6.8 years ago
EagleEye 7.5k

1) Below is an example of a heatmap with three clusters, saving the entities from each cluster as a list (plain text file). Check the ComplexHeatmap documentation for more clustering options.

library("ComplexHeatmap") ## For heatmap
library("circlize") ## For color options

## Creating heatmap with three k-means row clusters (See the ComplexHeatmap documentation for more options)
set.seed(1) ## k-means starts from random centers; fixing the seed makes the clusters reproducible
ht = Heatmap(mymatrix, km=3, col = colorRamp2(c(min(mymatrix), 0, max(mymatrix)), c("green", "white", "red")))
ht = draw(ht) ## Draw once so that row_order() below refers to the clusters shown in the plot

# Saving row names of cluster one
# (indexing rownames() directly also works when a cluster contains a single row)
c1 <- rownames(mymatrix)[row_order(ht)[[1]]]
write.table(c1,"c1_ids.list", sep="\n", quote=FALSE, row.names=FALSE, col.names=FALSE)

# Saving row names of cluster two
c2 <- rownames(mymatrix)[row_order(ht)[[2]]]
write.table(c2,"c2_ids.list", sep="\n", quote=FALSE, row.names=FALSE, col.names=FALSE)

# Saving row names of cluster three
c3 <- rownames(mymatrix)[row_order(ht)[[3]]]
write.table(c3,"c3_ids.list", sep="\n", quote=FALSE, row.names=FALSE, col.names=FALSE)

2) If you are using pheatmap, you can extract the rows in the same order as they appear in the heatmap. Check this post.

3) If you are using single-cell data, SC3 is the best option to consider (article).

Suggestion/Request: It would be easier for other users to find the answer if you changed the title to something like 'Saving individual clusters from heatmaps'.

ADD COMMENT

I tried your code and it works perfectly fine with ComplexHeatmap. But when I do the same with pheatmap, this is basically what I'm doing:

out <- pheatmap(data,
            color = myColor,
            breaks = myBreaks,
            show_rownames = TRUE, cluster_cols = TRUE, cluster_rows = TRUE,
            cex = .5, clustering_distance_rows = "euclidean",
            clustering_distance_cols = "euclidean", clustering_method = "complete", border_color = FALSE)


res <- data[c(out$tree_row[["order"]]),out$tree_col[["order"]]]

When I View(res), I don't get the individual clusters that I got with ComplexHeatmap; instead I get the complete list.

ADD REPLY

With pheatmap you will not get individual clusters; you get the entities in the same order as in the heatmap.
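Since pheatmap returns the row dendrogram as an hclust object in out$tree_row, one way to get discrete clusters is to cut that tree with cutree(). The sketch below uses a plain hclust() tree as a stand-in so that it runs without pheatmap. The names toy and tree and the choice of k = 3 are illustrative assumptions.

```r
# Stand-in for pheatmap's out$tree_row: an hclust object built with the same
# settings as in the call above (Euclidean distance, complete linkage).
toy <- matrix(c(0, 0.1, 10, 10.2, 20, 20.1,
                0, 0.2, 10, 10.1, 20, 20.3), ncol = 2)
rownames(toy) <- paste0("g", 1:6)
tree <- hclust(dist(toy), method = "complete")   # with pheatmap: tree <- out$tree_row

# Assign every row to one of k clusters (k = 3 is an arbitrary choice).
membership <- cutree(tree, k = 3)

# Arrange rows in dendrogram order (the order used for the heatmap rows),
# so that each cluster forms one contiguous block.
in_plot_order <- data.frame(
  gene      = tree$labels[tree$order],
  cluster   = membership[tree$order],
  row.names = NULL
)
print(in_plot_order)
```

The cluster numbers come from cutree() and follow the order of the rows in the input data, not the position of the block in the plot.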

ADD REPLY

Would it help if I used the kmeans_k argument of pheatmap?

ADD REPLY

@EagleEye the code with ComplexHeatmap works fine and I am getting the clusters of genes. But can I get the genes in each cluster along with their values from the heatmap? In SeqMonk, when it plots a heatmap and makes clusters, extracting a cluster gives the list of genes in that cluster as well as their values. Of course I can take the gene names from the cluster and map them back to the data, but if there is a way to extract the genes together with their values directly, that would be really good.
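One way to keep the values together with the gene names is to index the original matrix by cluster instead of taking only the row names. A minimal sketch with base R kmeans() follows. With ComplexHeatmap the analogous step would be mymatrix[row_order(ht)[[i]], , drop = FALSE]. The matrix vals, the seed and the file names are assumptions for illustration.

```r
set.seed(42)  # k-means uses random starting centers

# Hypothetical matrix: 3 genes near +3 and 3 genes near -3, over 3 samples.
vals <- rbind(matrix(3, 3, 3), matrix(-3, 3, 3)) + matrix(rnorm(18, sd = 0.1), 6, 3)
dimnames(vals) <- list(paste0("gene", 1:6), paste0("s", 1:3))

km <- kmeans(vals, centers = 2)

# Split row indices by cluster label, then pull out the full rows.
# drop = FALSE keeps a one-row cluster as a matrix with its row name.
idx_by_cluster <- split(seq_len(nrow(vals)), km$cluster)
submatrices <- lapply(idx_by_cluster, function(idx) vals[idx, , drop = FALSE])

# Write one tab-separated file per cluster: gene names plus their values.
out_dir <- tempdir()
for (cl in names(submatrices)) {
  write.table(submatrices[[cl]],
              file = file.path(out_dir, paste0("cluster", cl, "_values.tsv")),
              sep = "\t", quote = FALSE, col.names = NA)
}
```

Each file then holds the gene names in the first column and the original values in the remaining columns, so no re-mapping step is needed.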

ADD REPLY
5.1 years ago
Ron ★ 1.2k

To extract the clusters based on columns (ComplexHeatmap):

mat = matrix(rnorm(80, 2), 8, 10)
mat = rbind(mat, matrix(rnorm(40,-2), 4, 10))
rownames(mat) = letters[1:12]
colnames(mat) = letters[1:10]
HM <- Heatmap(mat, km=3, column_km = 3)
# Draw once and keep the result, so that every column_order() call below
# refers to the same k-means partition as the plot
HM <- draw(HM)

for (i in 1:length(column_order(HM))) {
  # colnames(mat)[...] also works when a cluster contains a single column
  clu <- t(t(colnames(mat)[column_order(HM)[[i]]]))
  clu <- cbind(clu, paste("cluster", i, sep=""))
  if (i == 1) {
    out <- clu
    colnames(out) <- c("GeneID", "Cluster")
  } else {
    out <- rbind(out, clu)
  }
}

The above example is similar to the row-based cluster extraction given here:

https://github.com/jokergoo/ComplexHeatmap/issues/136

ADD COMMENT
6.8 years ago

If you're doing k-means clustering, where you choose the number of clusters k up front, you could do something like the following:

output_dir_prefix <- "results"
kclusters <- c(2, 3, 4, 6, 8, 10, 15, 20, 25, 50, 100)
for (kcluster in kclusters) {
     print(kcluster)
     # recursive=TRUE also creates "results" itself if it does not exist yet
     dir.create(file.path(output_dir_prefix, kcluster), recursive=TRUE, showWarnings=FALSE)
     clustering <- kmeans(data, kcluster)
     for (i in seq(1, kcluster, 1)) {
         print(paste(kcluster, i, sep=":"))
         out_fn <- paste(output_dir_prefix, kcluster, paste(i, ".mtx", sep=""), sep="/")
         # drop=FALSE keeps a single-row cluster as a one-row matrix
         body <- data[clustering$cluster == i, , drop=FALSE]
         write.table(body, file=out_fn, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
     }
}

This writes one submatrix file per cluster for each value of k in kclusters, i.e. the rows of your data that k-means assigned to that cluster.

You could use matrix2png on each clustering submatrix i.mtx file, in order to generate a heatmap visualization for that submatrix.

If you're doing hierarchical clustering instead, where the number of clusters is not fixed in advance, you could use cutree to cut the tree into clusters at a specified height. You might visually inspect the dendrogram to decide on the height, or use some sensible heuristic. Examples of this are described in answers to another Biostars question.
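A small sketch of cutting by height, assuming complete linkage on Euclidean distances. The data x and the cut height h = 5 are invented for illustration, and in practice the height would be read off the dendrogram.

```r
# Hypothetical 1-D data with three well-separated groups.
x <- matrix(c(1, 1.5, 2, 10, 10.5, 30), ncol = 1,
            dimnames = list(paste0("g", 1:6), "value"))
tree <- hclust(dist(x), method = "complete")

# Merge heights, smallest to largest; a large gap between successive
# heights suggests a place to cut.
print(tree$height)

# Cut at a fixed height: merges that happen above h = 5 are not applied.
by_height <- cutree(tree, h = 5)

# Alternatively, ask for a fixed number of clusters.
by_count <- cutree(tree, k = 3)

print(table(by_height))
```

For this toy data both calls produce the same partition, because the cut height falls between the third and fourth merge.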

ADD COMMENT

How did you choose the values in kclusters <- c(2, 3, 4, 6, 8, 10, 15, 20, 25, 50, 100)?

ADD REPLY

I'm a bit confused by your code; it may be a bit advanced for me. Would you explain it? What I understand so far is that you are defining the possible numbers of clusters, but after that I don't follow. If you could explain it, that would be really helpful and I would be glad.

ADD REPLY

The for (kcluster in kclusters) line loops through each of the values in kclusters.

Inside this loop, I apply k-means clustering on the rows of data, for some value k, which is just the variable kcluster: 2, 3, and so on up to 100 clusters. I run kmeans(data, kcluster) and store the clustering result in a variable called clustering.

The variable clustering stores an assignment of each row in data to one of k clusters. So when kcluster is 3, for example, there are three clusters in clustering that I can access: clustering$cluster == 1, clustering$cluster == 2 and clustering$cluster == 3.

The line for (i in seq(1, kcluster, 1)) simply loops over each value from 1 to kcluster and stores that loop counter in a variable called i. The loop uses write.table to write out the rows of data for which clustering$cluster == i.

ADD REPLY

Thank you very much for the very clear explanation.

ADD REPLY
