Question: How to get the order of clustered genes of heatmap.2 to a .csv file?
12 months ago by
Wox340
HUJI
Wox340 wrote:

I have a data frame of omics data. Gene ids in rows (931), and samples in columns (15).

> dim(my_data) # (rows columns)
[1] 931  16

I created a heatmap using library(gplots):

cn=colnames(gdf1)[c(13:15,1:12)]

col <- colorRampPalette(c("red","yellow","darkgreen"))(30)

heatmap.2(as.matrix(gdf1[,cn]), 
          dendrogram = "row", 
          Colv = FALSE, 
          Rowv = TRUE,
          scale = "none", 
          col = col,
          key = TRUE, 
          density.info = "none", 
          key.title = NA, 
          key.xlab = "Abundance",
          trace = "none",
          margins = c(7, 15))


However, only a few gene labels are visible in the heatmap, because it contains about 900 genes. How can I export the genes in each cluster, in the same order as in the heatmap?

Also, how can I reduce the size of colorkey?

Thank you.

rna-seq R • 3.2k views
12 months ago by
Kevin Blighe65k
Kevin Blighe65k wrote:

Edit 13th September, 2019: To additionally see how to extract clusters of genes from the heatmap dendrogram, zoom down to this later comment: C: How to get the order of clustered genes of a heat map to a .csv file?

---------------

Hello,

First, create random data

mat <- matrix(rexp(200, rate=.1), ncol=20)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))
mat[1:5,1:5]
         sample1   sample2   sample3   sample4     sample5
gene1  0.6247039  3.020142  8.303563  6.482744  0.59547154
gene2  2.6650871  3.375123  5.778222 19.410709  0.07966728
gene3  4.6343755  5.491166  8.716883  9.490372 29.03157875
gene4 13.6086878  3.632815 10.688699  1.263853  2.54216953
gene5  2.4060078 14.283380  8.592085  3.998141  0.25853135

Generate a heatmap and save it to out

library(gplots)
out <- heatmap.2(mat)


Obtain list of genes, ordered as per heatmap (from bottom, up):

rownames(mat)[out$rowInd]
 [1] "gene2"  "gene9"  "gene7"  "gene8"  "gene4"  "gene5"  "gene3"  "gene6" 
 [9] "gene1"  "gene10"

Plot the row dendrogram on its own:

plot(out$rowDendrogram)


Change colour key size

Use the keysize parameter of heatmap.2() (default 1.5); smaller values shrink the key.
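As a minimal sketch of the keysize suggestion, the example below draws the same kind of random matrix with a smaller colour key. The output file name, the 8 x 8 inch page size and the value keysize = 1 are illustrative assumptions, not values from the thread; if the key is made too small for the device, R may stop with a "figure margins too large" error.

```r
library(gplots)

# Random example data, as in the answer above (seed added for reproducibility)
set.seed(1)
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

# Hypothetical output file; any graphics device would do
pdf("heatmap_small_key.pdf", width = 8, height = 8)
out_small_key <- heatmap.2(
  mat,
  trace = "none",
  density.info = "none",
  key.title = NA,
  keysize = 1)        # default is 1.5; smaller values give a smaller key
invisible(dev.off())
```

The key shares its layout row and column with the dendrograms, so shrinking it also narrows the space available to them.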

----------------------------------

See also here for pheatmap: A: extract dendrogram cluster from pheatmap

Kevin


Thanks a heap, Kevin :) This is helpful. BTW, I have tried this now. Since I have ~900 genes, it is too much to plot in a diagram. How can I export these clustered genes to a csv file, in the same order as shown in the dendrogram?

written 12 months ago by Wox

Hey, do you just mean like this?

write.table(
  data.frame(gene = rownames(mat)[out$rowInd]),
  'out.csv',
  row.names = FALSE,
  quote = FALSE,
  sep = ',')


Remember that the order is bottom-to-top
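Because out$rowInd runs from the bottom of the heatmap upwards, reversing it gives the order a reader sees from top to bottom. The sketch below illustrates this; the file name genes_top_to_bottom.csv and the position_from_top column are made up for the example, and pdf(NULL) is only used to suppress the plot file.

```r
library(gplots)

# Random example data, as above (seed added for reproducibility)
set.seed(1)
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

pdf(NULL)                       # draw to a null device; no file is created
out <- heatmap.2(mat, trace = "none")
invisible(dev.off())

# out$rowInd is bottom-to-top, so reverse it for top-to-bottom reading order
genes_top_down <- rev(rownames(mat)[out$rowInd])

write.csv(
  data.frame(
    position_from_top = seq_along(genes_top_down),
    gene = genes_top_down),
  "genes_top_to_bottom.csv",    # hypothetical file name
  row.names = FALSE)
```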

modified 12 months ago • written 12 months ago by Kevin Blighe

Thanks Kevin, this is also useful. I am looking for a much more detailed output. I am not sure if this is possible in heatmap.2.

For a selected cut-off (e.g. distance = 40), how can we get a separate list of the genes in each cluster?

Something like this: how can I see the grouping of genes for each cluster in the output?

[image: Capture]

written 12 months ago by Wox

Ah - I see. In that case, it is easier to create your own dendrogram outside heatmap.2(), and then use cutree() on that:

Create random data

mat <- matrix(rexp(200, rate=.1), ncol=20)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))

Cluster the genes (rows) manually

row_clust <- hclust(dist(mat, method = 'euclidean'), method = 'ward.D2')

Plot the heatmap

library(gplots)
out <- heatmap.2(
  mat,
  Rowv = as.dendrogram(row_clust))


plot(row_clust)


The row dendrogram on the heatmap and the one drawn by plot(row_clust) are the same.
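One way to check this without comparing the plots by eye is to compare the row order returned by heatmap.2() with the leaf order stored in the hclust object. This is only a sketch on random data; the name same_order is introduced here for illustration, and the comparison should come out TRUE when the dendrogram is passed in through Rowv as above.

```r
library(gplots)

# Random example data, as above (seed added for reproducibility)
set.seed(1)
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

row_clust <- hclust(dist(mat, method = "euclidean"), method = "ward.D2")

pdf(NULL)                       # draw to a null device; no file is created
out <- heatmap.2(mat, Rowv = as.dendrogram(row_clust), trace = "none")
invisible(dev.off())

# Leaf order used by the heatmap vs. leaf order of the hclust object
same_order <- all(out$rowInd == row_clust$order)
same_order
```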

Cut the dendrogram into groups or specify a height for the cut-off:

#2 groups
sort(cutree(row_clust, k=2))
 gene1  gene2  gene3  gene4  gene6  gene7  gene8  gene9 gene10  gene5 
     1      1      1      1      1      1      1      1      1      2 

#5 groups
sort(cutree(row_clust, k=5))
 gene1  gene9  gene2  gene3  gene4  gene8 gene10  gene5  gene6  gene7 
     1      1      2      3      3      3      3      4      5      5 


# specify a height of 70
cutree(row_clust, h = 70)
 gene1  gene2  gene3  gene4  gene5  gene6  gene7  gene8  gene9 gene10 
     1      1      2      2      3      4      4      2      1      2 

plot(row_clust)
abline(h = 70, col = "red2", lty = 2, lwd = 2)


----------------------

You should also be able to output these lists in the same order as the heatmap / dendrogram (the indices are stored in out$rowInd).
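A minimal sketch of that last step follows. It uses only base R and takes the leaf order from row_clust$order, on the assumption that this matches out$rowInd when the same dendrogram is passed to heatmap.2() through Rowv. The choice of k = 5, the file name clusters_in_heatmap_order.csv and the column names are illustrative.

```r
# Random example data, as above (seed added for reproducibility)
set.seed(1)
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

row_clust <- hclust(dist(mat, method = "euclidean"), method = "ward.D2")

# Cluster membership, in the original row order of mat
clusters <- cutree(row_clust, k = 5)   # or cutree(row_clust, h = 70)

# Leaf order of the dendrogram (bottom-to-top on the heatmap)
row_ind <- row_clust$order

cluster_table <- data.frame(
  gene = rownames(mat)[row_ind],
  cluster = unname(clusters[row_ind]))

write.csv(cluster_table, "clusters_in_heatmap_order.csv", row.names = FALSE)

# Optional: one character vector of gene names per cluster
genes_by_cluster <- split(cluster_table$gene, cluster_table$cluster)
```

Because each cluster from cutree() is a subtree of the dendrogram, the genes of one cluster form a contiguous block in cluster_table. Wrap row_ind in rev() to list the genes from the top of the heatmap downwards.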

modified 12 months ago • written 12 months ago by Kevin Blighe

Thanks a heap, Kevin :) Appreciate.

written 12 months ago by Wox

Hi Kevin:

Is it possible to do the same in ComplexHeatmap? When I use the above command with Heatmap(), it throws this error:

Error in Heatmap(mydat_2, name = "mat", Rowv = as.dendrogram(row_clust), : unused argument (Rowv = as.dendrogram(row_clust))

I read the ComplexHeatmap manual (the part on splitting by dendrogram), but I do not fully understand how it can be done in practice.

written 4 months ago by Kai_Qi

For ComplexHeatmap, it's a bit different. I think that you want to do this:

?

Or is this what you want to do:

?

written 4 months ago by Kevin Blighe

Hi Kevin:

Thank you for the link. When I tried to reproduce what the code does, I got an error that I can't understand. My matrix contains 5 columns: columns 1 to 4 are the counts at different stages, and column 5 is the difference between column 3 and column 1 (diff = column3 - column1). The row names of the matrix are the genes.

I use row_clust <- hclust(dist(mydat_2, method = 'euclidean'), method = 'ward.D2') to do the clustering manually. And I use

HM <- Heatmap(mydat_2, name = "mat", cluster_rows = row_clust, col = col_fun, column_order = colnames(mydat_2), column_title = "Developmental Stages",
               column_title_side = "bottom", row_title="Retained_introns", show_row_names = FALSE)
HM <- draw(HM)

to draw the heatmap. The heatmap looks beautiful. I have 2 questions on how to make it better:

  1. The values in column 5 are on a different scale from columns 1-4 because they are differences (column3 - column1) and include negative values, so the colors I set for columns 1-4 do not fit them. In ComplexHeatmap, is it possible to show column 5 side by side with columns 1-4 (same gene clusters) but with a different color palette?

  2. When I run the code from GitHub, I get an error that I cannot understand:

    for (i in 1:length(row_order(HM))){
      if (i == 1) {
        clu <- t(t(row.names(mydat_2[row_order(HM)[[i]],])))
        out1 <- cbind(clu, paste("cluster", i, sep=""))
        colnames(out1) <- c("coordinates", "Cluster")
      } else {
        clu <- t(t(row.names(mydat_2[row_order(HM)[[i]],])))
        clu <- cbind(clu, paste("cluster", i, sep=""))
        out1 <- rbind(out1, clu)
      }
    }
    Error in t.default(row.names(mydat_2[row_order(HM)[[i]], ])) : argument is not a matrix

Since mydat_2 is a matrix, why is the result of extracting rows from it not a matrix?

I hope I have expressed myself clearly. Thanks for any advice.

modified 4 months ago • written 4 months ago by Kai_Qi
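A possible explanation for the error above, offered as a sketch rather than a confirmed diagnosis: if row_order(HM) returns a plain integer vector instead of a list of clusters (which may be the case when the heatmap is not split into row slices), then row_order(HM)[[i]] is a single row index. Indexing a matrix with a single row drops it to a vector, a vector has no row names, and t() then fails. The example below reproduces that mechanism in base R only; flat_order is a stand-in for the assumed row_order() result, and ComplexHeatmap itself is not called.

```r
set.seed(1)
mat <- matrix(rexp(50, rate = 0.1), ncol = 5)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

row_clust <- hclust(dist(mat, method = "euclidean"), method = "ward.D2")

# Stand-in (assumption) for a row_order() result that is a plain vector
flat_order <- row_clust$order

# A single row index drops the matrix to a named numeric vector
one_row <- mat[flat_order[[1]], ]
is.matrix(one_row)       # no longer a matrix
row.names(one_row)       # a vector has no row names

# Same call pattern as in the loop; capture the error message instead of stopping
err <- tryCatch(
  t(row.names(one_row)),
  error = function(e) conditionMessage(e))

# More robust variant: accept a vector or a list, and index the row names directly
order_list <- if (is.list(flat_order)) flat_order else list(flat_order)
out1 <- do.call(rbind, lapply(seq_along(order_list), function(i) {
  data.frame(
    coordinates = rownames(mat)[order_list[[i]]],
    Cluster = paste0("cluster", i))
}))
```

With a single unsplit heatmap this yields one cluster containing every row. To obtain several clusters, one option is to cut the tree with cutree(row_clust, k = ...) as shown earlier in the thread.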