How to cluster genes in heatmap
2
2
Entering edit mode
4.0 years ago
Mehmet ▴ 720

Dear All,

I have a data matrix with 17 samples and over 800 genes that belong to ten different gene families. I want to show these gene families in a heatmap by marking them.

I generated a heatmap, but I do not know how to show the gene families on it.

Does anyone know how to do that?

RNA-Seq R gene • 11k views
2
Entering edit mode

Try annotation_row in pheatmap.

1
Entering edit mode

a very good example of a clustered heatmap

6
Entering edit mode
4.0 years ago

Yes, as per Sean, in ComplexHeatmap, you can segregate your heatmap into 'blocks' of different genes using the 'split' parameter. You can end up with nice heatmaps like this: A: How to plot a heatmap with two different distance matrices for X and Y

Edit April 16, 2019: skip to the working example, here: C: how to cluster genes in heatmap

0
Entering edit mode

Hi Kevin,

Here is my data:

EffectorName    GeneID  Sample1      Sample2     Sample3    Sample4 .... Sample17
GH45            Gene1   25.7847      19.710      22.6148    ....          .......
Expansin        Gene2   29.2436      29.2168     963.745    ......        .......

.......................................................................................................................................................


What I want to do is to show EffectorNames in the heatmap.

0
Entering edit mode

Sorry, it's not clear what you want to do...

If I have this data:

Family      Gene  Sam1 Sam2 Sam3 Sam4
ncRNA       Gene1 10   11   6    1
ncRNA       Gene2 7    6    7    33
pseudogene  Gene3 6    65   3    3
ncRNA       Gene4 10   11   6    1
ncRNA       Gene5 7    6    7    33
pseudogene  Gene6 6    65   3    3


For ComplexHeatmap, if I want to break the heatmap by gene family, I would supply the 'Family' column to the split parameter of the Heatmap function in ComplexHeatmap. This would then break up the heatmap and perform clustering independently for genes under ncRNA and pseudogenes.
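
To illustrate what "clustering independently" means here, the following is a minimal base-R sketch that assumes the toy table above. It does not call ComplexHeatmap; it only mimics the idea by running hclust separately within each family. The object names (df, heat, groups, orders) are illustrative.

```r
# Toy data copied from the table above
df <- data.frame(
  Family = c("ncRNA", "ncRNA", "pseudogene", "ncRNA", "ncRNA", "pseudogene"),
  Gene   = paste0("Gene", 1:6),
  Sam1   = c(10, 7, 6, 10, 7, 6),
  Sam2   = c(11, 6, 65, 11, 6, 65),
  Sam3   = c(6, 7, 3, 6, 7, 3),
  Sam4   = c(1, 33, 3, 1, 33, 3),
  stringsAsFactors = FALSE
)

heat <- as.matrix(df[, 3:6])
rownames(heat) <- df$Gene

# Group row indices by family, then cluster each group on its own
groups <- split(seq_len(nrow(heat)), df$Family)
orders <- lapply(groups, function(idx) {
  hc <- hclust(dist(heat[idx, , drop = FALSE]), method = "complete")
  rownames(heat)[idx][hc$order]
})

# Row order within each family block
orders
```

Each element of orders holds the genes of one family in their within-family dendrogram order, which is roughly what a split heatmap displays block by block.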

3
Entering edit mode

R code:

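library(pheatmap)

# Read the tab-separated table; column 1 is Family, column 2 is Gene
test=read.csv("file.txt", sep="\t", header=T)
rownames(test)=test[,2]
chars=test[,c(1,2)]
# Expression values only (Sam1 to Sam4)
test1=test[,c(3:6)]
# chars[1] is a one-column data frame (Family) whose rownames are the gene names
pheatmap(as.matrix(test1), scale = "row", clustering_distance_rows = "correlation", clustering_method = "complete",color =rainbow(2), main="Significant genes", fontsize_col=24, fontsize_row = 24,annotation_row = chars[1])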


input:

$ cat file.txt
Family      Gene  Sam1 Sam2 Sam3 Sam4
ncRNA       Gene1 10   11   6    1
ncRNA       Gene2 7    6    7    33
pseudogene  Gene3 6    65   3    3
ncRNA       Gene4 10   11   6    1
ncRNA       Gene5 7    6    7    33
pseudogene  Gene6 6    65   3    3

0
Entering edit mode

Hi,

I am trying to run it, but I got errors:

Error in seq.default(-m, m, length.out = n + 1) :
'from' must be a finite number
In addition: Warning messages:
1: In min(x, na.rm = T) : no non-missing arguments to min; returning Inf
2: In max(x, na.rm = T) : no non-missing arguments to max; returning -Inf

3
Entering edit mode

Hi! For ComplexHeatmap, try this code (note the split parameter):

require(ComplexHeatmap)
require(circlize)
require(cluster)

df <- read.table("test", header=TRUE)

df
      Family  Gene Sam1 Sam2 Sam3 Sam4
1      ncRNA Gene1   10   11    6    1
2      ncRNA Gene2    7    6    7   33
3 pseudogene Gene3    6   65    3    3
4      ncRNA Gene4   10   11    6    1
5      ncRNA Gene5    7    6    7   33
6 pseudogene Gene6    6   65    3    3
7 pseudogene Gene7    5   45    2    1

heat <- t(scale(t(df[,3:ncol(df)])))

hmap <- Heatmap(heat,
name="Transcript Z-score",
#col=colorRamp2(myBreaks, myCol),
heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),
split=df$Family,
row_title="Transcript class",
row_title_side="left",
row_title_gp=gpar(fontsize=15, fontface="bold"),
show_row_names=TRUE,
column_title="",
column_title_side="top",
column_title_gp=gpar(fontsize=15, fontface="bold"),
column_title_rot=0,
show_column_names=TRUE,
clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
clustering_method_columns="ward.D2",
clustering_distance_rows="euclidean",
clustering_method_rows="ward.D2",
row_dend_width=unit(30,"mm"),
column_dend_height=unit(30,"mm"))

draw(hmap, heatmap_legend_side="left")


0
Entering edit mode

Hi Kevin,

Thank you. I followed it, but I received this error:

Error in colorRamp2(myBreaks, myCol) :
Length of breaks should be equal to colors.

1
Entering edit mode

Yes, I commented out that part of the code. If you would like to use it, then execute the following prior to generating the heatmap:

#myCol <- colorRampPalette(c("violet", "black", "springgreen"))(100)
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-3, 3, length.out=100)


You can choose any colours that you want here.

Also note that t( scale( t( x ) ) ) scales the data to Z-scores gene by gene (row by row), because scale() operates on columns.
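
As a quick check of that transformation, here is a small base-R sketch on a made-up matrix (the values and the names x and z are illustrative only). It shows that after the double transpose each gene has mean 0 and standard deviation 1.

```r
# Toy expression matrix: rows are genes, columns are samples
x <- matrix(c(10, 11, 6, 1,
              7, 6, 7, 33,
              6, 65, 3, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Gene", 1:3), paste0("Sam", 1:4)))

# scale() centres and scales columns, so transpose, scale, and transpose back
z <- t(scale(t(x)))

# Each gene (row) now has mean 0 and standard deviation 1
round(rowMeans(z), 10)
apply(z, 1, sd)
```

Note that a gene with identical values in every sample has a standard deviation of 0 and would come out as NaN, so such rows are usually removed before scaling.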

0
Entering edit mode

Hi Kevin,

Thank you very much for your help. I was able to generate a heatmap as I wanted.

2
Entering edit mode

Great. You should devote a full working day to looking over ComplexHeatmap. Once you learn it, you will never go back to heatmap.2 or pheatmap.

1
Entering edit mode

Yes, I will. Your help in this post is a tutorial for other people, so anyone can follow these steps easily to make a complex heatmap.

0
Entering edit mode

One thing I want to ask: how can I change the orientation of the Family names in the heatmap? They are drawn vertically, but I want to show them horizontally, because some of them overlap and cannot be read.

4
Entering edit mode

Take a look at this (below). I now add all sorts of annotations for you, just to give you an idea. Also note the following:

• I set the orientation/rotation of the family names with row_title_rot=0 (note that when you use the split parameter, it overrides the row title)
• I now set gene names as rownames for heat, with rownames(heat) <- df$Gene
• I use different distance metrics for rows and columns, with clustering_distance_columns and clustering_distance_rows

This is a 'simple' ComplexHeatmap though, if that makes sense. There is much more to ComplexHeatmap, and you don't even want to see the complexity of one of the recent ones that I made. The code for it runs into the hundreds of lines. I commend the author of the package, who did a really great job.

require(ComplexHeatmap)
require(circlize)
require(cluster)

df <- read.table("test", header=TRUE)

df
      Family  Gene Sam1 Sam2 Sam3 Sam4
1      ncRNA Gene1   10   11    6    1
2      ncRNA Gene2    7    6    7   33
3 pseudogene Gene3    6   65    3    3
4      ncRNA Gene4   10   11    6    1
5      ncRNA Gene5    7    6    7   33
6 pseudogene Gene6    6   65    3    3
7 pseudogene Gene7    5   45    2    1

heat <- t(scale(t(df[,3:ncol(df)])))
rownames(heat) <- df$Gene

#Set annotation
ColAnn <- data.frame(colnames(heat))
colnames(ColAnn) <- c("Sample")
ColAnn <- HeatmapAnnotation(df=ColAnn, which="col")

RowAnn <- data.frame(df$Family)
colnames(RowAnn) <- c("Gene family")
colours <- list("Gene family"=c("ncRNA"="royalblue","pseudogene"="red3"))
RowAnn <- HeatmapAnnotation(df=RowAnn, col=colours, which="row")

boxAnnCol <- HeatmapAnnotation(boxplot=anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), lim=NULL, pch=".", size=unit(2, "mm"), axis=FALSE, axis_side=NULL, axis_gp=gpar(fontsize=12)), annotation_width=unit(c(1, 7.5), "cm"))

boxAnnRow <- rowAnnotation(boxplot=row_anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), lim=NULL, pch=".", size=unit(3, "cm"), axis=FALSE, axis_side="top", axis_gp=gpar(fontsize=12)), annotation_width=unit(c(3), "cm"))

hmap <- Heatmap(heat,
name="Transcript Z-score",
col=colorRamp2(myBreaks, myCol),
heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),

#Split heatmap rows by gene family
split=df$Family,

#Row annotation configurations
cluster_rows=TRUE,
show_row_dend=TRUE,
#row_title="Transcript", #overridden by 'split' it seems
row_title_side="left",
row_title_gp=gpar(fontsize=15, fontface="bold"),
show_row_names=TRUE,
row_names_side="left",
row_title_rot=0,

#Column annotation configurations
cluster_columns=TRUE,
show_column_dend=TRUE,
column_title="Samples",
column_title_side="top",
column_title_gp=gpar(fontsize=15, fontface="bold"),
column_title_rot=0,
show_column_names=TRUE,

#Dendrogram configurations: columns
clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
clustering_method_columns="ward.D2",
column_dend_height=unit(30,"mm"),

#Dendrogram configurations: rows
clustering_distance_rows="euclidean",
clustering_method_rows="ward.D2",
row_dend_width=unit(30,"mm"),

#Annotations (row annotation must be added with 'draw' function, below)
top_annotation_height=unit(0.5,"cm"),
top_annotation=ColAnn,

bottom_annotation_height=unit(3, "cm"),
bottom_annotation=boxAnnCol)

draw(hmap + RowAnn + boxAnnRow, heatmap_legend_side="left", annotation_legend_side="right")


1
Entering edit mode

Hi Kevin, I'm using your code. Please have a look; I'm getting an error.

require(ComplexHeatmap)
require(circlize)
require(cluster)
df
dim(df)
names(df)
heat <- t(scale(t(df[,3:ncol(df)])))

#################################################

##############################################

rownames(heat) <- df$Gene

myCol <- colorRampPalette(c("navyblue", "white", "red"))(100)
myBreaks <- seq(-2,2, length.out=100)

#Set annotation
ColAnn <- data.frame(colnames(heat))
colnames(ColAnn) <- c("Sample")
ColAnn <- HeatmapAnnotation(df=ColAnn, which="col")

RowAnn <- data.frame(df$Family)
colnames(RowAnn) <- c("Gene family")
colours <- list("Gene family"=
c("Interferon Signaling"="red","Communication between Innate and Adaptive Immune Cells"="red1","Atherosclerosis Signaling "="red2",
"Activation of IRF by Cytosolic Pattern Recognition Receptors
"="azure","Neuroinflammation Signaling Pathway
"="royalblue","Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses
"="royalblue1","Role of Macrophages"="royalblue2","Death Receptor Signaling"="royalblue3","TREM1 Signaling
"="royalblue4","Toll-like Receptor Signaling
"="cyan1","NF-κB Signaling"="cyan2","HMGB1 Signaling
"="cyan3","PKCθ Signaling in T Lymphocytes"="cyan4","PPARα/RXRα Activation"="green4" ))
RowAnn <- HeatmapAnnotation(df=RowAnn, col=colours, which="row")

boxAnnCol <- HeatmapAnnotation(boxplot=anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), pch=".", size=unit(2, "mm"), axis=FALSE, axis_side=NULL, axis_gp=gpar(fontsize=12)), annotation_width=unit(c(1, 7.5), "cm"))

boxAnnRow <- rowAnnotation(boxplot=row_anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"),pch=".", size=unit(3, "cm"), axis=FALSE, axis_side="top", axis_gp=gpar(fontsize=12)), annotation_width=unit(c(3), "cm"))

hmap <- Heatmap(heat,
name="Transcript Z-score",
col=colorRamp2(myBreaks, myCol),
heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),

#Split heatmap rows by gene family
split=df$Family,

#Row annotation configurations
cluster_rows=FALSE,
show_row_dend=FALSE,
#row_title="Transcript", #overridden by 'split' it seems
row_title_side="left",
row_title_gp=gpar(fontsize=30, fontface="bold"),
show_row_names=TRUE,
row_names_side="left",
row_title_rot=0,

#Column annotation configurations
cluster_columns=TRUE,
show_column_dend=TRUE,
column_title="Samples",
column_title_side="top",
column_title_gp=gpar(fontsize=15, fontface="bold"),
column_title_rot=0,
show_column_names=TRUE,

#Dendrogram configurations: columns
#clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
clustering_method_columns="complete",
column_dend_height=unit(10,"mm"),

#Dendrogram configurations: rows
clustering_distance_rows="euclidean",
clustering_method_rows="ward.D2",
row_dend_width=unit(30,"mm"))

#Annotations (row annotation must be added with 'draw' function, below)
#top_annotation_height=unit(0.5,"cm"),
#top_annotation=ColAnn)

#bottom_annotation_height=unit(3, "cm"),
#bottom_annotation=boxAnnCol)

draw(hmap + RowAnn , heatmap_legend_side="left", annotation_legend_side="right")

Error in .local(object, ...) :
Gene family: cannot map colors to some of the levels:
Activation of IRF by Cytosolic Pattern Recognition Receptor

Error when drawing annotation 'Gene family'
Error in .local(object, ...) : Error in .local(object, ...) :
Gene family: cannot map colors to some of the levels:
Activation of IRF by Cytosolic Pattern Recognition Receptors


I'm not sure what is going wrong with the colours I defined.

1
Entering edit mode
colours <- list("Gene family"=
c("Interferon Signaling"="red","Communication between Innate and Adaptive Immune Cells"="red1","Atherosclerosis Signaling "="red2",
"Activation of IRF by Cytosolic Pattern Recognition Receptors
"="azure","Neuroinflammation Signaling Pathway
"="royalblue","Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses
"="royalblue1","Role of Macrophages"="royalblue2","Death Receptor Signaling"="royalblue3","TREM1 Signaling
"="royalblue4","Toll-like Receptor Signaling
"="cyan1","NF-κB Signaling"="cyan2","HMGB1 Signaling
"="cyan3","PKCθ Signaling in T Lymphocytes"="cyan4","PPARα/RXRα Activation"="green4" ))


Hello friend, the problem is most likely in the lines above. Can you double-check that all gene family names are correct, including upper- and lower-case?
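
One way to run that check, sketched in base R with made-up values: compare the family labels in the data against the names of the colour vector. A stray line break or trailing space inside a quoted name, which can happen when pasting long names, may be enough to cause a mismatch. The vectors family and cols below are hypothetical and are not the poster's data.

```r
# Hypothetical family labels as they might appear in the data
family <- c("Interferon Signaling", "TREM1 Signaling", "Role of Macrophages")

# Colour vector whose names were pasted with a trailing newline or space
cols <- c("Interferon Signaling" = "red",
          "TREM1 Signaling\n"    = "royalblue4",
          "Role of Macrophages " = "royalblue2")

# Labels that have no matching colour name
missing_before <- setdiff(unique(family), names(cols))
missing_before

# Strip leading/trailing whitespace (including newlines) from the names
names(cols) <- trimws(names(cols))
missing_after <- setdiff(unique(family), names(cols))
missing_after
```

If missing_after is still non-empty, the remaining labels differ in spelling or case rather than in whitespace.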

1
Entering edit mode

Okay, let me check that again.

0
Entering edit mode

I'm getting something like this:

I thought the gene names were cluttering the plot, so I removed them, but it still looks messy. Any suggestions?

0
Entering edit mode

My next educated guess is that the problem is with your hyphens. For example, "NF-κB Signaling" will have to be changed to "NF κB Signaling".

0
Entering edit mode

And how do you decide on the sequence of breaks? Maybe I am doing something wrong, because there are around 15 families and I also can't see the Z-score bar in my figure.

1
Entering edit mode

It is trial and error. You can try myBreaks <- seq(-2,2, length.out=100) or myBreaks <- seq(-1,1, length.out=100), or something else. Your data does look strange (very flat).

Keep in mind that you do not necessarily have to scale the data and set break-points. Also remember that the colouring is purely for visualisation and does not change the actual clustering.
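
As an alternative to pure trial and error, the limits can be taken from the data. The following base-R sketch uses a simulated matrix and a 99th-percentile cut-off; both are assumptions made for illustration, not something prescribed in this thread. The resulting myBreaks has length 100, so it would pair with a 100-colour myCol as above.

```r
set.seed(1)
# Simulated matrix standing in for real expression data (20 genes x 6 samples)
x <- matrix(rnorm(120, mean = 10, sd = 3), nrow = 20)
heat <- t(scale(t(x)))

# Symmetric limit taken from the 99th percentile of the absolute Z-scores
lim <- as.numeric(quantile(abs(heat), 0.99))
myBreaks <- seq(-lim, lim, length.out = 100)
range(myBreaks)

# Clustering is computed from the values in 'heat', not from the breaks
hc <- hclust(dist(heat), method = "ward.D2")
```

Changing lim only changes how values are mapped to colours; hc is the same whatever breaks are chosen.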

0
Entering edit mode

Yes, the colouring is to distinguish them. Why can't the Z-score legend be seen? Is it because the text is taking up all the space on the left side?

0
Entering edit mode

Hi Kevin,

I would like to ask you something. How can I add FPKM values of each gene in each sample into heatmap?

0
Entering edit mode

Hello again. Do you mean to literally add the numerical FPKM values to the heatmap?

0
Entering edit mode

Hi Kevin,

I figured out how to add FPKM values in cells of heatmap. But I also need to add a box plot of FPKM values in addition to z-score box plot. I tried but I could not see another legend option.

1
Entering edit mode

To do that, I think that you just create a HeatmapAnnotation and specify 2 boxplots in it, like this:

annotBoxplots <- HeatmapAnnotation(anno_boxplot(zscores, which = "row"), anno_boxplot(fpkm, which="row"), which="row", ...)

0
Entering edit mode

Hi Kevin,

I ran the command below:

annotBoxplots <- HeatmapAnnotation(as.matrix(shorteffec.fpkm.txt), which="row")


I was wondering how to show only FPKM box plot (without annotation), not z-score boxplot.

1
Entering edit mode

Oh, maybe try this:

annotBoxplots <- HeatmapAnnotation(anno_boxplot(as.matrix(shorteffec.fpkm.txt)), which="row")

1
Entering edit mode

Hi Kevin,

I was able to show FPKM box plot of rows (genes) and columns (samples/conditions) in the heatmap without annotation.

0
Entering edit mode

Hi Kevin,

I was able to produce annotated box plot based on FPKM values in the heatmap.

This is the argument that I added to the Heatmap function; it prints the FPKM values in the cells of the heatmap:

cell_fun = function(j, i, x, y, width, height, fill) {grid.text(sprintf("%.1f", shorteffec.fpkm.txt[i, j]), x, y, gp = gpar(fontsize = 15, col= "black"))}


What I need to do is show a box plot of FPKM values without any annotation, and not use the z-score legend in the heatmap.

0
Entering edit mode

Sorry Kevin. By combining your code with the ComplexHeatmap option that keeps genes in the same order across two heatmaps, I got this heatmap. Now, how can I give the right heatmap smoother colouring? I mean, the left heatmap is darker and the right one is higher.

library(ComplexHeatmap)
library(circlize)
mycol <- colorRamp2(c(-2,0,2), c("dodgerblue", "black", "yellow"))
> heat <- t(scale(t(norm_h0_t_r)))
> heat <- heat[apply(heat, MARGIN = 1, FUN = function(x) sd(x) != 0),]
> View(heat)
> t=heat[,1:2]
> r=heat[,3:4]
> dim(t)
[1] 8587    2
> dim(r)
[1] 8587    2
> Heatmap(t, col=mycol, cluster_columns = FALSE) + Heatmap(r, col=mycol, cluster_columns = FALSE)


Maybe with the same scale on both heatmaps.

![enter image description here][1]

0
Entering edit mode

I have replied back in the other thread in order to maintain consistency: C: Why can't I reproduce the same heat map

0
Entering edit mode

How to deal with these?

0
Entering edit mode

Hi Kevin,

I am trying to generate a heatmap, but I am having difficulties. As you may remember from my previous post about the FPKM-based heatmap, I need to use those data: load them into R, scale them, and draw the heatmap.

Could you please send me an R code to do those steps?

2
Entering edit mode
4.0 years ago

Take a look at the heatmap.3 and ComplexHeatmap packages to mark your genes.