Question

Improve Large Heatmap Generation in R

0

6.9 years ago

Thorerges ▴ 10

I have made an unimpressively large heatmap in R. Each row holds RPKM expression values from GTEx, and the columns are the tissues.

> heatmap(matrix, Rowv=NA, Colv=NA, col = heat.colors(256), cexRow = 0.5, scale="column")
> dim(matrix) #dimension of my matrix
[1] 18101    53

Does anyone have any ideas as to how to make this more interpretable/cleaner? Or is this kind of data too large for a heatmap? The matrix is

[Image: the heatmap]

RNA-Seq R Coding • 2.9k views

ADD COMMENT • link updated 6.9 years ago by Whoknows ▴ 960 • written 6.9 years ago by Thorerges ▴ 10

2

If possible, use log-transformed values of the matrix to generate the heatmap. Also try different clustering methods (if appropriate for your data).
Include breaks and custom colors for the breaks. Example: Check this post
If there are extreme outliers in your data, mark them with different colors using breaks.
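
A minimal sketch of the breaks idea in base R, assuming a made-up toy matrix in place of the real GTEx data (the names `expr`, `log_expr`, `cap` and the choice of the 99th percentile are illustrative assumptions, not a recommendation). The regular colors cover values up to the cap, and a single extra bin in a distinct color marks the outliers:

```r
# Toy stand-in for the expression matrix (made-up values, NOT GTEx data)
set.seed(1)
expr <- matrix(rexp(200 * 10, rate = 0.05), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("tissue", 1:10)))
expr[1:3, 1] <- c(5000, 8000, 12000)   # a few artificial extreme outliers

log_expr <- log2(expr + 1)             # +1 avoids log2(0)

# 100 evenly spaced break points from 0 to the 99th percentile (99 bins),
# followed by one open-ended top bin that catches the outliers
cap    <- quantile(log_expr, 0.99, names = FALSE)
breaks <- c(seq(0, cap, length.out = 100), max(log_expr))

# image() needs exactly one more break than colors:
# 99 gradient colors for the regular bins + 1 distinct color for the outlier bin
cols <- c(colorRampPalette(c("white", "orange", "red"))(99), "purple")

# heatmap() forwards col and breaks to image(); scale = "none" keeps the
# breaks meaningful on the log2 scale
heatmap(log_expr, Rowv = NA, Colv = NA, scale = "none",
        col = cols, breaks = breaks, cexRow = 0.5)
```

With `scale = "column"` as in the question, the values are rescaled before plotting, so breaks defined on the log2 scale would no longer line up; that is why the sketch switches scaling off.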

ADD REPLY • link 6.9 years ago by EagleEye 7.5k

2

Think about what you want to show. What's the message of the figure? There's no point having a heatmap (or any other type of figure, for that matter) that can't be legibly displayed on standard media (e.g. a computer screen or paper). First, if the problem is that there are no patterns to be seen where you expect some, you could consider rescaling the values to increase the range that is mapped to colors. However, usually one doesn't want to show individual data points but instead make more general statements about groups, so this immediately suggests clustering the data. Also, there may be a better representation than the heatmap for what you want to show.
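
One way to act on the clustering suggestion, sketched with base R only: group the genes with k-means and plot one row per cluster instead of one row per gene. The toy matrix, the choice of k-means, and the number of clusters (6) are all assumptions made for illustration; any clustering method that suits the data could be swapped in.

```r
# Toy stand-in for the expression matrix (made-up values, NOT GTEx data)
set.seed(42)
expr <- matrix(rexp(1000 * 8, rate = 0.05), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("tissue", 1:8)))
log_expr <- log2(expr + 1)

# z-score each gene across tissues so clusters reflect the expression pattern
# rather than the overall level (genes with zero variance would give NaN here
# and should be dropped first)
z <- t(scale(t(log_expr)))

# k = 6 is an arbitrary illustrative choice
km <- kmeans(z, centers = 6, nstart = 5, iter.max = 50)

# km$centers is a 6 x 8 matrix: the mean profile of each cluster per tissue
heatmap(km$centers, scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(50),
        labRow = paste0("cluster ", 1:6, " (n=", km$size, ")"))
```

A 6-row heatmap is legible on a screen or a page, and `km$cluster` still records which gene belongs to which group if individual genes need to be looked up.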

ADD REPLY • link 6.9 years ago by Jean-Karim Heriche 27k

score 3 · Accepted Answer · 2017-05-23

Dear sherif,

How you select the genes that go into the heatmap is really important. We always draw heatmaps for differentially expressed genes only, i.e. those with a fold change of at least 2 up or down (+1/-1 after log2 transformation).

So you should limit the heatmap to genes that changed significantly. In your heatmap image the values for all genes look the same; this can be addressed by taking log2 of all values, with code like:

matrix <- log2(matrix +1)

The code above compresses the highest values, so differences become visible on the log2 scale. I added 1 to avoid computing log2 of zero.
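
For the gene filtering, here is a rough sketch under stated assumptions. There is no differential expression test in it: as a simple stand-in, it keeps genes whose log2 expression differs from the gene's own median across tissues by at least 1 (a 2-fold change) in at least one tissue. The toy matrix and the names `lfc`, `keep` and `filtered` are hypothetical.

```r
# Toy stand-in for the expression matrix (made-up values, NOT GTEx data)
set.seed(7)
expr <- matrix(rexp(500 * 6, rate = 0.05), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("tissue", 1:6)))
log_expr <- log2(expr + 1)

# log2 fold change of every tissue relative to the gene's median across tissues
# (the vector of row medians is recycled down each column, so rows line up)
lfc <- log_expr - apply(log_expr, 1, median)

# keep genes that move by >= 1 log2 unit (2-fold) in at least one tissue
keep     <- rowSums(abs(lfc) >= 1) > 0
filtered <- log_expr[keep, , drop = FALSE]

dim(filtered)   # number of genes kept x number of tissues
```

The threshold of 1 mirrors the +1/-1 log2 cutoff mentioned above; a proper differential expression analysis would replace this crude filter.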

You can also try the pheatmap package; it offers several clustering methods that might be useful for your data.
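
A short pheatmap sketch, assuming the package is installed and again using a made-up toy matrix. The distance and linkage settings are only example choices to show where the clustering method is configured:

```r
# Toy stand-in for the expression matrix (made-up values, NOT GTEx data)
set.seed(3)
expr <- matrix(rexp(300 * 8, rate = 0.05), nrow = 300,
               dimnames = list(paste0("gene", 1:300), paste0("tissue", 1:8)))
log_expr <- log2(expr + 1)

# Only draw if pheatmap is available (install.packages("pheatmap") otherwise)
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    log_expr,
    scale = "row",                              # z-score each gene across tissues
    clustering_distance_rows = "correlation",   # example distance for genes
    clustering_distance_cols = "euclidean",     # example distance for tissues
    clustering_method = "average",              # linkage passed on to hclust()
    show_rownames = FALSE                       # too many genes to label legibly
  )
}
```

`clustering_method` accepts the same linkage names as `hclust()` (for example "complete", "average", "ward.D2"), so trying alternatives is a one-word change.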