Improve Large Heatmap Generation in R
4.1 years ago
Thorerges ▴ 10

I have made an unimpressively large heatmap in R. The rows correspond to RPKM expression values from GTEx, columns are their tissues.

> heatmap(matrix, Rowv=NA, Colv=NA, col = heat.colors(256), cexRow = 0.5, scale="column")
> dim(matrix) #dimension of my matrix
[1] 18101    53

Does anyone have any ideas as to how to make this more interpretable/cleaner? Or is this kind of data too large for a heatmap? The matrix is

[Image: the resulting heatmap]

RNA-Seq R Coding • 1.7k views
  • If possible, use log-transformed values of the matrix to generate the heatmap. Also try different clustering methods (if appropriate for your data).
  • Set explicit breaks and assign custom colors to them. Example: check this post
  • If there are extreme outliers in your data, mark them with a different color using breaks.
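The three suggestions above can be combined in base R. The following is only a sketch on made-up data: the toy matrix `expr`, the 99th-percentile cap, and the colour choices are assumptions for illustration, not something taken from the original post. It relies on `heatmap()` passing `col` and `breaks` through to `image()`.

```r
set.seed(1)

# Toy stand-in for the RPKM matrix (200 genes x 6 tissues); values are invented.
expr <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("tissue", 1:6)))
expr[1:3, 1] <- c(5000, 8000, 12000)  # plant a few extreme outliers

# Log-transform; the +1 avoids log2(0).
log_expr <- log2(expr + 1)

# Spread the gradient over values up to the 99th percentile (assumed cutoff),
# and reserve one final bin, drawn in a contrasting colour, for outliers.
cap    <- unname(quantile(log_expr, 0.99))
n_col  <- 50
breaks <- c(seq(0, cap, length.out = n_col), max(log_expr))
cols   <- c(colorRampPalette(c("white", "orange", "red"))(n_col - 1), "blue")

# image() needs exactly one more break than colours.
heatmap(log_expr, Rowv = NA, Colv = NA, scale = "none",
        col = cols, breaks = breaks,
        labRow = rep("", nrow(log_expr)))  # too many rows to label legibly
```

With the cap in place, a handful of very high values no longer stretch the colour scale, and they remain visible as blue cells.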

Think about what you want to show. What's the message of the figure? There's no point in having a heatmap (or any other type of figure, for that matter) that can't be displayed legibly on standard media (e.g. a computer screen or paper). First, if the problem is that no patterns are visible where you expect some, you could consider rescaling the values to increase the range that is mapped to colors. However, one usually doesn't want to show individual data points but rather make more general statements about groups, which immediately suggests clustering the data. There may also be a better representation than a heatmap for what you want to show.
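One way to make statements about groups rather than individual genes is to cluster the genes and plot one row per cluster. This is a rough sketch on invented data; the toy matrix `expr`, the choice of k-means, and `k = 6` are all assumptions, and other clustering approaches may suit the real data better.

```r
set.seed(42)

# Toy stand-in for the expression matrix (1000 genes x 8 tissues); values are invented.
expr <- matrix(rexp(1000 * 8, rate = 0.1), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("tissue", 1:8)))
log_expr <- log2(expr + 1)

# Drop genes that do not vary at all, then z-score each gene across tissues
# so that clustering follows the expression pattern rather than abundance.
log_expr <- log_expr[apply(log_expr, 1, sd) > 0, , drop = FALSE]
z <- t(scale(t(log_expr)))

# Group genes into k clusters (k is an arbitrary choice here).
k  <- 6
km <- kmeans(z, centers = k, nstart = 10, iter.max = 50)

# One row per cluster: the mean z-score profile of its member genes.
centroids <- km$centers
rownames(centroids) <- paste0("cluster ", seq_len(k), " (n=", km$size, ")")

heatmap(centroids, scale = "none", margins = c(6, 10),
        col = colorRampPalette(c("blue", "white", "red"))(64))
```

The resulting figure has k rows instead of thousands, so every row label fits on a screen or page.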

4.1 years ago
Whoknows ▴ 870

Dear sherif,

How you select the genes that go into the heatmap is really important. We usually draw heatmaps only for differentially expressed genes, with a fold change of at least +2/-2 (or +1/-1 after log2 transformation).

So you should limit the genes to those that changed significantly. As I see in your heatmap image, the values for all genes look the same; this could be solved by taking log2 of all values, with some code like:

matrix <- log2(matrix +1)

The above code compresses the highest values, so you can see differences on the log2 scale. I added 1 to avoid computing log2 of zero.
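A sketch of the filtering idea follows. Since the question compares tissues rather than two conditions, this example assumes one possible adaptation of the +1/-1 log2 rule: keep a gene if, in at least one tissue, its log2 expression differs from that gene's median across tissues by 1 or more. The toy matrix `expr` and this particular criterion are illustrative assumptions, not a replacement for a proper differential expression test.

```r
set.seed(7)

# Toy data: 500 genes x 5 tissues, each gene with its own baseline and mild noise.
n_genes  <- 500
baseline <- runif(n_genes, min = 5, max = 50)
expr <- baseline * matrix(rlnorm(n_genes * 5, meanlog = 0, sdlog = 0.2), nrow = n_genes)
dimnames(expr) <- list(paste0("gene", 1:n_genes), paste0("tissue", 1:5))
expr[1:20, 2] <- expr[1:20, 2] * 16  # plant 20 genes that are high in tissue2

log_expr <- log2(expr + 1)

# Per-gene deviation from the gene's median across tissues, in log2 units.
# The vector is recycled down the columns, so row i is shifted by gene_median[i].
gene_median <- apply(log_expr, 1, median)
log_fc <- log_expr - gene_median

# Keep genes that deviate by at least 1 log2 unit (2-fold) in any tissue.
keep <- apply(abs(log_fc) >= 1, 1, any)
filtered <- log_expr[keep, , drop = FALSE]

heatmap(filtered, scale = "row", labRow = rep("", nrow(filtered)))
```

Filtering first keeps the heatmap small enough that the remaining rows show a visible pattern.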

You can also try the pheatmap package; it includes several clustering methods that might be useful for your data.
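As a hedged illustration, a few clustering methods can be compared with pheatmap like this. The toy matrix `expr`, the three methods tried, and the correlation distance are assumptions chosen for the example; the code skips plotting if pheatmap is not installed.

```r
set.seed(3)

# Toy stand-in for the expression matrix (100 genes x 6 tissues); values are invented.
expr <- matrix(rexp(100 * 6, rate = 0.1), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("tissue", 1:6)))
log_expr <- log2(expr + 1)

# Row scaling and correlation distance both need rows with non-zero variance.
log_expr <- log_expr[apply(log_expr, 1, sd) > 0, , drop = FALSE]

if (requireNamespace("pheatmap", quietly = TRUE)) {
  # Draw one heatmap per linkage method to compare the resulting groupings.
  for (method in c("complete", "average", "ward.D2")) {
    pheatmap::pheatmap(log_expr,
                       scale = "row",
                       clustering_distance_rows = "correlation",
                       clustering_method = method,
                       show_rownames = FALSE,
                       main = paste("Linkage:", method))
  }
} else {
  message("pheatmap is not installed; install.packages('pheatmap') to run this example")
}
```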

