Question

HeatMap, problems with the scale color and dendrogram

1

2.6 years ago

arturo.marin ▴ 20

Hi,

I am trying to make a heatmap of some analyzed RNA-seq data using ggplot2 in R. The problem is that I cannot clearly distinguish the upregulated and downregulated genes: I can't get the colors to be dark enough. I am plotting all the genes of the study organism, around 13,000. I'm also not sure whether ggplot2 clusters the genes in my data. Is there a way to cluster the genes and show the upregulated and downregulated genes in darker colors? The code I am using is the following:

library(ggplot2)
library(reshape2)

# Read the table and reshape it to long format (one row per gene/column pair)
g <- read.csv("datos_prueba3.csv")
g3 <- melt(g)

plot1 <- ggplot(g3, aes(variable, Gene, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "dark green", high = "dark red", mid = "green", midpoint = 0) +
  theme(axis.text.y = element_blank())
plot1

Best,

HeatMap RNAseq • 2.1k views

ADD COMMENT • link 2.6 years ago by arturo.marin ▴ 20

1

2.6 years ago

Mensur Dlakic ★ 27k

This doesn't seem to be clustered. As for not being able to see it, consider the length of a sheet of paper divided into 13,000 parts. Each part would be hard to see, which is to say that one can't really represent 13,000 genes in a heat map. And even if it were possible, there aren't 13,000 genes worth looking at in your experiment.

I think you need to work on reducing the number of genes manually, or let the programs do it for you based on the magnitude of over- or under-expression. This may give you a starting point:

https://www.r-graph-gallery.com/heatmap

Also, search for heatmap in https://bioconductor.org/ and that will give you many packages to consider.
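
As a rough illustration of reducing the number of genes by magnitude, here is a minimal sketch in base R. It assumes a table with a Gene column and one numeric column of log2 fold changes per day; the column names, the simulated values, and the cutoff of 50 genes are all made up for the example and are not taken from the original data.

```r
# Toy stand-in for the real table: one row per gene, one log2 fold-change
# column per day. Column names and values are assumptions for illustration.
set.seed(1)
g <- data.frame(
  Gene = paste0("gene", 1:1000),
  day2 = rnorm(1000),
  day5 = rnorm(1000)
)

# Largest absolute change each gene shows in any of the numeric columns.
max_abs <- apply(abs(as.matrix(g[, -1])), 1, max)

# Keep the genes with the largest changes (the cutoff of 50 is arbitrary).
top_n <- 50
g_top <- g[order(max_abs, decreasing = TRUE)[seq_len(top_n)], ]
```

The resulting g_top could then be reshaped and plotted in the same way as the full table in the question.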

ADD COMMENT • link 2.6 years ago by Mensur Dlakic ★ 27k

score 4 · Accepted Answer · 2021-10-02

4

2.6 years ago

ATpoint 82k

A heatmap of 13k genes is not going to be informative; it is simply too much data. Try the following points:

a) Reduce the number of genes to the most meaningful ones, e.g. the differential ones. All others are not different between days and as such do not add any information.

b) Standardize your data. Expression data have large spans, whether you plot log-scale expression data or fold changes. In this case the extreme values will dominate the color scale (here it is the -20 on the color gradient). Basically, the extreme gradient makes it impossible to see differences, e.g. between -5 and -10. Scaling means Z-scoring. If you have a numeric matrix of log-expression values called x, then do t(scale(t(x))) to get Z-scored data, which indicate for every gene the deviation from its mean across all samples.

c) Cluster your heatmap with hclust to group similar expression patterns and visualize them. The ComplexHeatmap package can do this internally. Alternatively, if you prefer ggplot style use https://github.com/XiaoLuo-boy/ggheatmap which also has parameters to use hclust.
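
Points b) and c) could be sketched roughly as follows. This is only a sketch under stated assumptions: the matrix x is simulated, the blue/white/red palette is an arbitrary choice, and the rows are ordered with base R hclust before plotting with ggplot2, rather than with ComplexHeatmap or ggheatmap.

```r
library(ggplot2)

# Toy log-expression matrix (genes in rows, samples in columns).
# Dimensions, names, and values are assumptions for illustration.
set.seed(42)
x <- matrix(rnorm(40 * 6, mean = 8, sd = 2), nrow = 40,
            dimnames = list(paste0("gene", 1:40), paste0("sample", 1:6)))

# Z-score each gene across samples. scale() standardizes columns,
# hence the transpose before and after.
z <- t(scale(t(x)))

# Cluster genes on their Z-scored profiles and take the dendrogram leaf order.
gene_order <- hclust(dist(z))$order

# Long format for ggplot2; the factor levels fix the row order to the clustering.
long <- data.frame(
  Gene   = factor(rep(rownames(z), times = ncol(z)),
                  levels = rownames(z)[gene_order]),
  Sample = rep(colnames(z), each = nrow(z)),
  Z      = as.vector(z)
)

# Diverging scale centered on 0, so up- and down-regulation get distinct colors.
p <- ggplot(long, aes(Sample, Gene, fill = Z)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme(axis.text.y = element_blank())
```

Printing p draws the heatmap. Because every gene is centered on its own mean, a single extreme gene no longer stretches the color scale for all the others.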

Does that make sense to you?

ADD COMMENT • link 2.6 years ago by ATpoint 82k

0

Yes, it makes sense to me, thank you very much. I understand that there is no point in doing a heatmap with so many genes. I have a new question. I understand that it makes more sense to make a heatmap with, for example, only genes with a p-value < 0.05, right? My question now is how to choose these genes, since the set of genes with p-value < 0.05 on day 2 will not necessarily be identical to the set of genes with p-value < 0.05 on day 5.

ADD REPLY • link 2.6 years ago by arturo.marin ▴ 20

0

Yes, that is a valid point. I usually just collect DEGs from all tested conditions, merge them and put these into the heatmaps. The goal is then to find global patterns of gene expression changes.
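
A minimal sketch of that merging step, assuming one result table per tested condition. The table names, the padj column, the values, and the 0.05 cutoff are hypothetical and only serve the illustration.

```r
# Hypothetical per-condition result tables (gene ID plus adjusted p-value).
# All names and values here are made up for illustration.
res_day2 <- data.frame(Gene = c("geneA", "geneB", "geneC", "geneD"),
                       padj = c(0.001, 0.20, 0.03, 0.80))
res_day5 <- data.frame(Gene = c("geneA", "geneB", "geneC", "geneD"),
                       padj = c(0.40, 0.01, 0.02, 0.90))

alpha <- 0.05  # significance cutoff, an arbitrary choice for this sketch

# Significant genes in each condition.
degs_day2 <- res_day2$Gene[res_day2$padj < alpha]
degs_day5 <- res_day5$Gene[res_day5$padj < alpha]

# Union: a gene is kept if it is significant in at least one condition.
degs_all <- union(degs_day2, degs_day5)
```

The expression matrix can then be subset to degs_all before Z-scoring and clustering, so the heatmap shows every gene that changes in at least one condition.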