Question: RNAseq heatmap.2 log2FC clustering
gravatar for Ming Tang
6.0 years ago by
Ming Tang2.6k
Houston/MD Anderson Cancer Center
Ming Tang2.6k wrote:

Hi there,

I have a RNA-seq experiment with four samples. 

control, treatment1, treatment2, treatment3. each with duplicates, I used DEseq to determined the differentially  expressed genes for: treatment1 vs control, treatment2 vs control and treatment3 vs control.

treatment1 over-expresses a wild type protein while treatment2 and 3 express mutant proteins  which have partial function of the WT protein.

treatment1 will cause some genes to be upregulated and downregulated. treatment2 and 3 will have some overlapping genes that are also upregulated and downregulated as treatment1 but will have some genes do not change (those mutant proteins only have partial function.

Now, I want to make a heatmap (using log2 fold change) with genes that are differentially changed with treatment1. 

res1 <- nbinomTest( cds, 'control', 'treatment1' )
res2 <- nbinomTest( cds, 'control, 'treatment2' )
res3 <- nbinomTest( cds, 'control, 'treatment3' )

df<- data.frame(id=res1$id, logFC1=res1$log2FoldChange,padj1=res1$padj,
                logFC2=res2$log2FoldChange, padj2=res2$padj,logFC3=res3$log2FoldChange, padj3=res3$padj)

df<- df[abs(df$logFC1)>1 & df$padj1<0.01,]  # all the genes significantly changed with treatment1

df <- na.omit(df)
df<-  df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),]

df <- na.omit(df)   #remove NA
df<-  df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),]  #remove Inf

[1] 1669    7

if I plot a scatter plot of : logFC1: treatment1 vs control    AND logFC2:treatment2 vs control by

ggplot(df) + geom_point(aes(x=df$logFC1, y=df$logFC2)) + geom_hline(y=c(1,-1), color='red',linetype=2)

it shows me:


it makes sense, because I pre-choosed for logFC1 with abs(logFC1)>1.

Now, if I want make a heatmap with heatmap.2:

m<- as.matrix(df[,c(2,4,6)])
> head(m)
      logFC1     logFC2     logFC3
12 -1.799917 -5.0813424 -2.6516590
23 -2.843064 -0.2089547 -0.9493254
33 -1.474999 -0.2902581 -0.1040179
51 -1.447214 -0.5684074 -0.4131914
61  1.999213  1.8802237  1.2279896
65  1.022975  0.6953835  0.8228315

colnames(m)<- c("treatment1_VS_control","treatment2_VS_control","treatment3_VS_control")

hmcols<- colorRampPalette(c("green4","green","white", "red","red4"))(256)

# cluster with the pearson distance according to this post C: When Using Heatmap.2 From R To Make A Heatmap Of Microarray Data, How Are The Ge

heatmap.2(m, col=hmcols, scale="row", hclust=function(x) hclust(x, method='complete'), distfun=function(x) as.dist(1-cor(t(x))), trace="none",margin=c(17,15),"none", labRow=NA)


# I can also change the method to "single", scale="none", but I still  did not get what I expected. I should see group of genes are both up-regulated with treatment1 and treatment2, treatment1 and treatment3, treatment1 2 and 3, genes only upregulated in treatment1 but not 2 and 3.

 if I use the default euclidean distance which also have problems

heatmap.2(m, col=hmcols, scale="none", hclust=function(x) hclust(x, method='complete'), distfun=function(x) dist(x,method='euclidean'), trace="none",margin=c(17,15),"none", labRow=NA)

it looks even worse:


How can I cluster the genes as what I want?

Thank you so much

heatmap rna-seq • 7.8k views
ADD COMMENTlink modified 3.8 years ago by TriS4.2k • written 6.0 years ago by Ming Tang2.6k

why don't you make a heatmap of RPKM/FPKM  or normalized count data for your set of genes.


ADD REPLYlink written 6.0 years ago by Manvendra Singh2.1k

I can also make a heatmap with variance stabilized counts, or log2 transformed library size normalized counts, but the problem will be same for me. 

ADD REPLYlink written 6.0 years ago by Ming Tang2.6k

i guess i dont understand, why do you think they look bad? i see everything you want in the first heat map. I mean you could always order them yourself and turn off reordering via ROWV=FALSE, otherwise you're saying that you just don't like the way it looks. ok, but so what? it's an unsupervised method.

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by mforde841.3k
gravatar for Chirag Nepal
6.0 years ago by
Chirag Nepal2.2k
Chirag Nepal2.2k wrote:

You could also try to make ecdf plot, where you will have 3 lines for each treatment.

ecdf (log2(FChange-T1), do.points=F, verticlas=T)

lines (log2(FChange-T2), do.points=F, vertical=T, col="red")

lines (log2(FChange-T3), do.points=F, vertical=T, col="blue")

This will clearly give you which treatment is more responsive, though won't give the picture about treatment on individual gene.

ADD COMMENTlink modified 6.0 years ago • written 6.0 years ago by Chirag Nepal2.2k
gravatar for TriS
3.8 years ago by
United States, Buffalo
TriS4.2k wrote:

scale your FPKM data before passing them to the heatmap.

otherwise I like pheatmap

m <- myFPKMdata 
ADD COMMENTlink written 3.8 years ago by TriS4.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1285 users visited in the last hour