Question

RNAseq heatmap.2 log2FC clustering

1

Entering edit mode

9.4 years ago

Ming Tommy Tang ★ 3.9k

Hi there,

I have a RNA-seq experiment with four samples.

control, treatment1, treatment2, treatment3. each with duplicates, I used DEseq to determined the differentially expressed genes for: treatment1 vs control, treatment2 vs control and treatment3 vs control.

treatment1 over-expresses a wild type protein while treatment2 and 3 express mutant proteins which have partial function of the WT protein.

treatment1 will cause some genes to be upregulated and downregulated. treatment2 and 3 will have some overlapping genes that are also upregulated and downregulated as treatment1 but will have some genes do not change (those mutant proteins only have partial function.

Now, I want to make a heatmap (using log2 fold change) with genes that are differentially changed with treatment1.

res1 <- nbinomTest( cds, 'control', 'treatment1' )
res2 <- nbinomTest( cds, 'control, 'treatment2' )
res3 <- nbinomTest( cds, 'control, 'treatment3' )

df<- data.frame(id=res1$id, logFC1=res1$log2FoldChange,padj1=res1$padj,
                logFC2=res2$log2FoldChange, padj2=res2$padj,logFC3=res3$log2FoldChange, padj3=res3$padj)

df<- df[abs(df$logFC1)>1 & df$padj1<0.01,]  # all the genes significantly changed with treatment1

df <- na.omit(df)
df<-  df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),]

df <- na.omit(df)   #remove NA
df<-  df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),]  #remove Inf

dim(df)
[1] 1669    7

if I plot a scatter plot of logFC1:treatment1 vs control AND logFC2:treatment2 vs control by

library(ggplot2)
ggplot(df) + geom_point(aes(x=df$logFC1, y=df$logFC2)) + geom_hline(y=c(1,-1), color='red',linetype=2)

it shows me:

It makes sense, because I pre-choosed for logFC1 with abs(logFC1)>1.

Now, if I want make a heatmap with heatmap.2:

m<- as.matrix(df[,c(2,4,6)])

> head(m)
      logFC1     logFC2     logFC3
12 -1.799917 -5.0813424 -2.6516590
23 -2.843064 -0.2089547 -0.9493254
33 -1.474999 -0.2902581 -0.1040179
51 -1.447214 -0.5684074 -0.4131914
61  1.999213  1.8802237  1.2279896
65  1.022975  0.6953835  0.8228315

colnames(m)<- c("treatment1_VS_control","treatment2_VS_control","treatment3_VS_control")

hmcols<- colorRampPalette(c("green4","green","white", "red","red4"))(256)

# cluster with the pearson distance according to this post https://www.biostars.org/p/91978/#120843
library(gplots)
heatmap.2(m, col=hmcols, scale="row", hclust=function(x) hclust(x, method='complete'), distfun=function(x) as.dist(1-cor(t(x))), trace="none",margin=c(17,15), density.info="none", labRow=NA)

I can also change the method to "single", scale to "none", but I still did not get what I expected. I should see group of genes are both up-regulated with treatment1 and treatment2, treatment1 and treatment3, treatment1 2 and 3, genes only upregulated in treatment1 but not 2 and 3.

if I use the default euclidean distance which also have problems http://liorpachter.wordpress.com/2014/01/19/why-do-you-look-at-the-speck-in-your-sisters-quilt-plot-and-pay-no-attention-to-the-plank-in-your-own-heat-map/

heatmap.2(m, col=hmcols, scale="none", hclust=function(x) hclust(x, method='complete'), distfun=function(x) dist(x,method='euclidean'), trace="none",margin=c(17,15), density.info="none", labRow=NA)

It looks even worse:

How can I cluster the genes as what I want?

Thank you so much

heatmap RNA-Seq • 11k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

Why don't you make a heatmap of RPKM/FPKM or normalized count data for your set of genes.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by Manvendra Singh ★ 2.2k

0

Entering edit mode

I can also make a heatmap with variance stabilized counts, or log2 transformed library size normalized counts, but the problem will be same for me.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

I guess I don't understand, why do you think they look bad? I see everything you want in the first heat map. I mean you could always order them yourself and turn off reordering via ROWV=FALSE, otherwise you're saying that you just don't like the way it looks. OK, but so what? It's an unsupervised method.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 7.2 years ago by mforde84 ★ 1.4k

Ram · Answer 1 · 2014-11-23

You could also try to make ecdf plot, where you will have 3 lines for each treatment.

ecdf (log2(FChange-T1), do.points=F, verticlas=T)
lines (log2(FChange-T2), do.points=F, vertical=T, col="red")
lines (log2(FChange-T3), do.points=F, vertical=T, col="blue")

This will clearly give you which treatment is more responsive, though won't give the picture about treatment on individual gene.

score 0 · Answer 2 · 2017-02-06

0

Entering edit mode

7.2 years ago

TriS ★ 4.7k

scale your FPKM data before passing them to the heatmap.

otherwise I like pheatmap

m <- myFPKMdata 
pheatmap::pheatmap(m,scale="row")

ADD COMMENT • link 7.2 years ago by TriS ★ 4.7k