I have a RNA-seq experiment with four samples.
control, treatment1, treatment2, treatment3. each with duplicates, I used DEseq to determined the differentially expressed genes for: treatment1 vs control, treatment2 vs control and treatment3 vs control.
treatment1 over-expresses a wild type protein while treatment2 and 3 express mutant proteins which have partial function of the WT protein.
treatment1 will cause some genes to be upregulated and downregulated. treatment2 and 3 will have some overlapping genes that are also upregulated and downregulated as treatment1 but will have some genes do not change (those mutant proteins only have partial function.
Now, I want to make a heatmap (using log2 fold change) with genes that are differentially changed with treatment1.
res1 <- nbinomTest( cds, 'control', 'treatment1' ) res2 <- nbinomTest( cds, 'control, 'treatment2' ) res3 <- nbinomTest( cds, 'control, 'treatment3' ) df<- data.frame(id=res1$id, logFC1=res1$log2FoldChange,padj1=res1$padj, logFC2=res2$log2FoldChange, padj2=res2$padj,logFC3=res3$log2FoldChange, padj3=res3$padj) df<- df[abs(df$logFC1)>1 & df$padj1<0.01,] # all the genes significantly changed with treatment1 df <- na.omit(df) df<- df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),] df <- na.omit(df) #remove NA df<- df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),] #remove Inf dim(df)  1669 7
if I plot a scatter plot of : logFC1: treatment1 vs control AND logFC2:treatment2 vs control by
library(ggplot2) ggplot(df) + geom_point(aes(x=df$logFC1, y=df$logFC2)) + geom_hline(y=c(1,-1), color='red',linetype=2)
it shows me:
it makes sense, because I pre-choosed for logFC1 with abs(logFC1)>1.
Now, if I want make a heatmap with heatmap.2:
> head(m) logFC1 logFC2 logFC3 12 -1.799917 -5.0813424 -2.6516590 23 -2.843064 -0.2089547 -0.9493254 33 -1.474999 -0.2902581 -0.1040179 51 -1.447214 -0.5684074 -0.4131914 61 1.999213 1.8802237 1.2279896 65 1.022975 0.6953835 0.8228315 colnames(m)<- c("treatment1_VS_control","treatment2_VS_control","treatment3_VS_control") hmcols<- colorRampPalette(c("green4","green","white", "red","red4"))(256)
# cluster with the pearson distance according to this post C: When Using Heatmap.2 From R To Make A Heatmap Of Microarray Data, How Are The Ge
library(gplots) heatmap.2(m, col=hmcols, scale="row", hclust=function(x) hclust(x, method='complete'), distfun=function(x) as.dist(1-cor(t(x))), trace="none",margin=c(17,15), density.info="none", labRow=NA)
# I can also change the method to "single", scale="none", but I still did not get what I expected. I should see group of genes are both up-regulated with treatment1 and treatment2, treatment1 and treatment3, treatment1 2 and 3, genes only upregulated in treatment1 but not 2 and 3.
if I use the default euclidean distance which also have problems http://liorpachter.wordpress.com/2014/01/19/why-do-you-look-at-the-speck-in-your-sisters-quilt-plot-and-pay-no-attention-to-the-plank-in-your-own-heat-map/
heatmap.2(m, col=hmcols, scale="none", hclust=function(x) hclust(x, method='complete'), distfun=function(x) dist(x,method='euclidean'), trace="none",margin=c(17,15), density.info="none", labRow=NA)
it looks even worse:
How can I cluster the genes as what I want?
Thank you so much