Heatmap with categorical variables and with phylogenetic tree in R or Python
tlorin, 9.1 years ago

Hi everyone! :)

I have a question that I could not answer by searching on my own. I would like to make a heatmap with categorical variables (a bit like the one in the thread "heatmap-like plot, but for categorical variables"), and I would like to add a phylogenetic tree on its left side (like in the thread "how to create a heatmap with a fixed external hierarchical cluster"). Ideally I would adapt the second one, since it looks much prettier! ;)

Here is my data:

A Newick-formatted phylogenetic tree with 3 species, for example:

((1,2),3);
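The heatmap rows end up in the tree's leaf order, so the data have to be keyed by the tip labels (here "1", "2" and "3"). As a quick check, here is a minimal, stdlib-only Python sketch that lists the tips of a Newick string from left to right. It assumes a simple Newick string (unquoted labels, no [comments], optional :branch_length suffixes); `newick_tip_order` is a hypothetical helper, not part of any phylogenetics library.

```python
import re

# One token per match: a structural character, or a label with an optional ":length"
TOKEN = re.compile(r"\s*([(),;]|[^(),;:\s]+)(?::[^(),;]*)?")


def newick_tip_order(newick):
    """Return the tip labels of a simple Newick string, left to right.

    A label counts as a tip when it follows '(' or ',' (or starts the string);
    a label right after ')' names an internal node and is skipped.
    """
    tips = []
    prev = None
    for match in TOKEN.finditer(newick):
        token = match.group(1)
        if token in {"(", ")", ",", ";"}:
            prev = token
        else:
            if prev in {None, "(", ","}:
                tips.append(token)
            prev = "label"
    return tips


print(newick_tip_order("((1,2),3);"))  # ['1', '2', '3']
```

Branch lengths and internal-node labels are ignored, so "((1:0.5,2:0.5)X:1,3:1.5);" gives the same three tips.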

A data frame:

x<-c("species 1","species 2","species 3")
y<-c("A","A","C")
z<-c("A","B","A")
df<- data.frame(x,y,z)

(where A, B and C are the categories taken by the variables y and z; in my case, for instance, gene presence, absence or duplication).
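One way to prepare such a table for a heatmap is to replace each category with an integer code and put the rows in the tree's tip order. A minimal Python sketch, assuming the data are keyed by tip label (so "species 1" is stored under "1") and using codes A=1, B=2, C=3, matching the convention in the answer script; `encode` and `CODES` are illustrative names only.

```python
# Integer code per category (assumption: 1=A, 2=B, 3=C)
CODES = {"A": 1, "B": 2, "C": 3}


def encode(rows, tip_order, codes=CODES):
    """Build an integer matrix with one row per tip, in tip order.

    rows: dict mapping a tip label to its categories, one per variable (y, z).
    """
    missing = [tip for tip in tip_order if tip not in rows]
    if missing:
        raise KeyError(f"no data for tips: {missing}")
    extra = sorted(set(rows) - set(tip_order))
    if extra:
        raise KeyError(f"data for tips not in the tree: {extra}")
    return [[codes[cat] for cat in rows[tip]] for tip in tip_order]


# The data frame from the question, keyed by tip label (assumption: "species 1" -> "1")
rows = {"1": ["A", "A"], "2": ["A", "B"], "3": ["C", "A"]}
print(encode(rows, ["1", "2", "3"]))  # [[1, 1], [1, 2], [3, 1]]
```

Checking for missing and extra tips catches the most common mismatch, where the row names of the table do not match the tip labels of the tree.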

Do you know how to do this?

Many thanks in advance!


What about this answer by Obi Griffith? I use this solution whenever I need to plot a heatmap together with a tree. Or are you looking for something else?


Thanks for your answer! It seems really useful indeed. What I do not know is how to choose the color for each category (say A=green, B=yellow, C=red) with the heatmap function... But it might be easy and I just did not figure it out ^.^
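For categorical data, the most direct way to get A=green, B=yellow, C=red is a lookup table from category to color rather than a continuous palette. A minimal Python sketch of that idea, independent of any plotting library (`color_grid` and `CATEGORY_COLORS` are hypothetical names), which also fails loudly on a category that has no color:

```python
# Color per category, as requested above: A=green, B=yellow, C=red
CATEGORY_COLORS = {"A": "green", "B": "yellow", "C": "red"}


def color_grid(rows, tip_order, palette=CATEGORY_COLORS):
    """Return one color per cell, with rows in tip order and columns as in `rows`.

    rows: dict mapping a tip label to its list of categories (one per variable).
    """
    used = {cat for tip in tip_order for cat in rows[tip]}
    unknown = used - set(palette)
    if unknown:
        raise ValueError(f"no color defined for categories: {sorted(unknown)}")
    return [[palette[cat] for cat in rows[tip]] for tip in tip_order]


rows = {"1": ["A", "A"], "2": ["A", "B"], "3": ["C", "A"]}
for tip, colors in zip(["1", "2", "3"], color_grid(rows, ["1", "2", "3"])):
    print(tip, colors)
```

Because each category is looked up by name, its color does not depend on which other categories happen to be present in the data.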

tlorin, 9.1 years ago

I figured out how to do it! Here is my script for anyone who is interested:

#load packages
library(ape)     # read.tree(), compute.brlen(), as.hclust()
library(gplots)  # heatmap.2()

#read a Newick tree with 3 species
mytree <- read.tree("sometreewith3species.tre")
#Grafen's method assigns branch lengths that make the tree ultrametric
#(all tips at the same distance from the root), which as.hclust() requires
mytree_brlen <- compute.brlen(mytree, method="Grafen")

#turn the phylo tree into a dendrogram object
hc <- as.hclust(mytree_brlen) #compulsory step, as as.dendrogram() has no method for phylo objects
dend <- as.dendrogram(hc)
plot(dend, horiz=TRUE) #check what the dendrogram looks like

#create a matrix with the category of each gene in each species
#rows must be in the order of mytree_brlen$tip.label, because heatmap.2
#reorders them using the dendrogram's leaf indices
species <- mytree_brlen$tip.label
genes <- c("gene1","gene2")
values <- c(1,2,1,1,3,2)  #example codes, filled column by column (1=A, 2=B, 3=C)
mat <- matrix(values, nrow=length(species), dimnames=list(species, genes))

#plot the heatmap
#one color per code; breaks centred on 1, 2 and 3 keep 1=red, 2=green, 3=yellow
#even when a category is absent from the data
heatmap.2(mat, Rowv=dend, Colv=FALSE, dendrogram="row",
          col=c("red","green","yellow"),
          breaks=c(0.5,1.5,2.5,3.5),
          sepwidth=c(0.01,0.02), sepcolor="black",
          colsep=1:ncol(mat), rowsep=1:nrow(mat),
          key=FALSE, trace="none",
          cexRow=2, cexCol=2, srtCol=45,
          margins=c(10,10),
          main="Gene presence, absence and duplication in three species")

#legend of heatmap
par(lend=2)            # square line ends for the color legend
legend("topright",     # location of the legend on the heatmap plot
       legend = c("gene absence", "1 copy of the gene", "2 copies"), # category labels
       col = c("red", "green", "yellow"),  # same color order as in heatmap.2
       lty = 1,        # line style
       lwd = 15        # line width
)
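A caveat on the colors: according to the gplots documentation, when heatmap.2 is given an integer number of breaks (which, as I understand it, is what happens by default), the break points are spaced equally between min(x) and max(x). With codes 1 to 3 that happens to give one color per code, but if a category is missing from the data the colors shift. The Python sketch below imitates that binning (my assumption of the behaviour, with intervals closed on the right as in R's image()) to show the effect; passing explicit breaks such as `breaks=c(0.5,1.5,2.5,3.5)` to heatmap.2 pins each code to one color. `default_breaks` and `bin_colors` are hypothetical helpers.

```python
from bisect import bisect_left


def default_breaks(values, n_colors):
    """n_colors + 1 equally spaced break points from min(values) to max(values).

    Assumed to mimic heatmap.2's default when `breaks` is not supplied.
    """
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / n_colors for i in range(n_colors)] + [hi]


def bin_colors(values, breaks, colors):
    """Color of the interval (breaks[i], breaks[i+1]] containing each value.

    The lowest break is included in the first interval, as in R's image().
    """
    if len(breaks) != len(colors) + 1:
        raise ValueError("need exactly one more break than colors")
    out = []
    for v in values:
        if not breaks[0] <= v <= breaks[-1]:
            raise ValueError(f"{v} lies outside the breaks")
        i = bisect_left(breaks, v)  # index of the first break >= v
        out.append(colors[max(i, 1) - 1])
    return out


colors = ["red", "green", "yellow"]
with_all = [1, 2, 1, 1, 3, 2]   # codes 1, 2 and 3 all present
without_3 = [1, 2, 1, 1, 2, 2]  # code 3 absent

print(bin_colors(with_all, default_breaks(with_all, 3), colors))
# ['red', 'green', 'red', 'red', 'yellow', 'green']
print(bin_colors(without_3, default_breaks(without_3, 3), colors))
# ['red', 'yellow', 'red', 'red', 'yellow', 'yellow']
print(bin_colors(without_3, [0.5, 1.5, 2.5, 3.5], colors))
# ['red', 'green', 'red', 'red', 'green', 'green']
```

In the second line code 2 turns yellow once code 3 is absent; with fixed breaks (third line) it stays green.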

I don't know how to show the result here, but it does work ;)
