Question

How To Interpret The Scale On A Phylogenetic Tree?

3


12.3 years ago

Frédéric Mahé ★ 3.2k

I have a maximum likelihood tree inferred from DNA sequences and I don't know how to interpret the scale in terms of sequence divergence. I know that a given length represents a number of nucleotide substitutions per site. On my tree, 1 cm is equal to 0.1 nucleotide substitutions per site. The sum of branch lengths between some terminal nodes is 10 times the scale bar. Does that mean their divergence is 10 x 0.1 = 1 nucleotide substitution per site, i.e. 100% divergence?

phylogeny • 25k views

updated 12.3 years ago by Casey Bergman 18k • written 12.3 years ago by Frédéric Mahé ★ 3.2k

score 4 · Answer 1 · 2012-01-31

Hi,

You are right in the sense that branch lengths represent residue substitutions per site (DNA or amino acid), but the value is an average. A length of 1 means that, on average, you expect one change per site. In reality the number of substitutions varies widely across sites: some sites have no substitution at all, some have exactly one, and others have multiple substitutions.
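To make the "average" concrete, here is a rough sketch in base R. It assumes (my simplification, not something your tree's model necessarily does) that the number of substitutions at each site is Poisson-distributed with the same mean for every site, with no rate variation among sites:

```r
# Assumption: substitutions per site ~ Poisson(d), identical rate at every site
d <- 1  # total path length in substitutions per site (10 x 0.1 in the question)

p_none     <- dpois(0, lambda = d)    # sites with no substitution
p_one      <- dpois(1, lambda = d)    # sites with exactly one substitution
p_multiple <- 1 - p_none - p_one      # sites hit two or more times

round(c(none = p_none, one = p_one, multiple = p_multiple), 3)
# none 0.368, one 0.368, multiple 0.264
```

So even with an average of one substitution per site, roughly a third of the sites would be expected to be untouched under this simple assumption.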

Hope this makes sense to you :)

S

score 4 · Answer 2 · 2012-01-31

I think you are trying to calculate the patristic distance between two leaves, i.e. the sum of the branch lengths on the path connecting them. This can be done automatically using the program PATRISTIC, or in R using the ape package as follows:

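library(ape)
# Read a Newick-format tree; branch lengths are in substitutions per site
tree <- read.tree("/file.tre")
# Patristic distances (summed branch lengths) between every pair of tips
PatristicDistMatrix <- cophenetic.phylo(tree)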
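If you just want to check one pair of leaves, you can index the matrix by tip label. Here is a small sketch on a made-up three-taxon tree (the Newick string and the labels A, B, C are invented for illustration, they are not from your data):

```r
library(ape)

# Hypothetical toy tree, branch lengths in substitutions per site
toy_tree <- read.tree(text = "((A:0.3,B:0.3):0.2,C:0.5);")

# Rows and columns of the matrix are named after the tip labels
toy_dist <- cophenetic.phylo(toy_tree)

toy_dist["A", "B"]  # 0.3 + 0.3 = 0.6
toy_dist["A", "C"]  # 0.3 + 0.2 + 0.5 = 1.0, i.e. 10 x a 0.1 scale bar
```

The value for A and C is the same quantity you get by measuring the path on the drawing and multiplying by the scale.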

When interpreting these distances, be aware that they are estimates that are corrected for unobserved changes (multiple substitutions at the same site) based on a model of sequence evolution, and thus estimated divergence is not strictly equal to observed sequence differences, i.e. 100% estimated divergence != 100% of sites that differ. This is because (ignoring alignment artifacts) as time -> infinity, observed divergence plateaus at 1 - (inverse of the alphabet size), e.g. 3/4 for DNA and 19/20 for proteins when all states are equally frequent, because of random sequence similarity.
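As a rough illustration of that plateau, here is a sketch in base R using the simplest equal-rates model (Jukes-Cantor for DNA, generalised to an alphabet of size k). The function name is my own, and your ML tree was probably inferred under a richer model, so treat the numbers as indicative only:

```r
# Expected proportion of differing sites for a model distance d
# (substitutions per site), assuming equal rates between all k states.
# Hypothetical helper, not part of ape.
expected_p_distance <- function(d, k = 4) {
  (1 - 1 / k) * (1 - exp(-d * k / (k - 1)))
}

expected_p_distance(1)            # about 0.55 for DNA
expected_p_distance(100)          # saturates near 3/4 for DNA
expected_p_distance(100, k = 20)  # saturates near 19/20 for proteins
```

Under these assumptions, a distance of 1 substitution per site corresponds to only about 55% of sites actually differing between the two sequences, not 100%.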