Question: How To Interpret The Scale On A Phylogenetic Tree?
5.9 years ago by
Kaiserslautern, Germany
Frédéric Mahé2.7k wrote:

I have a maximum likelihood tree inferred from DNA sequences, and I don't know how to interpret the scale in terms of sequence divergence. I know that a given length represents a number of nucleotide substitutions per site. On my tree, 1 cm corresponds to 0.1 nucleotide substitutions per site. Since the sum of branch lengths between some terminal nodes is 10 times the scale, does that mean their divergence is 10 x 0.1 = 1 nucleotide substitution per site, i.e. 100% divergence?

5.8 years ago by
scapella370
Barcelona, Spain
scapella370 wrote:

Hi,

You are right in the sense that branch lengths represent residue substitutions per site, whether DNA or amino acid, but the value is an average. A length of 1 means that, on average, you expect one change per site. In reality the number of substitutions varies widely across sites, from sites with no substitution at all, to sites with exactly one substitution, to sites with multiple substitutions.
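
To make the "average" concrete, here is a minimal sketch in base R. It assumes that all sites (and all states) evolve at the same rate, so that the number of substitutions at a site along a path of length 1 is Poisson-distributed with mean 1. That assumption is mine, not something stated in the thread, and real data usually show rate variation across sites.

```r
# Assumption: equal rates across sites, so the number of substitutions
# per site is Poisson with mean equal to the total branch length.
branch_length <- 1  # substitutions per site, as in the question

# Probability that a site experienced exactly k substitutions, k = 0..4
k <- 0:4
prob <- dpois(k, lambda = branch_length)
names(prob) <- k
print(round(prob, 3))

# Share of sites expected to have been hit more than once
p_multiple <- 1 - ppois(1, lambda = branch_length)
print(round(p_multiple, 3))
```

Under this simplified model, roughly a third of the sites are expected to have no substitution at all, even though the average is one substitution per site.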

Hope this makes sense to you :)

S

5.8 years ago by
Casey Bergman17k
Athens, GA, USA
Casey Bergman17k wrote:

I think you are trying to calculate the patristic distance between two leaves, i.e. the sum of the branch lengths along the path connecting them. This can be done automatically using the program PATRISTIC or in R using APE as follows:

library(ape)
# Read the tree from a Newick file
tree <- read.tree("/file.tre")
# Matrix of patristic distances between all pairs of tips
PatristicDistMatrix <- cophenetic.phylo(tree)
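
As a small check of what the matrix contains, the sketch below builds a toy three-tip tree from a Newick string and looks up one pair. The tree, its tip names and its branch lengths are made up for illustration, and the code assumes the ape package is installed.

```r
library(ape)  # assumes the ape package is installed

# Hypothetical toy tree; names and branch lengths are invented for illustration
toy <- read.tree(text = "((A:0.2,B:0.3):0.1,C:0.4);")

# Pairwise sums of branch lengths between tips
d <- cophenetic.phylo(toy)

# Path from A to C: 0.2 (A) + 0.1 (internal branch) + 0.4 (C)
d_AC <- d["A", "C"]
print(d_AC)
```

The value for a pair of tips is in the same unit as the scale bar (substitutions per site), so it can be read directly instead of measuring branches on the figure.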

When interpreting these divergence estimates, be aware that they are corrected for unobserved changes based on a model of sequence evolution, so estimated divergence is not strictly equal to the observed sequence difference, i.e. 100% estimated divergence != 100% of sites that differ. This is because (ignoring alignment artifacts) as time -> infinity, observed divergence saturates at 1 - 1/(alphabet size) (e.g. 3/4 for DNA and 19/20 for proteins), since unrelated sequences still match at some sites by chance.
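
The gap between estimated and observed divergence can be sketched with the Jukes-Cantor (JC69) model. JC69 is used here only as the simplest illustration; the tree in the question may well have been inferred under a different model, in which case the exact numbers change but the saturation behaviour is the same. The function names below are hypothetical helpers, not part of any package.

```r
# Assumption: Jukes-Cantor (JC69), chosen purely for illustration.

# Expected proportion of differing sites for a corrected distance d
# (d in substitutions per site)
jc69_expected_p <- function(d) {
  0.75 * (1 - exp(-4 * d / 3))
}

# Inverse: corrected distance from an observed proportion p (needs p < 0.75)
jc69_distance <- function(p) {
  -0.75 * log(1 - 4 * p / 3)
}

d <- c(0.1, 0.5, 1, 2, 10)
p <- jc69_expected_p(d)
print(round(data.frame(d = d, p = p), 3))
```

Under this model, a distance of 1 substitution per site corresponds to about 55% of sites actually differing, and the observed proportion approaches but never exceeds 3/4 however long the branches are.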
