I have a maximum likelihood tree inferred from DNA sequences and I don't know how to interpret the scale in terms of sequence divergence. I know that a given length represents a number of nucleotide substitutions per site. On my tree, 1 cm is equal to 0.1 nucleotide substitutions per site. Since the sum of branch lengths between some terminal nodes is 10 times the scale bar, does that mean their divergence is 10 x 0.1 = 1 nucleotide substitution per site, i.e. 100% divergence?
You are right in the sense that branch lengths represent substitutions per site, whether DNA or amino acid, but they represent an average. A total branch length of 1 means that, on average, you expect one change per site; in reality the number of substitutions varies widely across sites, from sites with no substitution at all, to sites with exactly one substitution, to sites with multiple substitutions.
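To make the "average" concrete, here is a minimal sketch in R. It assumes the simplest possible model, in which every site evolves at the same rate, so that the number of substitutions at a site follows a Poisson distribution. That equal-rates assumption is only for illustration; real alignments usually show more variation across sites than this.

```r
# Assumption for illustration: equal rates across sites, so the number of
# substitutions at a site is Poisson with mean equal to the branch length.
branch_length <- 1  # expected substitutions per site

p_none     <- dpois(0, lambda = branch_length)  # sites with no substitution
p_one      <- dpois(1, lambda = branch_length)  # sites with exactly one
p_multiple <- 1 - p_none - p_one                # sites with two or more

round(c(none = p_none, one = p_one, multiple = p_multiple), 3)
```

Under that assumption, roughly 37% of sites would be untouched, 37% hit once, and 26% hit more than once, even though the mean is one substitution per site.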
Hope this makes sense to you :)
I think you are trying to calculate the patristic distance between two leaves. This can be done automatically using the program PATRISTIC or in R using APE as follows:
library(ape)
tree <- read.tree("/file.tre")
PatristicDistMatrix <- cophenetic.phylo(tree)
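As a small check of what the resulting matrix contains, here is a sketch on a made-up three-tip tree. The Newick string and the tip names A, B and C are hypothetical and not taken from your data. Each entry is the sum of the branch lengths on the path between two tips.

```r
library(ape)

# Hypothetical toy tree, not the asker's data; branch lengths are in
# substitutions per site.
toy_tree <- read.tree(text = "((A:0.2,B:0.3):0.1,C:0.4);")

# Pairwise patristic distances: sum of branch lengths on the path between tips.
d <- cophenetic.phylo(toy_tree)

d["A", "B"]  # path A -> B: 0.2 + 0.3
d["A", "C"]  # path A -> C: 0.2 + 0.1 + 0.4
```

Reading the distance off the scale bar by hand, as in the question, should give the same numbers as the matrix, up to measurement error.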
When interpreting these divergence estimates, be aware that they are corrected for unobserved changes (such as multiple substitutions at the same site) based on a model of sequence evolution, so estimated divergence is not strictly equal to observed sequence difference, i.e. 100% estimated divergence != 100% of sites that differ. This is because (ignoring alignment artifacts) as time -> infinity, observed divergence saturates at 1 - (inverse of the alphabet size) (e.g. 3/4 for DNA and 19/20 for proteins), since even unrelated sequences match at some sites by chance.
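For a rough sense of the gap between estimated and observed divergence, here is a sketch that assumes the Jukes-Cantor (JC69) model. That choice is an assumption for illustration; your tree may have been inferred under a different model, which would give different numbers. Under JC69 the expected proportion of differing sites for a distance of d substitutions per site is 3/4 * (1 - exp(-4d/3)). The function name jc69_observed is made up for this example.

```r
# Assumption: Jukes-Cantor (JC69) model; other models give different numbers.
# d is the estimated number of substitutions per site; the function returns
# the expected proportion of sites that differ between the two sequences.
jc69_observed <- function(d) 0.75 * (1 - exp(-4 * d / 3))

jc69_observed(1)    # roughly 0.55, not 1
jc69_observed(100)  # approaches the 3/4 ceiling for DNA
```

So under this model, a patristic distance of 1 substitution per site would correspond to only about 55% of sites actually differing between the two sequences.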