Question: Phylogenetic Tree Comparison
1
8.2 years ago by
Mehdi10
Tehran, Iran
Mehdi10 wrote:

I asked similar question in stackoverflow, I didn't get good answer there, So I ask it here.

I developed new algorithm for phylogenetic tree comparison(phylogenetic tree is simply rooted binary tree). As an input we have two trees, we want to calculate their similarity percentage. one example of these type of algorithms is here.

But most of these algorithms(as I know all of them) did not offer a good way to check the accuracy of their algorithms. e.g if you look at the following figure, you can see there is more similarity between T1 and T3, versus T1 and T2.

I need a method for checking its accuracy of similarity measure, To be sure that my algorithm is better than previous algorithms !!! (it is not difficult in most of the case by human eyes but I don't know how to extend it to my application)

your validity measure should be independent from algorithm.

modified 2.0 years ago by Biostar ♦♦ 20 • written 8.2 years ago by Mehdi10
5

Sorry, I guess that doesn't work that way. There is no accuracy per-se for distance measures (of trees) unless you define it, and then I think it doesn't make much sense to define it intrinsically. Given two distance or similarity measures e.g. 'Euclidean' and 'Manhattan' (yes not for trees I know, but the prob is the same), how would you define that euclidean is better than manhattan distance or vice versa. One way to get around might be to define it in the sense of being more meaningful biologically, then you need some good example.

Are you looking for 'tree edit distance'? Then this question on Stackoverflow might contain what you are looking for.

4
8.2 years ago by
lh332k
United States
lh332k wrote:

A useful distance for measuring LGT is the SPR distance. However, computing SPR is NP-hard. More frequently we use the Robinson-Foulds distance (RF). It is perhaps more suitable for evaluating tree-building algorithms. Both distances are originally defined on unrooted trees, but it is trivial to define them on rooted trees as well.

In your case, the SPR distance for both T1-T2 and T1-T3 equals 1. I forget how to compute RF exactly (you may google it), but for sure under RF, T1 is closer to T3 than to T2.