Question

Phylogenetic Tree Comparison

1

Entering edit mode

13.5 years ago

Mehdi ▴ 10

I asked similar question in stackoverflow, I didn't get good answer there, So I ask it here.

I developed new algorithm for phylogenetic tree comparison(phylogenetic tree is simply rooted binary tree). As an input we have two trees, we want to calculate their similarity percentage. one example of these type of algorithms is here.

But most of these algorithms(as I know all of them) did not offer a good way to check the accuracy of their algorithms. e.g if you look at the following figure, you can see there is more similarity between T1 and T3, versus T1 and T2.

examples of three phylogenetic trees

I need a method for checking its accuracy of similarity measure, To be sure that my algorithm is better than previous algorithms !!! (it is not difficult in most of the case by human eyes but I don't know how to extend it to my application)

your validity measure should be independent from algorithm.

phylogenetics graphs similarity • 5.9k views

ADD COMMENT • link updated 7.3 years ago by Biostar 20 • written 13.5 years ago by Mehdi ▴ 10

5

Entering edit mode

Sorry, I guess that doesn't work that way. There is no accuracy per-se for distance measures (of trees) unless you define it, and then I think it doesn't make much sense to define it intrinsically. Given two distance or similarity measures e.g. 'Euclidean' and 'Manhattan' (yes not for trees I know, but the prob is the same), how would you define that euclidean is better than manhattan distance or vice versa. One way to get around might be to define it in the sense of being more meaningful biologically, then you need some good example.

ADD REPLY • link 13.4 years ago by Michael 55k

0

Entering edit mode

Are you looking for 'tree edit distance'? Then this question on Stackoverflow might contain what you are looking for.

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 13.4 years ago by Michael 55k

score 4 · Answer 1 · 2012-01-25

A useful distance for measuring LGT is the SPR distance. However, computing SPR is NP-hard. More frequently we use the Robinson-Foulds distance (RF). It is perhaps more suitable for evaluating tree-building algorithms. Both distances are originally defined on unrooted trees, but it is trivial to define them on rooted trees as well.

In your case, the SPR distance for both T1-T2 and T1-T3 equals 1. I forget how to compute RF exactly (you may google it), but for sure under RF, T1 is closer to T3 than to T2.