Question

Identification Of Significant Differences Between Phylogenetic Trees Using Distance Matrix

6


11.4 years ago

ALchEmiXt ★ 1.9k

I am struggling with a fairly simple problem but I cannot find a solution to it yet.

I have a distance matrix calculated for some taxa. When generating phylogenetic trees from such matrices (let's say NJ trees), it is quite easy to spot differences between small trees by eye. However, this becomes fairly complex and infeasible for large trees (20 or more taxa). Is there a way to identify significant differences between the trees (and/or using the matrices alone)?

I have found the obvious Mantel (and derived) tests, but those basically test for a linear (Pearson) correlation between the complete matrices. That is useful but not what I am looking for.

Below are two example distance matrices:

[1] 'A'     0 0.136170479249233 0.111979752530934 0.306703146374829 0.284662370068735
[2] 'B'     0.136170479249233 0 0.144810467774404 0.333361262783542 0.381779151560938
[3] 'C'     0.111979752530934 0.144810467774404 0 0.346534459925035 0.303132688524432
[4] 'D'     0.306703146374829 0.333361262783542 0.346534459925035 0 0.271365855221696
[5] 'E'     0.284662370068735 0.381779151560938 0.303132688524432 0.271365855221696 0

[1] 'A'     0 0.136170479249233 0.111979752530934 0.306703146374829 0.284662370068735
[2] 'B'     0.136170479249233 0 0.144810467774404 0.333361262783542 0.131779151560938
[3] 'C'     0.111979752530934 0.144810467774404 0 0.346534459925035 0.303132688524432
[4] 'D'     0.306703146374829 0.333361262783542 0.346534459925035 0 0.271365855221696
[5] 'E'     0.284662370068735 0.131779151560938 0.303132688524432 0.271365855221696 0
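For reference, the Mantel test mentioned above can be sketched on these two matrices in a few lines of plain Python. This is only a minimal illustration, not a replacement for an established implementation: the names `first`, `second`, `dist` and `mantel` are made up here, and because there are only five taxa the sketch enumerates all 120 relabellings instead of sampling random permutations.

```python
import itertools
import math

labels = ["A", "B", "C", "D", "E"]

# Upper-triangle distances copied from the two matrices above.
first = {
    ("A", "B"): 0.136170479249233, ("A", "C"): 0.111979752530934,
    ("A", "D"): 0.306703146374829, ("A", "E"): 0.284662370068735,
    ("B", "C"): 0.144810467774404, ("B", "D"): 0.333361262783542,
    ("B", "E"): 0.381779151560938, ("C", "D"): 0.346534459925035,
    ("C", "E"): 0.303132688524432, ("D", "E"): 0.271365855221696,
}
second = dict(first)
second[("B", "E")] = 0.131779151560938  # the only entry that differs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dist(matrix, a, b):
    """Look up a symmetric distance stored as an upper triangle."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def mantel(d1, d2, taxa):
    """Return (r, p): the matrix correlation and an exact one-sided
    permutation p-value obtained by relabelling the taxa of d2."""
    pairs = list(itertools.combinations(taxa, 2))
    xs = [dist(d1, a, b) for a, b in pairs]
    observed = pearson(xs, [dist(d2, a, b) for a, b in pairs])
    hits = total = 0
    for perm in itertools.permutations(taxa):
        relabel = dict(zip(taxa, perm))
        ys = [dist(d2, relabel[a], relabel[b]) for a, b in pairs]
        total += 1
        if pearson(xs, ys) >= observed - 1e-12:
            hits += 1
    return observed, hits / total


r, p = mantel(first, second, labels)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With larger matrices the full enumeration is not practical, and one would sample random permutations instead. Either way the result is a single global correlation, which is why it does not tell me which parts of the trees differ.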

Thanks for any pointer! Alex

phylogenetics • 5.3k views


score 7 · Answer 1 · 2012-12-12

I recommend the Compare2Trees web tool, published in Bioinformatics.

It measures the agreement in topology between two trees by matching up branches that have similar topological characteristics. It takes two phylogenies in Newick format as input, highlights the regions where the two tree topologies differ and gives you an overall topological score.

UPDATE (01.23.2017):

In 2017 I would recommend ETE Toolkit. It computes topological distances between trees by using 3 types of measures:

Robinson-Foulds symmetric difference
Percentage of edge similarity (number of branches in one tree that are present in another)
Duplication aware distances (TreeKO method), which provides a distance between trees containing duplicated attributes.

Examples of using ETE are shown here.

score 4 · Answer 2 · 2012-12-12

In phylogenetics, the most widely used method to compare tree topologies is probably the Robinson-Foulds distance (RF distance). It counts the bipartitions that occur in one tree but not in the other. This distance is simple, well defined and very easy to implement. In my understanding, Compare2Trees uses a related but not identical measurement. The best tree distance is arguably the subtree prune and regraft distance (SPR distance). It counts the minimum number of prune-regraft operations needed to transform one tree into the other. Computing the SPR distance is known to be NP-hard.

As to implementations, my njtree (the Linux binary seems to work, but the Mac binary is broken) is able to compute the RF distance. It is also easy to write one yourself. If you google "subtree prune and regraft distance", you will find quite a few papers that approximate SPR. You may try them out. I have not.
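To give an idea of how little code the RF distance needs, here is a rough Python sketch (this is not how njtree does it). It assumes trees are written as nested tuples of leaf names and compared as unrooted topologies. The function names and the two example trees are made up for illustration.

```python
def leaf_set(tree):
    """All leaf names below a node. A leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Non-trivial bipartitions of a tree, each stored as the side
    that does not contain a fixed reference leaf."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaves")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon topologies, for illustration only.
tree_1 = ((("A", "C"), "B"), ("D", "E"))
tree_2 = ((("A", "C"), ("B", "E")), "D")
print(rf_distance(tree_1, tree_2))
```

Storing each bipartition as the side without the reference leaf means the same split always gets the same representation, no matter where the tree happens to be rooted.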

As to the visualization of differences between two trees, I have a method described in my thesis (Section 3.5). Flnjtree (available on the same download page) provides a graphical interface. It reorders the leaves of both trees into a similar order so that you can easily identify differences by eye. Note that this method is not just for comparing two trees; it is a general method for ordering tree leaves. Some may have noticed that in TreeFam, tree leaves tend to be ordered in the same way. That is the result of this tree ordering algorithm.

score 0 · Answer 3 · 2012-12-12

There are a lot of statistical methods for comparing trees directly, e.g. by asking whether the likelihood of the data given one topology is significantly worse than the likelihood given another topology. These methods are reviewed (very well!) in this paper:

Nick Goldman, Jon P. Anderson, and Allen G. Rodrigo. Likelihood-Based Tests of Topologies in Phylogenetics. Syst Biol (2000) 49(4): 652-670. doi:10.1080/106351500750049752 http://sysbio.oxfordjournals.org/content/49/4/652.abstract

I think CONSEL implements some of the tests, as does PAML. Not sure if all tests are implemented in easy-to-use software though.

score 0 · Answer 4 · 2012-12-13

Hi,
for NJ trees this is likely to be overkill, but I have used the Shimodaira–Hasegawa test for topology (Shimodaira and Hasegawa, 1999). It is easy to run this in RAxML.
PAUP* also allows for "hypothesis testing". Just google PAUP* and hypothesis testing and you should find several tutorials. Good luck!