Identification Of Significant Differences Between Phylogenetic Trees Using Distance Matrix
4
6
10.8 years ago
ALchEmiXt ★ 1.9k

I am struggling with a fairly simple problem but I cannot find a solution to it yet.

I have a distance matrix calculated for some taxa. When generating a phylogenetic tree (let's say an NJ tree), it is quite easy to spot differences between trees by eye. However, this becomes fairly complex and infeasible for large trees (more than 20 taxa). Isn't there a way to identify significant differences between the trees (and/or using only the distance matrices themselves)?

I have found the obvious Mantel (and derived) tests, but those basically test for a linear (Pearson) correlation between the complete matrices. That is useful but not what I am looking for.

Below are two example distance matrices:

[1] 'A'     0 0.136170479249233 0.111979752530934 0.306703146374829 0.284662370068735
[2] 'B'     0.136170479249233 0 0.144810467774404 0.333361262783542 0.381779151560938
[3] 'C'     0.111979752530934 0.144810467774404 0 0.346534459925035 0.303132688524432
[4] 'D'     0.306703146374829 0.333361262783542 0.346534459925035 0 0.271365855221696
[5] 'E'     0.284662370068735 0.381779151560938 0.303132688524432 0.271365855221696 0

[1] 'A'     0 0.136170479249233 0.111979752530934 0.306703146374829 0.284662370068735
[2] 'B'     0.136170479249233 0 0.144810467774404 0.333361262783542 0.131779151560938
[3] 'C'     0.111979752530934 0.144810467774404 0 0.346534459925035 0.303132688524432
[4] 'D'     0.306703146374829 0.333361262783542 0.346534459925035 0 0.271365855221696
[5] 'E'     0.284662370068735 0.131779151560938 0.303132688524432 0.271365855221696 0
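
One way to see where two such matrices disagree is to take the element-wise difference and list the taxon pairs whose distance changed. This is a minimal sketch, assuming both matrices list the same taxa in the same order. The function name `differing_pairs` and the tolerance `tol` are hypothetical, and the values are the example distances above rounded to four decimals. It only locates the changed distances; it does not say whether a change is statistically significant.

```python
taxa = ["A", "B", "C", "D", "E"]

# First example matrix, rounded to four decimals for readability.
m1 = [
    [0.0,    0.1362, 0.1120, 0.3067, 0.2847],
    [0.1362, 0.0,    0.1448, 0.3334, 0.3818],
    [0.1120, 0.1448, 0.0,    0.3465, 0.3031],
    [0.3067, 0.3334, 0.3465, 0.0,    0.2714],
    [0.2847, 0.3818, 0.3031, 0.2714, 0.0],
]

# Second example matrix: identical except for the B-E distance.
m2 = [row[:] for row in m1]
m2[1][4] = m2[4][1] = 0.1318


def differing_pairs(taxa, d1, d2, tol=1e-9):
    """Return (taxon_i, taxon_j, d1 - d2) for each pair that differs by more
    than tol, sorted so the largest absolute change comes first."""
    changed = []
    n = len(taxa)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; matrices are symmetric
            delta = d1[i][j] - d2[i][j]
            if abs(delta) > tol:
                changed.append((taxa[i], taxa[j], delta))
    return sorted(changed, key=lambda item: abs(item[2]), reverse=True)


for a, b, delta in differing_pairs(taxa, m1, m2):
    print(f"{a}-{b}: {delta:+.4f}")
```

For the two example matrices, the only pair reported is B-E.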


Thanks for any pointer! Alex

phylogenetics • 5.1k views
7
10.8 years ago

I recommend the Compare2Trees web tool, published in Bioinformatics.

It measures the agreement in topology between two trees by matching up branches that have similar topological characteristics. It takes two phylogenies in Newick format as input, highlights the regions where the two tree topologies differ, and gives you an overall topological score.

UPDATE (01.23.2017):

In 2017 I would recommend ETE Toolkit. It computes topological distances between trees by using 3 types of measures:

1. Robinson-Foulds symmetric difference
2. Percentage of edge similarity (number of branches in one tree that are present in another)
3. Duplication aware distances (TreeKO method), which provides a distance between trees containing duplicated attributes.

Examples of using ETE are shown here.

0

It seems to do what I expected, though it is all visual and I will have to see whether it satisfies all my needs. Again, it seems that finding what you need is trivial once you use the proper keywords.

4
10.8 years ago
lh3 33k

In phylogenetics, the most widely used method for comparing tree topologies is probably the Robinson-Foulds distance (RF distance). It counts the number of bipartitions that are present in one tree but not in the other. This distance is simple, well defined and very easy to implement. As I understand it, Compare2Trees uses a related but not identical measure. Arguably the best tree distance is the subtree prune and regraft distance (SPR distance), which counts the minimum number of prune-and-regraft operations needed to transform one tree into the other. Computing the SPR distance is known to be NP-hard.

As for implementations, my njtree (the Linux binary seems to work, but the Mac binary is broken) can compute the RF distance. It is also easy to write one yourself. If you google "subtree prune and regraft distance", you will find quite a few papers that approximate SPR. You may try them out; I have not.
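
To illustrate how little code the RF distance needs, here is a minimal sketch in pure Python. It is not the njtree implementation. It assumes both trees are given as Newick strings over the same set of uniquely named leaves, treats the trees as unrooted, and ignores branch lengths and internal labels. The function names (`parse_newick`, `splits`, `rf_distance`) and the two five-taxon trees are hypothetical and chosen only for illustration.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names.
    Branch lengths and internal node labels are skipped."""
    text = text.strip().rstrip(";")
    pos = 0

    def label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].split(":")[0].strip()

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            label()                       # skip internal label / branch length
            return children
        return label()

    return node()


def splits(tree):
    """Return the non-trivial bipartitions of a tree, each represented by
    the side that does not contain a fixed reference leaf."""
    clusters = set()

    def leaves_below(n):
        if isinstance(n, str):
            return frozenset([n])
        below = frozenset().union(*(leaves_below(c) for c in n))
        clusters.add(below)
        return below

    all_leaves = leaves_below(tree)
    ref = min(all_leaves)
    result = set()
    for cluster in clusters:
        side = all_leaves - cluster if ref in cluster else cluster
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result


def rf_distance(newick_a, newick_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))


# Two hypothetical topologies on the same five taxa.
tree_1 = "((A,C),B,(D,E));"
tree_2 = "((A,C),D,(B,E));"
print(rf_distance(tree_1, tree_2))  # 2
```

The two trees share the split {A, C} | {B, D, E}, and each has one split the other lacks, so the distance is 2. For real data a maintained library is probably the safer choice, since this sketch does no input validation.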

As for visualizing the differences between two trees, I have a method described in my thesis (Section 3.5). Flnjtree (available on the same download page) provides a graphical interface. It reorders the leaves of the trees into a similar order so that you can easily identify differences by eye. Note that this method is not limited to comparing two trees; it is a general method for ordering tree leaves. Some may notice that in TreeFam, tree leaves tend to be ordered in the same way. That is the result of the tree ordering algorithm.

0

Interesting! I will check it out and let you know.

0

Anyone interested in calculating SPR distances may be interested in the work of Chris Whidden and Rob Beiko. They have a rooted SPR distance finding algorithm that uses some fancy agreement forest and clustering techniques: http://kiwi.cs.dal.ca/Software/RSPR

It has significant performance gains in terms of speed and the number of taxa you can have in the tree compared to other algorithms.

0
10.8 years ago
rob.lanfear ▴ 30

There are a lot of statistical methods for comparing trees directly, e.g. by asking whether the likelihood of the data given one topology is significantly worse than the likelihood given another topology. These methods are reviewed (very well!) in this paper:

Nick Goldman, Jon P. Anderson, and Allen G. Rodrigo Likelihood-Based Tests of Topologies in Phylogenetics Syst Biol (2000) 49(4): 652-670 doi:10.1080/106351500750049752 http://sysbio.oxfordjournals.org/content/49/4/652.abstract

I think CONSEL implements some of the tests, as does PAML. Not sure if all tests are implemented in easy-to-use software though.

0
10.8 years ago
Whetting ★ 1.6k

Hi,
for NJ trees this is likely to be overkill, but I have used the Shimodaira–Hasegawa test for topology (Shimodaira and Hasegawa, 1999). It is easy to run in RAxML.
PAUP* also allows for "hypothesis testing". Just google "PAUP* hypothesis testing" and you should find several tutorials. Good luck