Question: Identification Of Significant Differences Between Phylogenetic Trees Using Distance Matrix
ALchEmiXt wrote:

I am struggling with a fairly simple problem but I cannot find a solution to it yet.

I have a distance matrix calculated for some taxa. When generating a phylogenetic tree (let's say an NJ tree), it is quite easy to spot differences between trees. However, this becomes fairly complex and infeasible for large trees (20 or more taxa). Is there a way to identify significant differences between the trees (and/or using the matrices alone)?

I have found the obvious Mantel (and derived) tests, but those basically test for a linear (Pearson) correlation between the complete matrices. That is useful but not what I am looking for.

Below are two example distance matrices:

[1] 'A'     0 0.136170479249233 0.111979752530934 0.306703146374829 0.284662370068735
[2] 'B'     0.136170479249233 0 0.144810467774404 0.333361262783542 0.381779151560938
[3] 'C'     0.111979752530934 0.144810467774404 0 0.346534459925035 0.303132688524432
[4] 'D'     0.306703146374829 0.333361262783542 0.346534459925035 0 0.271365855221696
[5] 'E'     0.284662370068735 0.381779151560938 0.303132688524432 0.271365855221696 0

[1] 'A'     0 0.136170479249233 0.111979752530934 0.306703146374829 0.284662370068735
[2] 'B'     0.136170479249233 0 0.144810467774404 0.333361262783542 0.131779151560938
[3] 'C'     0.111979752530934 0.144810467774404 0 0.346534459925035 0.303132688524432
[4] 'D'     0.306703146374829 0.333361262783542 0.346534459925035 0 0.271365855221696
[5] 'E'     0.284662370068735 0.131779151560938 0.303132688524432 0.271365855221696 0
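
For the two example matrices above, one simple first step (a sketch only, not a significance test) is to subtract the matrices entry by entry and rank the taxon pairs by absolute difference. The function name `rank_pair_differences` is made up for this illustration.

```python
from itertools import combinations

TAXA = ["A", "B", "C", "D", "E"]

# The two example matrices from the question; rows and columns follow TAXA order.
M1 = [
    [0, 0.136170479249233, 0.111979752530934, 0.306703146374829, 0.284662370068735],
    [0.136170479249233, 0, 0.144810467774404, 0.333361262783542, 0.381779151560938],
    [0.111979752530934, 0.144810467774404, 0, 0.346534459925035, 0.303132688524432],
    [0.306703146374829, 0.333361262783542, 0.346534459925035, 0, 0.271365855221696],
    [0.284662370068735, 0.381779151560938, 0.303132688524432, 0.271365855221696, 0],
]
M2 = [
    [0, 0.136170479249233, 0.111979752530934, 0.306703146374829, 0.284662370068735],
    [0.136170479249233, 0, 0.144810467774404, 0.333361262783542, 0.131779151560938],
    [0.111979752530934, 0.144810467774404, 0, 0.346534459925035, 0.303132688524432],
    [0.306703146374829, 0.333361262783542, 0.346534459925035, 0, 0.271365855221696],
    [0.284662370068735, 0.131779151560938, 0.303132688524432, 0.271365855221696, 0],
]


def rank_pair_differences(taxa, m1, m2):
    """Return (|d1 - d2|, taxon_i, taxon_j) for every taxon pair, largest first."""
    diffs = []
    for i, j in combinations(range(len(taxa)), 2):
        diffs.append((abs(m1[i][j] - m2[i][j]), taxa[i], taxa[j]))
    return sorted(diffs, reverse=True)


# Show the three taxon pairs whose distances changed the most.
for diff, a, b in rank_pair_differences(TAXA, M1, M2)[:3]:
    print(f"{a}-{b}: {diff:.3f}")
```

In these two matrices only the B-E distance differs (0.38 versus 0.13), so that pair comes out on top. This only locates where the matrices disagree; it says nothing about whether the difference is statistically significant or whether it changes the tree topology.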

Thanks for any pointer! Alex

phylogenetics • 3.9k views
modified 3.9 years ago by Biostar • written 7.5 years ago by ALchEmiXt
a.zielezinski wrote:

I recommend the Compare2Trees web tool, published in Bioinformatics.

It measures the topological agreement between two trees by matching up branches that have similar topological characteristics. It takes two phylogenies in Newick format as input, highlights the regions where the two topologies differ, and gives you an overall topological score.

UPDATE (01.23.2017):

In 2017 I would recommend the ETE Toolkit. It computes topological distances between trees using three types of measures:

  1. Robinson-Foulds symmetric difference
  2. Percentage of edge similarity (number of branches in one tree that are present in another)
  3. Duplication aware distances (TreeKO method), which provides a distance between trees containing duplicated attributes.

Examples of using ETE are shown here.

modified 3.4 years ago • written 7.5 years ago by a.zielezinski

It seems to do what I expected, though it is all visual and I have to see whether it will satisfy all my needs. Again, it seems that finding what you need comes down to using the proper keywords.

written 7.5 years ago by ALchEmiXt
lh3 wrote:

In phylogenetics, the most widely used method for comparing tree topologies is probably the Robinson-Foulds distance (RF distance). It counts the bipartitions (splits) that are present in one tree but not in the other. This distance is simple, well defined, and very easy to implement. In my understanding, Compare2Trees uses a related but not identical measure. The best tree distance is the subtree prune and regraft distance (SPR distance), which counts the minimum number of prune-and-regraft operations needed to transform one tree into the other. Computing the SPR distance is known to be NP-hard.

As to implementations, my njtree (the Linux binary seems to work, but the Mac binary is broken) can compute the RF distance. It is also easy to write one yourself. If you google "subtree prune and regraft distance", you will find quite a few papers that approximate SPR. You may try them out; I have not.
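
To illustrate how little code an RF implementation needs, here is a minimal Python sketch. It is not the njtree code. It assumes trees are given as nested tuples of leaf names rather than Newick strings, treats them as unrooted, and counts the non-trivial bipartitions found in only one of the two trees. The two example topologies are invented for the demonstration and are not NJ trees built from the matrices in the question.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) of the tree, ignoring the root."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # A split is informative only if both sides hold at least two leaves.
        if len(below) > 1 and len(other) > 1:
            # Store both sides as an unordered pair so the rooting does not matter.
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def rf_distance(tree1, tree2):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(tree1) ^ bipartitions(tree2))


# Hypothetical five-taxon topologies, used only to exercise the functions.
t1 = ((("A", "C"), "B"), ("D", "E"))
t2 = ((("A", "C"), "D"), ("B", "E"))

print(rf_distance(t1, t1))  # identical trees
print(rf_distance(t1, t2))  # trees that share only the (A,C) split
```

The splits that end up in the symmetric difference also tell you which groupings disagree, which is often more useful than the number alone. For real data you would parse Newick input with an existing library instead of typing nested tuples.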

As to visualizing the differences between two trees, I have a method described in my thesis (Section 3.5). Flnjtree (available on the same download page) provides a graphical interface. It reorders the leaves of the trees into a similar order so that you can easily identify differences by eye. Note that this method is not only for comparing two trees; it is a general method for ordering tree leaves. Some may have noticed that in TreeFam, tree leaves tend to be ordered in the same way. That is the result of this tree-ordering algorithm.

modified 7.5 years ago • written 7.5 years ago by lh3

Interesting! I will check it out and let you know.

written 7.5 years ago by ALchEmiXt

Anyone interested in calculating SPR distances may be interested in the work of Chris Whidden and Rob Beiko. They have a rooted SPR distance finding algorithm that uses some fancy agreement forest and clustering techniques: http://kiwi.cs.dal.ca/Software/RSPR

It has significant performance gains in terms of speed and the number of taxa you can have in the tree compared to other algorithms.

written 7.5 years ago by DG
rob.lanfear wrote:

There are a lot of statistical methods for comparing trees directly, e.g. by asking whether the likelihood of the data given one topology is significantly worse than the likelihood given another topology. These methods are reviewed (very well!) in this paper:

Nick Goldman, Jon P. Anderson, and Allen G. Rodrigo. Likelihood-Based Tests of Topologies in Phylogenetics. Syst Biol (2000) 49(4): 652-670. doi:10.1080/106351500750049752. http://sysbio.oxfordjournals.org/content/49/4/652.abstract

I think CONSEL implements some of the tests, as does PAML. Not sure if all tests are implemented in easy-to-use software though.

written 7.5 years ago by rob.lanfear
Whetting wrote:

Hi,
for NJ trees this is likely to be overkill, but I have used the Shimodaira–Hasegawa test for topology (Shimodaira and Hasegawa, 1999). It is easy to run in RAxML.
PAUP* also allows for "hypothesis testing". Just google "PAUP* hypothesis testing" and you should find several tutorials. Good luck!

written 7.5 years ago by Whetting