How to compare my phylogenetic trees?
3
0
7.3 years ago

I need to know how to compare 2 trees with each other, step by step, using trusted software.

Has anyone tried this? I couldn't work it out from the software documentation.

phylogenetics • 7.2k views
0

This is a follow-up to the question "How to optimize Matrix by Differential evolution algorithm??", which might explain where OP is going. However, I am still struggling to work out what the aim is.

3
7.3 years ago

If you use a Maximum Likelihood method, you will get a "score" of how good the "best" tree is. For example, the best tree might have a Likelihood score of -2000.0. This tree (T0) might group (A,B) together, (C,D) together, with (E) as an outgroup. Therefore the treefile, without branchlengths, would look like this: ((A,B),(C,D), E);

You could ask if another tree (T1) was significantly worse than the "best" tree. For example, consider tree ((A,C),(B,D),E);. This will have a poorer Likelihood (simply because it is not the "best" tree) so maybe something like -2010.0.

We can do a statistical test to see if tree T1 is significantly worse than tree T0, i.e. whether a likelihood difference of 10 units is significantly greater than zero. We can use the Shimodaira-Hasegawa (SH) test to do this, e.g. with the package IQ-TREE, easily accessed online via the W-IQ-TREE server. You would provide the alignment and, at the bottom of the webpage, the 2 treefiles concatenated:

((A,B),(C,D), E);

((A,C),(B,D),E);
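For a local run rather than the web server, the same two-tree comparison might be set up as in the sketch below. It writes the two candidate topologies to one file and assembles an IQ-TREE command line. The file names `alignment.phy` and `candidates.nwk` are placeholders, and the flags (`-z`, `-n 0`, `-zb`, `-au`) follow the IQ-TREE 1.x command-line reference, so check them against your installed version. The script only prints the command; it does not run IQ-TREE.

```python
import shlex
import tempfile
from pathlib import Path


def write_tree_file(trees, path):
    """Write one Newick tree per line (the 'concatenated' treefile)."""
    path = Path(path)
    path.write_text("\n".join(t.strip() for t in trees) + "\n")
    return path


def iqtree_topology_test_cmd(alignment, tree_file, replicates=1000, au=False):
    """Assemble (but do not run) an IQ-TREE tree topology test command.

    Assumed IQ-TREE 1.x flags:
      -s   alignment file
      -z   file with the candidate trees
      -n 0 skip the full tree search
      -zb  number of RELL replicates for the topology tests
      -au  additionally run the approximately unbiased test
    """
    cmd = ["iqtree", "-s", str(alignment), "-z", str(tree_file),
           "-n", "0", "-zb", str(replicates)]
    if au:
        cmd.append("-au")
    return cmd


# The two topologies from the example above.
trees = ["((A,B),(C,D),E);", "((A,C),(B,D),E);"]

workdir = Path(tempfile.mkdtemp())
tree_file = write_tree_file(trees, workdir / "candidates.nwk")

# "alignment.phy" is a placeholder for your own alignment.
cmd = iqtree_topology_test_cmd("alignment.phy", tree_file, au=True)
print(shlex.join(cmd))
```

Once the printed command looks right for your installation, it can be pasted into a terminal or passed to `subprocess.run(cmd)`.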

If you compare more than 2 trees, the "approximately unbiased" (AU) test is perhaps better. This can also be done using IQ-TREE.

If you HAVE NOT done a phylogenetic analysis - and thus do not know the "best" tree in advance - then you can provide an alignment and a concatenated list of treefiles and carry out the Kishino-Hasegawa (KH) test. The KH test is a two-sided test, unlike SH and AU, which are one-sided. It is again available in IQ-TREE.

These tests can be done in other Maximum Likelihood packages (e.g. RAxML, PhyML). IQ-TREE is just an example.

0

Thanks for your help, but I am not using Maximum Likelihood; I am using a distance method instead, and I have 2 tree files. Will using a distance method affect my results (make them worse), or does that apply to ML only?

More to the point: can I use these tests with any method of phylogenetic tree construction?

1

Maximum Likelihood (and Bayesian) methods are better than distance methods. When we create a distance matrix, we discard information. We also lose the link with the position in the sequence and are unable to model between-site rate heterogeneity, which means that gap regions can introduce bias in the pairwise distance estimates. The distance approach was popular in the 1990s, when Maximum Likelihood methods were computationally very slow. This 2001 paper highlighted the need to move on to newer methods: http://www.sciencedirect.com/science/article/pii/S0168952501022727

Re-reading your question, you seem more interested in the pairwise distances than in the tree, and there is a suggestion that you want a test on each pairwise distance (the one connecting sequences from 2 species) to check whether it is long enough to say that the two species are distinct. There are distance methods that generate a distance and a standard error for that distance, but two distinct species can have a zero pairwise distance based on a highly conserved gene or, alternatively, a large pairwise distance based on a fast-evolving region. I think you should think much more about the relationships seen in the phylogenetic tree rather than the pairwise differences.

If you feel you want to stick with the older distance-based approach, you could compare the trees using the FITCH program in the PHYLIP package. This won't test whether the trees are statistically different but will give a score for each tree. Set the U option to use user-supplied trees rather than estimate the "best" tree. Then supply a single treefile containing the 2 trees (tree T0 and tree T1). You will need to insert a line above the 2 trees with a "2" on it to tell the package that there are 2 trees to be tested.

I strongly urge you to read the Whelan, Lio and Goldman (2001) paper that I quote above and thus move to modern Maximum Likelihood methods or Bayesian Inference methods.

1

UPDATE

Comparing distance-based trees: the PHYLIP FITCH program won't do a statistical test, but it will give a "score" for each tree, a "sum of squares", which may be all you need, e.g. "Sum of squares = 0.16602".

This optimality criterion (for a distance method) is produced by comparing the OBSERVED distance matrix (i.e. the distance matrix you estimate from your alignment) with an EXPECTED distance matrix based on the tree. It comes from a "Least Squares" distance method (Fitch-Margoliash), which searches for the best tree. The best tree will have the lowest sum of squares.
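To make the criterion concrete, here is a minimal sketch of a weighted least-squares score, assuming the usual form sum((D - d)^2 / D^P) over taxon pairs, where D is the observed distance, d is the tree (path-length) distance, and P = 2 gives the Fitch-Margoliash weighting. All distances and the function name are made up for illustration, and the value FITCH reports may count pairs or replicates differently, so treat this as an explanation of the idea rather than a reimplementation of the program.

```python
def as_matrix(entries):
    """Key each distance by an unordered taxon pair."""
    return {frozenset(pair): dist for pair, dist in entries.items()}


def sum_of_squares(observed, expected, power=2.0):
    """Weighted least squares between observed and tree distances.

    power=2 is the Fitch-Margoliash weighting; power=0 is unweighted.
    Observed distances must be > 0 when power > 0.
    """
    total = 0.0
    for pair, d_obs in observed.items():
        d_exp = expected[pair]
        total += (d_obs - d_exp) ** 2 / d_obs ** power
    return total


# Hypothetical observed distances estimated from an alignment.
observed = as_matrix({
    ("A", "B"): 0.2, ("C", "D"): 0.2,
    ("A", "C"): 0.5, ("A", "D"): 0.5,
    ("B", "C"): 0.5, ("B", "D"): 0.5,
})

# Hypothetical path-length distances read off two candidate trees.
tree_a = as_matrix({   # groups (A,B) and (C,D)
    ("A", "B"): 0.2, ("C", "D"): 0.2,
    ("A", "C"): 0.5, ("A", "D"): 0.5,
    ("B", "C"): 0.5, ("B", "D"): 0.5,
})
tree_b = as_matrix({   # groups (A,C) and (B,D)
    ("A", "C"): 0.2, ("B", "D"): 0.2,
    ("A", "B"): 0.5, ("A", "D"): 0.5,
    ("B", "C"): 0.5, ("C", "D"): 0.5,
})

ss_a = sum_of_squares(observed, tree_a)
ss_b = sum_of_squares(observed, tree_b)
print("tree_a:", ss_a)
print("tree_b:", ss_b)
```

The tree whose distances sit closer to the observed matrix gets the smaller score, which is the sense in which FITCH ranks user-supplied trees.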

Here is a typical output when comparing 2 trees using PHYLIP FITCH program (U option). The better tree is the one with a SS of 0.16602.

EXAMPLE

(Gamma:0.37848,(Beta:0.00000,Zeta:0.94553):0.00000,Alpha:0.31710);

Sum of squares = 0.19074

(Zeta:0.91521,(Gamma:0.37582,Beta:0.00000):0.08002,Alpha:0.24372);

Sum of squares = 0.16602

0
7.3 years ago
kloetzl ★ 1.1k

Try phylip treedist.
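For context, TREEDIST reports tree-to-tree distances such as the symmetric difference (Robinson-Foulds). The sketch below is not PHYLIP code; it is a small Python illustration of that distance for unrooted topologies written as nested tuples, with hypothetical function names. It counts the bipartitions found in one tree but not the other, so it measures how different two trees are, not which one is better.

```python
def leaves(node):
    """Return the set of taxon names under a node (a str or a tuple)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def splits(tree):
    """Collect the non-trivial bipartitions induced by internal branches."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        # Keep only splits with at least 2 taxa on each side; store the
        # two sides as an unordered pair so rooting does not matter.
        if len(clade) > 1 and len(other) > 1:
            found.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Symmetric difference between the split sets of two trees."""
    if leaves(tree1) != leaves(tree2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree1) ^ splits(tree2))


# Two example topologies on five taxa, written as nested tuples.
t0 = (("A", "B"), ("C", "D"), "E")
t1 = (("A", "C"), ("B", "D"), "E")

print(robinson_foulds(t0, t0))  # 0
print(robinson_foulds(t0, t1))  # 4
```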

0

Yes, I know that option, but will it help me to know which tree is better?

0

"Better" is a vague word. You need a criterion to define which tree is better. That criterion could be "maximum parsimony" or "maximum likelihood". You probably want to use the latter.

0

I have my start tree and the optimal tree produced by my optimization algorithm. I need results that support my optimal tree by showing that it is better than my start tree, so I need to pass both trees to some software and see the difference in the results. Which software can do that?

0

I forgot to tell you: I am using a distance method as my criterion.

0

What are your criteria for 'better'?

Shorter branch lengths? More distinct clustering?

You can't compare them if you don't know what you're comparing...

0
7.3 years ago
Michael 54k

A common way of testing is the bootstrap method. It is implemented in software like RAxML and PhyML. Bootstrap replication is a validation step that needs to be carried out on the MSA after an initial tree has been built. A minimum of 100 bootstrap replicates is normal. In this context, unfortunately, 2 trees is like no tree. ;) And in general you need to use either Maximum Likelihood or Bayesian trees, maybe mixing in maximum parsimony.
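To show what a bootstrap replicate is, here is a minimal sketch of the resampling step only: the columns of the MSA are drawn with replacement to build pseudo-alignments of the same length. The toy alignment and the function name are hypothetical; in a real analysis a tree is then rebuilt from every replicate (RAxML and PhyML do this internally) and the support for each clade is the fraction of replicate trees that contain it.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment, made up for illustration.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGAAC",
    "C": "ACCTACGTTC",
    "D": "ACCTTCGTTC",
}

rng = random.Random(42)  # fixed seed so the run is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]

print(len(replicates), "replicates")
print(replicates[0])
```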

0

What do you mean by "2 trees is like no tree"?

Can I compute the bootstrap for my start tree and my optimal tree and see the difference?

0

What I mean is that 2 trees are not a basis for a proper analysis. You need 100, so that a consensus tree makes sense. Can you test your trees? Yes, but you need the original MSA as input plus the tree. You can then test the tree by different criteria, e.g. likelihood. MrBayes also has an option to test a given tree topology as a hypothesis. MEGA 7 also offers an option to test an existing tree by Maximum Likelihood, Maximum Parsimony, and least squares. However, testing a simple NJ or UPGMA tree does not make much sense. These tests are mostly intended for testing an existing hypothesis that can be drawn as a tree, like the monophyly of a group, or that A and B are sister taxa.

0

So, what is the metric (for a distance method) that would support my optimal tree?

0

I have the feeling you do not understand what I am trying to tell you.

• The bootstrap is a common method to validate trees.
• You need to use software that supports bootstrapping.
• Distance methods (NJ/UPGMA) are not good methods for generating reliable trees.
• It cannot be decided which of your trees is better, because you have not defined what better means.
0

OK, I will use bootstrapping as the metric for my comparison. My questions now are:

• Did you mean that I need to use the MSA + 100 trees as input to compute bootstrap percentages?
• What would be wrong with inputting the MSA + 2 trees (original tree + optimal tree)?
• I need to compare my results with a recent paper that used the same approach (optimization + bootstrapping). In your opinion, which paper could I compare my results with?
0

• Please convince me that your tree is optimal: why do you think one of your trees is optimal? If you already knew it was optimal for some reason, then you wouldn't need to validate it. If you have a well-known species tree (your 'optimal tree'), you could also try the TreeBeST software.
• If you have instead generated simple NJ or UPGMA trees, then it is better to rebuild a tree from scratch using the MSA as input and either MEGA, PhyML, RAxML or MrBayes.