Question: Metrics to evaluate tree congruence
16 days ago by
fhsantanna220 wrote:

I have multiple phylogenetic trees built from different marker genes, each containing the same organisms. I would like to assess the congruence of these trees in a pairwise fashion (ideally ending up with a matrix of congruence values). Since I have about 100 trees, I do not want to do it by eye. It would be great if I could evaluate it using some "congruence metric". For example, I would like to know whether a gyrB gene tree is more congruent with the 16S rRNA gene tree than a recA gene tree is. Do you know of such metrics? Which software do you recommend?

congruence software phylogeny • 197 views
modified 7 days ago by Joseph Hughes1.9k • written 16 days ago by fhsantanna220
16 days ago by
United Kingdom
jrj.healey260 wrote:

Here's a partial solution you might be able to run with. I did something like this recently, though I did do it 'by eye': I clustered my trees manually (there are tools like TOPD that will do it, but I don't know how good they are) and got a couple of other unbiased people to corroborate my cluster estimates.

I created a score matrix like so (this is shortened):

Gene      Tree1  Tree2  Tree3  Tree4  Tree5  Tree6  Tree7  Tree8  Tree9
PAU_pnf   1      1      1      1      1      1      1      1      1
PAK_pnf   1      1      1      1      1      1      1      1      1
PAU_cif   2      2      2      2      2      2      2      2      2
PAK_cif   2      2      2      2      2      2      2      2      2
PLT_cif   2      7      8      2      2      6      2      2      2
PAU_lopT  3      3      9      3      3      1      3      3      3
PAK_lopT  3      3      10     3      3      8      3      3      3
PLT_lopT  3      3      2      3      3      5      3      3      3
PAU_U4    4      4      4      4      4      4      4      4      4
PLT_U4    4      4      4      4      5      4      5      4      4
PAK_U2    4      6      4      4      4      4      6      4      4

I.e. cluster number 1 is arbitrarily assigned to the node that joins PAU_pnf and PAK_pnf in my dataset. This node persists across all my gene trees here.

Then take that matrix and use the Adjusted Wallace Test described here: there is MATLAB code for this, though I simply ran it through their webserver here.

It's a reasonably robust metric for comparing Sequence Types and can be repurposed for congruency :)

That spits you back out a matrix of congruency across each set of clusters/trees, which I then replotted with ggplotly in RStudio.
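
For readers who want to see what the Adjusted Wallace coefficient does with a score matrix like the one above, here is a minimal pure-Python sketch. It is not the MATLAB code or the webserver mentioned in this answer: the function names are hypothetical, and it assumes the usual definition, where the Wallace coefficient W(A->B) is the fraction of gene pairs sharing a cluster in tree A that also share a cluster in tree B, and the adjustment subtracts the agreement expected by chance (computed from the cluster sizes in B).

```python
from collections import Counter
from itertools import combinations

def wallace(a, b):
    """W(A->B): of the pairs that share a cluster in A, the fraction that also share one in B."""
    together_a = together_both = 0
    for i, j in combinations(range(len(a)), 2):
        if a[i] == a[j]:
            together_a += 1
            if b[i] == b[j]:
                together_both += 1
    return together_both / together_a if together_a else float("nan")

def expected_wallace(b):
    """Chance agreement: probability that two random items share a cluster in B."""
    n = len(b)
    return sum(c * (c - 1) for c in Counter(b).values()) / (n * (n - 1))

def adjusted_wallace(a, b):
    """Adjusted Wallace AW(A->B) = (W - Wi) / (1 - Wi); nan if B is a single cluster."""
    w, wi = wallace(a, b), expected_wallace(b)
    return (w - wi) / (1 - wi) if wi < 1 else float("nan")

# Cluster labels per gene, copied from the Tree1 and Tree2 columns of the table above.
tree1 = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
tree2 = [1, 1, 2, 2, 7, 3, 3, 3, 4, 4, 6]

if __name__ == "__main__":
    print("AW(Tree1 -> Tree2):", adjusted_wallace(tree1, tree2))
    print("AW(Tree2 -> Tree1):", adjusted_wallace(tree2, tree1))
```

Note that the coefficient is directional: every pair grouped in Tree2 is also grouped in Tree1, but not the other way round, so the two directions give different values. Looping over all column pairs would fill the full congruence matrix.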


modified 16 days ago • written 16 days ago by jrj.healey260

Interesting idea! The only problem for me is building the score matrix "by eye"... I will take a look at TOPD (nice suggestion).

written 16 days ago by fhsantanna220

I have found this tool:

written 16 days ago by fhsantanna220

Interesting, I'd not found that. That's useful. I had originally intended to use Prunier to infer lateral gene transfers from the phylip alignments in conjunction with the gene trees (I had already used ASTRAL to create the species tree to which I was comparing everything). I just could not get it to work in the end for some reason, and never got a reply from the developers :(

I'd add to this that I'd be interested in hearing about any other suggestions people have for decent (and ideally easy to use) congruency analysers.

modified 16 days ago • written 16 days ago by jrj.healey260

Another option:

written 15 days ago by fhsantanna220
11 days ago by
jhc2.4k wrote:

Perhaps normalized Robinson-Foulds distances would help here. Take a look at the ete-compare tool too: it lets you compute all those distances very easily from the command line.
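
To make the metric concrete, here is a rough pure-Python sketch of a normalized Robinson-Foulds distance between Newick trees. It is not how ete-compare is implemented; the function names are hypothetical, the Newick handling is deliberately minimal (no quoted labels or comments), trees are treated as unrooted, and the distance is normalized by the total number of non-trivial splits in the two trees, which equals 2(n-3) when both are fully resolved.

```python
import re

def leaf_sets(newick):
    """Return (all leaf names, list of leaf sets under each internal node)."""
    stack, clades, leaves, prev = [], [], set(), None
    for tok in re.findall(r"[(),;]|[^(),;]+", newick.strip()):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok not in ",;" and prev != ")":
            # A leaf token such as "A:0.1"; text right after ")" is a
            # support value or branch length and is ignored.
            name = tok.split(":")[0].strip()
            if name:
                leaves.add(name)
                stack[-1].add(name)
        prev = tok
    return frozenset(leaves), clades

def splits(newick):
    """Non-trivial bipartitions, each stored as the side without a reference leaf."""
    leaves, clades = leaf_sets(newick)
    ref = min(leaves)
    out = set()
    for clade in clades:
        side = leaves - clade if ref in clade else clade
        if 1 < len(side) < len(leaves) - 1:
            out.add(side)
    return leaves, out

def normalized_rf(nwk1, nwk2):
    """Symmetric difference of splits divided by the total number of splits (0 to 1)."""
    l1, s1 = splits(nwk1)
    l2, s2 = splits(nwk2)
    if l1 != l2:
        raise ValueError("trees must contain the same leaves")
    total = len(s1) + len(s2)
    return len(s1 ^ s2) / total if total else 0.0

def rf_matrix(trees):
    """Pairwise distances for a dict {gene name: Newick string}."""
    return {(a, b): normalized_rf(trees[a], trees[b]) for a in trees for b in trees}

if __name__ == "__main__":
    # Toy trees with made-up topologies, only to show the call pattern.
    toy = {
        "16S": "((A,B),(C,D),E);",
        "gyrB": "((A:0.1,B:0.2):0.3,(C:0.1,E:0.1):0.2,D:0.5);",
        "recA": "((A,C),(B,D),E);",
    }
    for pair, value in rf_matrix(toy).items():
        print(pair, value)
```

A value of 0 means identical topologies and 1 means no shared internal splits, so the gene tree with the smaller distance to the 16S tree is the more congruent one. For real data a dedicated tool such as the one suggested above is the safer choice.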

written 11 days ago by jhc2.4k
12 days ago by
Kansas City
apa@stowers320 wrote:

The cophenetic correlation coefficient can be used for that purpose. For example, in R, get your trees into objects of class "dendrogram" (if your trees are in Newick format, the "ape" package should be able to read them, as objects of class "phylo"), then "cor( cophenetic(tree1), cophenetic(tree2) )" gives a value you can use to populate a pairwise matrix. Note that for "phylo" objects cophenetic() returns a full square matrix, so put both matrices in the same tip order and correlate only their lower triangles. In Matlab, the function would be "cophenet".

Basically, you regenerate the sample distance matrices from the branch lengths, linearize, and correlate. Any pair of trees which encodes the same dendrogram distances between genes will have an R value of 1. From simulations, R <= 0.75 generally indicates unrelated trees.
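
The "regenerate distances, linearize, and correlate" recipe can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a port of the R or Matlab functions: the function names are hypothetical, the Newick parsing is minimal, both trees must carry the same leaf names, and missing branch lengths are treated as 0.

```python
import re
from itertools import combinations
from math import sqrt

def parse_newick(newick):
    """Return (parent, length, leaves): parent and branch length per node id, leaf name -> id."""
    parent, length, leaves = {}, {}, {}
    stack, next_id, closed = [], 0, None
    for tok in re.findall(r"[(),;]|[^(),;]+", newick.strip()):
        if tok == "(":
            node, next_id = next_id, next_id + 1
            if stack:
                parent[node] = stack[-1]
            stack.append(node)
            closed = None
        elif tok == ")":
            closed = stack.pop()
        elif tok in ",;":
            closed = None
        else:
            label, _, bl = tok.strip().partition(":")
            if closed is not None:        # label/length of the node just closed
                node = closed
            elif label:                   # a new leaf
                node, next_id = next_id, next_id + 1
                parent[node] = stack[-1]
                leaves[label] = node
            else:
                continue
            length[node] = float(bl) if bl else 0.0
    return parent, length, leaves

def cophenetic(newick):
    """Path-length distance between every pair of leaves, keyed by sorted name pair."""
    parent, length, leaves = parse_newick(newick)
    to_anc = {}
    for name, node in leaves.items():
        dist, acc = {}, 0.0
        while node in parent:             # walk up to the root, accumulating lengths
            acc += length.get(node, 0.0)
            node = parent[node]
            dist[node] = acc
        to_anc[name] = dist
    out = {}
    for a, b in combinations(sorted(leaves), 2):
        da, db = to_anc[a], to_anc[b]
        # With non-negative lengths, the closest shared ancestor gives the path length.
        out[(a, b)] = min(da[n] + db[n] for n in da.keys() & db.keys())
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)          # undefined if all distances are equal

def cophenetic_correlation(nwk1, nwk2):
    d1, d2 = cophenetic(nwk1), cophenetic(nwk2)
    if d1.keys() != d2.keys():
        raise ValueError("trees must contain the same leaves")
    pairs = sorted(d1)                    # same pair order for both trees
    return pearson([d1[p] for p in pairs], [d2[p] for p in pairs])

if __name__ == "__main__":
    # Toy trees with made-up branch lengths, only to show the call pattern.
    same_shape = ("((A:1,B:1):2,(C:1,D:1):2);", "((A:2,B:2):4,(C:2,D:2):4);")
    print(cophenetic_correlation(*same_shape))
```

Sorting the leaf pairs before correlating is what guarantees that the two distance vectors line up, which is the step that is easy to get wrong when correlating two matrices directly.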

written 12 days ago by apa@stowers320
7 days ago by
Joseph Hughes1.9k
Scotland, UK
Joseph Hughes1.9k wrote:

You could use the Robinson-Foulds measure in Mesquite.

written 7 days ago by Joseph Hughes1.9k