Question: Metrics to evaluate tree congruence
2
10 months ago by
fhsantanna340
Brazil
fhsantanna340 wrote:

I have multiple phylogenetic trees built from different marker genes. Each one contains the same organisms. I would like to assess the congruence of these trees in a pairwise fashion (ideally I would end up with a matrix of congruence values). Since I have about 100 trees, I do not want to do it by eye. It would be great if I could evaluate it with some "congruence metric". For example, I would like to know whether a gyrB gene tree is more congruent with the 16S rRNA gene tree than a recA gene tree is. Do you know of such metrics? Which software do you recommend?

congruence software phylogeny • 450 views
modified 10 months ago by Joseph Hughes2.3k • written 10 months ago by fhsantanna340
2
10 months ago by
jrj.healey2.9k
United Kingdom
jrj.healey2.9k wrote:

Here's a partial solution you might be able to run with. I did something like this recently, though I did do it 'by eye': I clustered my trees manually (there are tools like TOPD that will do it, but I don't know how good they are) and got a couple of other unbiased people to corroborate my cluster assignments.

I created a score matrix like so (this is shortened):

Gene      Tree1  Tree2  Tree3  Tree4  Tree5  Tree6  Tree7  Tree8  Tree9
PAU_pnf   1      1      1      1      1      1      1      1      1
PAK_pnf   1      1      1      1      1      1      1      1      1
PAU_cif   2      2      2      2      2      2      2      2      2
PAK_cif   2      2      2      2      2      2      2      2      2
PLT_cif   2      7      8      2      2      6      2      2      2
PAU_lopT  3      3      9      3      3      1      3      3      3
PAK_lopT  3      3      10     3      3      8      3      3      3
PLT_lopT  3      3      2      3      3      5      3      3      3
PAU_U4    4      4      4      4      4      4      4      4      4
PLT_U4    4      4      4      4      5      4      5      4      4
PAK_U2    4      6      4      4      4      4      6      4      4

I.e. cluster number 1 is arbitrarily assigned to the node that joins PAU_pnf and PAK_pnf in my dataset. This node persists across all of the gene trees shown here.

Then take that matrix and use the Adjusted Wallace coefficient described here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3209087/ There is MATLAB code for this, though I simply ran it through their web server here: http://www.comparingpartitions.info/index.php?link=Tool

It's a reasonably robust metric for comparing sequence types and can be repurposed for congruence :)

That spits you back out a matrix of congruency across each set of clusters/trees: https://s30.postimg.org/jpp2y1969/Screen_Shot_2016_05_14_at_13_42_18.png

Which I then replotted with ggplotly in RStudio:

Voila: https://s30.postimg.org/uo0cg7xrl/heatmap2k_transp1.png
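If you would rather not go through the web server, the Adjusted Wallace step can probably be reproduced locally. Below is a minimal Python sketch, assuming the usual definition: AW(A→B) = (W − Wi) / (1 − Wi), where W is the Wallace coefficient and Wi is its expected value if the two partitions were independent. The function name `adjusted_wallace` is made up for illustration and is not taken from the MATLAB code or the web server. The two example columns are Tree1 and Tree5 from the shortened matrix above.

```python
from collections import Counter
from math import comb


def adjusted_wallace(a, b):
    """Adjusted Wallace coefficient AW(A -> B) for two equal-length label lists."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same items")
    # Count pairs of items that are co-clustered by both A and B, by A, and by B.
    together_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    if together_a == 0:
        raise ValueError("A has no co-clustered pairs, so Wallace is undefined")
    # Wallace: probability that a pair co-clustered in A is also co-clustered in B.
    wallace = together_ab / together_a
    # Expected Wallace under independence (equals 1 - Simpson's diversity of B).
    expected = together_b / comb(len(a), 2)
    if expected == 1:
        raise ValueError("B puts everything in one cluster, so AW is undefined")
    return (wallace - expected) / (1 - expected)


# Cluster labels per gene, copied from the Tree1 and Tree5 columns above.
tree1 = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
tree5 = [1, 1, 2, 2, 2, 3, 3, 3, 4, 5, 4]

print(adjusted_wallace(tree1, tree5))  # 36/47, about 0.766
print(adjusted_wallace(tree5, tree1))  # 1.0
```

Note that the coefficient is directional: Tree5 splits one of Tree1's clusters, so Tree5 fully predicts Tree1 but not the other way round. Looping this over every pair of columns would fill the congruence matrix.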

modified 10 months ago • written 10 months ago by jrj.healey2.9k

Interesting idea! The only problem for me is building the score matrix "by eye"... I will take a look at TOPD (nice suggestion).

Reply written 10 months ago by fhsantanna340

I have found this tool: http://www.mas.ncl.ac.uk/~ntmwn/compare2trees/index.html

Reply written 10 months ago by fhsantanna340

Interesting, I'd not found that. That's useful. I had originally intended to use Prunier to infer lateral gene transfers from the phylip alignments in conjunction with the gene trees (I had already used ASTRAL to create the species tree to which I was comparing everything). I just could not get it to work in the end for some reason, and never got a reply from the developers :(

I'd add that I'd be interested in hearing about any other suggestions people have for decent (and ideally easy to use) congruence analysers.

Reply modified 10 months ago • written 10 months ago by jrj.healey2.9k

Another option: http://phylo.io/

Reply written 10 months ago by fhsantanna340
2
10 months ago by
jhc2.5k
Germany
jhc2.5k wrote:

Perhaps normalized Robinson-Foulds distances would help here. Take a look at the ete-compare tool too. It would allow you to compute all those distances very easily from the command line.
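To make the idea concrete, here is a minimal, standard-library Python sketch of a normalized Robinson-Foulds distance. It is not how ete-compare is implemented. The assumptions are mine: trees are plain Newick strings without quoted labels or spaces in names, they are compared as unrooted, and the distance is divided by 2(n − 3), the maximum for two fully resolved unrooted trees on n leaves. The function names and toy trees are made up for illustration.

```python
import re


def leaf_sets(newick):
    """Return the leaf set under every internal node; the root's set comes last."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", newick))
    stack, clades, prev = [], [], None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(frozenset(done))
            if stack:
                stack[-1] |= done  # pass the leaves up to the parent clade
        elif tok not in (",", ";") and prev in ("(", ","):
            # A leaf label; drop any ":branch_length" suffix.
            stack[-1].add(tok.split(":")[0])
        # Tokens that follow ")" are support values / branch lengths: ignored.
        prev = tok
    return clades


def splits(newick):
    """Return (leaves, non-trivial bipartitions) of the tree, treated as unrooted."""
    clades = leaf_sets(newick)
    leaves = clades[-1]
    ref = min(leaves)  # canonical form: keep the side that lacks this leaf
    found = set()
    for clade in clades:
        side = clade if ref not in clade else leaves - clade
        if 2 <= len(side) <= len(leaves) - 2:
            found.add(frozenset(side))
    return leaves, found


def normalized_rf(newick_a, newick_b):
    """Robinson-Foulds distance scaled to [0, 1] by 2 * (n - 3)."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain exactly the same leaves")
    if len(leaves_a) < 4:
        raise ValueError("need at least 4 leaves")
    return len(splits_a ^ splits_b) / (2 * (len(leaves_a) - 3))


# Toy trees (hypothetical, not from the thread).
t1 = "((A:0.1,B:0.2):0.3,(C:0.1,D:0.1):0.2,E:0.5);"
t2 = "((A,C),(B,D),E);"
t3 = "((A,B),(C,E),D);"
print(normalized_rf(t1, t3), normalized_rf(t1, t2))
```

A value of 0 means identical topologies and 1 means no shared internal branches. Calling `normalized_rf` for every pair of gene trees would give the pairwise matrix asked for in the question.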

written 10 months ago by jhc2.5k
0
10 months ago by
apa@stowers320
Kansas City
apa@stowers320 wrote:

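The cophenetic correlation coefficient can be used for that purpose. For example, in R, read in your trees: if they are in Newick format, the "ape" package can read them (read.tree) as objects of class "phylo", and cophenetic() works on those as well as on "hclust" and "dendrogram" objects. Then "cor( cophenetic(tree1), cophenetic(tree2) )" gives a value which you can use to populate a pairwise matrix. One caveat: for "phylo" objects cophenetic() returns a full square matrix rather than a "dist" object, so put the rows and columns of both matrices in the same tip order and correlate only the lower triangles (e.g. via as.dist()). In Matlab, the function would be "cophenet".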

Basically, you regenerate the sample distance matrices from the branch lengths, linearize, and correlate. Any pair of trees which encodes the same dendrogram distances between genes will have an R value of 1. From simulations, R <= 0.75 generally indicates unrelated trees.
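The "regenerate, linearize, correlate" recipe can be sketched without R. Below is a minimal, standard-library Python version under my own assumptions: plain Newick input without quoted labels, missing branch lengths treated as 0, and a plain Pearson correlation over all leaf pairs. The function names and toy trees are hypothetical and are not part of ape or Matlab.

```python
import re
from itertools import combinations


def patristic(newick):
    """Return {frozenset({a, b}): branch-length path between leaves a and b}."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", newick))
    # Each stack frame lists the children seen so far for one open clade;
    # a child is a dict {leaf: distance from that leaf up to the clade node}.
    stack, dist, prev = [[]], {}, None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            children = stack.pop()
            # Leaves in different children meet at this node.
            for left, right in combinations(children, 2):
                for a, da in left.items():
                    for b, db in right.items():
                        dist[frozenset((a, b))] = da + db
            stack[-1].append({k: v for child in children for k, v in child.items()})
        elif tok not in (",", ";"):
            name, _, length = tok.partition(":")
            length = float(length) if length else 0.0
            if prev == ")":
                # Branch above the clade that was just closed.
                node = stack[-1][-1]
                for leaf in node:
                    node[leaf] += length
            else:
                stack[-1].append({name: length})
        prev = tok
    return dist


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cophenetic_correlation(newick_a, newick_b):
    """Correlate the linearized leaf-to-leaf distances of two trees."""
    da, db = patristic(newick_a), patristic(newick_b)
    if da.keys() != db.keys():
        raise ValueError("trees must contain exactly the same leaves")
    pairs = list(da)  # one shared ordering of leaf pairs for both trees
    return pearson([da[p] for p in pairs], [db[p] for p in pairs])


# Toy trees (hypothetical): same branch lengths, different groupings.
t1 = "((A:1,B:1):2,(C:1,D:1):2);"
t2 = "((A:1,C:1):2,(B:1,D:1):2);"
print(cophenetic_correlation(t1, t1), cophenetic_correlation(t1, t2))
```

Because leaf pairs are matched by name, the tip order in the Newick strings does not matter. Unlike Robinson-Foulds, this measure depends on branch lengths as well as topology.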

written 10 months ago by apa@stowers320
0
10 months ago by
Joseph Hughes2.3k
Scotland, UK
Joseph Hughes2.3k wrote:

You could use the Robinson-Foulds measure in Mesquite.

written 10 months ago by Joseph Hughes2.3k