Question: Metrics to evaluate tree congruence
fhsantanna wrote:

I have multiple phylogenetic trees built from different marker genes, each containing the same organisms. I would like to assess the congruence of these trees in a pairwise fashion (ideally ending up with a matrix of congruence values). Since I have about 100 trees, I do not want to do this by eye. It would be great if I could evaluate it with some "congruence metric". For example, I would like to know whether a gyrB gene tree is more congruent with the 16S rRNA gene tree than a recA gene tree is. Do you know of such metrics? Which software do you recommend?

congruence software phylogeny • 1.5k views
modified 4.0 years ago by Joseph Hughes • written 4.0 years ago by fhsantanna
Joe wrote:

Here's a partial solution you might be able to run with. I did something like this recently, though I did do it 'by eye': I clustered my trees manually (there are tools like TOPD that will do it, but I don't know how good they are) and got a couple of other unbiased people to corroborate my cluster estimates.

I created a score matrix like so (this is shortened):

Gene      Tree1  Tree2  Tree3  Tree4  Tree5  Tree6  Tree7  Tree8  Tree9
PAU_pnf   1      1      1      1      1      1      1      1      1
PAK_pnf   1      1      1      1      1      1      1      1      1
PAU_cif   2      2      2      2      2      2      2      2      2
PAK_cif   2      2      2      2      2      2      2      2      2
PLT_cif   2      7      8      2      2      6      2      2      2
PAU_lopT  3      3      9      3      3      1      3      3      3
PAK_lopT  3      3      10     3      3      8      3      3      3
PLT_lopT  3      3      2      3      3      5      3      3      3
PAU_U4    4      4      4      4      4      4      4      4      4
PLT_U4    4      4      4      4      5      4      5      4      4
PAK_U2    4      6      4      4      4      4      6      4      4

I.e. cluster number 1 is arbitrarily assigned to the node that joins PAU_pnf and PAK_pnf in my dataset. This node persists across all my gene trees here.

Then take that matrix and compute the Adjusted Wallace coefficient described here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3209087/ MATLAB code is available for this, though I simply ran it through their web server here: http://www.comparingpartitions.info/index.php?link=Tool

It's a reasonably robust metric for comparing sequence types and can be repurposed for congruency :)

That spits you back out a matrix of congruency across each set of clusters/trees: https://s30.postimg.org/jpp2y1969/Screen_Shot_2016_05_14_at_13_42_18.png

Which I then replotted with ggplotly in RStudio:

Voila: https://s30.postimg.org/uo0cg7xrl/heatmap2k_transp1.png
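For anyone who would rather script the Adjusted Wallace step than use the web server, here is a minimal sketch in plain Python. It assumes the definition from the linked paper as I understand it: AW(A→B) = (W − Wi) / (1 − Wi), where W is the fraction of item pairs co-clustered in A that are also co-clustered in B, and Wi is the value expected by chance (one minus Simpson's index of diversity of B). The function name and the `clusters` dictionary are mine, not from the paper's MATLAB code; the numbers are the Tree1 to Tree3 columns of the table above.

```python
from collections import Counter
from itertools import combinations


def adjusted_wallace(a, b):
    """Adjusted Wallace coefficient AW(A -> B) for two cluster labelings.

    `a` and `b` list the cluster label of each item, in the same item order.
    """
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same items")
    n = len(a)
    # Item pairs that share a cluster in A.
    together_a = [(i, j) for i, j in combinations(range(n), 2) if a[i] == a[j]]
    if not together_a:
        raise ValueError("A has no co-clustered pairs; Wallace is undefined")
    # Of those pairs, how many also share a cluster in B?
    together_both = sum(1 for i, j in together_a if b[i] == b[j])
    wallace = together_both / len(together_a)
    # Chance expectation: 1 - Simpson's index of diversity of B.
    expected = sum(c * (c - 1) for c in Counter(b).values()) / (n * (n - 1))
    if expected == 1:
        raise ValueError("B has a single cluster; the adjustment is undefined")
    return (wallace - expected) / (1 - expected)


# Cluster labels per gene (rows of the score matrix), one list per tree.
clusters = {
    "Tree1": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "Tree2": [1, 1, 2, 2, 7, 3, 3, 3, 4, 4, 6],
    "Tree3": [1, 1, 2, 2, 8, 9, 10, 2, 4, 4, 4],
}

# The coefficient is directional, so fill the full (non-symmetric) matrix.
matrix = {
    (x, y): adjusted_wallace(clusters[x], clusters[y])
    for x in clusters
    for y in clusters
}

for (x, y), value in matrix.items():
    print(f"{x} -> {y}: {value:.3f}")
```

Note that the matrix is not symmetric: AW(A→B) asks how well A's clusters predict B's, which is not the same question as the reverse.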

modified 4.0 years ago • written 4.0 years ago by Joe

Interesting idea! The only problem for me is building the score matrix "by eye"... I will take a look at TOPD (nice suggestion).

written 4.0 years ago by fhsantanna

I have found this tool: http://www.mas.ncl.ac.uk/~ntmwn/compare2trees/index.html

written 4.0 years ago by fhsantanna

Interesting, I'd not found that. That's useful. I had originally intended to use Prunier to infer lateral gene transfers from the phylip alignments in conjunction with the gene trees (I had already used ASTRAL to create the species tree to which I was comparing everything). I just could not get it to work in the end for some reason, and never got a reply from the developers :(

I'd add that I'd be interested in hearing about any other offerings people have for decent (and ideally easy to use) congruency analysers.

modified 4.0 years ago • written 4.0 years ago by Joe

Another option: http://phylo.io/

written 4.0 years ago by fhsantanna
jhc wrote:

Perhaps normalized Robinson-Foulds distances would help here. Take a look at ete-compare too (the compare tool in the ETE toolkit). It lets you compute all those distances very easily from the command line.
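To make the metric concrete, here is a minimal standard-library sketch of a normalized Robinson-Foulds distance and the pairwise matrix built from it. It is not how ETE implements it; it only illustrates the idea. Assumptions: Newick strings without quoted labels or spaces inside names, trees compared as unrooted, and normalization by the total number of non-trivial splits in the two trees. The three toy trees and all function names are made up for illustration.

```python
def parse_clades(newick):
    """Return (leaf set, list of leaf sets under each internal node)."""
    s = "".join(newick.split()).rstrip(";")
    clades = []
    pos = 0

    def read_label():
        # Consume a label and/or ":length", returning only the name part.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos].split(":")[0]

    def read_node():
        nonlocal pos
        if s[pos] != "(":
            return {read_label()}          # a leaf
        pos += 1                           # consume "("
        leaves = set()
        while True:
            leaves |= read_node()
            if s[pos] == ",":
                pos += 1
                continue
            pos += 1                       # consume ")"
            break
        read_label()                       # skip support value / branch length
        clades.append(frozenset(leaves))
        return leaves

    return frozenset(read_node()), clades


def splits(newick):
    """Non-trivial bipartitions, each stored as the side without a fixed leaf."""
    leaves, clades = parse_clades(newick)
    ref = min(leaves)
    result = set()
    for clade in clades:
        side = leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(side)
    return leaves, result


def normalized_rf(newick_a, newick_b):
    """Robinson-Foulds distance divided by the total number of splits."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    max_rf = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / max_rf if max_rf else 0.0


# Hypothetical toy gene trees on the same five taxa.
trees = {
    "16S": "(((A,B),C),(D,E));",
    "gyrB": "((A:0.1,B:0.2):0.05,(C,(D,E)));",
    "recA": "(((A,C),B),(D,E));",
}

rf_matrix = {(x, y): normalized_rf(trees[x], trees[y]) for x in trees for y in trees}
for (x, y), d in rf_matrix.items():
    print(f"{x} vs {y}: {d:.2f}")
```

A value of 0 means identical unrooted topologies and 1 means no shared non-trivial splits, so `1 - distance` can serve as the congruence score. For 100 real trees, a maintained library is the safer choice than this parser.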

written 4.0 years ago by jhc
apa@stowers wrote:

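The cophenetic correlation coefficient can be used for that purpose. For example, in R, if your trees are in Newick format, the "ape" package can read them (read.tree gives objects of class "phylo"), and cophenetic() then returns the matrix of tip-to-tip distances for each tree. Taxa may be ordered differently in each matrix, so reorder one to match the other and correlate the lower triangles, e.g. "d1 <- cophenetic(tree1); d2 <- cophenetic(tree2)[rownames(d1), colnames(d1)]; cor( d1[lower.tri(d1)], d2[lower.tri(d2)] )", which you can use to populate a pairwise matrix. (For "hclust" or "dendrogram" objects, cophenetic() returns "dist" objects and "cor( cophenetic(tree1), cophenetic(tree2) )" works directly, provided the labels are in the same order.) In MATLAB, the function would be "cophenet".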

Basically, you regenerate the sample distance matrices from the branch lengths, linearize, and correlate. Any pair of trees which encodes the same dendrogram distances between genes will have an R value of 1. From simulations, R <= 0.75 generally indicates unrelated trees.
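The "linearize and correlate" step can be sketched in a few lines of plain Python. This assumes the tip-to-tip distance matrices have already been computed elsewhere (for example with cophenetic() in R) and are available as nested dictionaries keyed by taxon name; the helper names and the toy distances are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def symmetric(pair_distances):
    """Build a nested-dict distance matrix from {(taxon1, taxon2): distance}."""
    matrix = {}
    for (p, q), d in pair_distances.items():
        matrix.setdefault(p, {})[q] = d
        matrix.setdefault(q, {})[p] = d
    return matrix


def cophenetic_correlation(dist_a, dist_b):
    """Pearson correlation between two distance matrices on the same taxa."""
    taxa = sorted(dist_a)
    if taxa != sorted(dist_b):
        raise ValueError("matrices must cover the same taxa")
    # Linearize: one value per unordered taxon pair, in the same order for both.
    pairs = list(combinations(taxa, 2))
    x = [dist_a[p][q] for p, q in pairs]
    y = [dist_b[p][q] for p, q in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


# Toy distances: tree_b has the same shape as tree_a with doubled branch
# lengths, tree_c groups the taxa differently.
tree_a = symmetric({("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 4,
                    ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 2})
tree_b = symmetric({pair: 2 * d for pair, d in
                    {("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 4,
                     ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 2}.items()})
tree_c = symmetric({("A", "B"): 4, ("A", "C"): 1, ("A", "D"): 4,
                    ("B", "C"): 4, ("B", "D"): 2, ("C", "D"): 4})

r_same = cophenetic_correlation(tree_a, tree_b)
r_diff = cophenetic_correlation(tree_a, tree_c)
print(f"same shape: {r_same:.3f}, different grouping: {r_diff:.3f}")
```

Indexing both matrices by taxon name, rather than by row position, is what guards against the two trees listing their tips in different orders.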

written 4.0 years ago by apa@stowers
Joseph Hughes wrote:

You could use the Robinson-Foulds measure in Mesquite.

written 4.0 years ago by Joseph Hughes