Automated methods for comparing trees
8 days ago

I have two trees, A and B, each derived from a separate agglomerative hierarchical clustering run on its own dataset (see figure).

I pick a node from tree A, at some depth. I take the signal associated with the leaves of that node and aggregate it (take the mean at each position, say). This gives me a vector of signal for that node.

I repeat this for tree B, getting another signal vector for the leaves under a node of tree B.

I run some distance function over those two signal vectors to get a score.
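
For concreteness, the three steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions that are not part of my actual setup: SciPy's hierarchical clustering with average linkage, random placeholder data, the mean as the aggregate, Euclidean distance as the score, and both datasets having the same number of positions (columns). The helper name `node_signal` is made up for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def node_signal(node, data):
    """Mean signal vector over all leaves under a SciPy ClusterNode."""
    leaf_ids = node.pre_order()  # by default returns the ids of the leaves
    return data[leaf_ids].mean(axis=0)


# Placeholder data: rows are leaves, columns are signal positions.
rng = np.random.default_rng(0)
data_a = rng.normal(size=(12, 5))
data_b = rng.normal(size=(20, 5))

# Build one tree per dataset (average linkage is an arbitrary choice here).
root_a = to_tree(linkage(data_a, method="average"))
root_b = to_tree(linkage(data_b, method="average"))

# Pick one node from each tree; here simply a child of each root.
vec_a = node_signal(root_a.get_left(), data_a)
vec_b = node_signal(root_b.get_right(), data_b)

# Distance between the two aggregate signal vectors.
score = float(np.linalg.norm(vec_a - vec_b))
print(score)
```

The two nodes can hold very different numbers of leaves; only the length of the aggregated vectors has to match.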

My question is: Are there algorithms for doing this in an automated way, which optimize for the distance score?

In my sketch below, for instance, I have nodes from trees A and B that are very different in composition: tree B's node has many more members than tree A's node. But their aggregate signals could still be very similar under Euclidean or another distance metric.

I want some rigorous way of identifying those "best-matching" nodes, regardless of differences between their leaf content.

The combinatorics of node selection might make testing prohibitive. So I thought going to the same normalized tree depth would be helpful, as a start.
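
One caveat on the combinatorics: if only a single node from each tree is compared at a time, a binary tree with n leaves has n - 1 internal nodes, so the number of node pairs is (n_A - 1)(n_B - 1), which is small enough to test exhaustively. The sketch below does that brute-force search. It assumes SciPy, average linkage, random placeholder data, and a hypothetical `min_leaves` filter to skip single-leaf nodes; the function names are invented for illustration and this is not an established algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import cdist


def all_node_signals(data, method="average", min_leaves=2):
    """Return (nodes, signals): one mean vector per node with >= min_leaves leaves."""
    _, nodes = to_tree(linkage(data, method=method), rd=True)
    kept = [n for n in nodes if n.get_count() >= min_leaves]
    signals = np.array([data[n.pre_order()].mean(axis=0) for n in kept])
    return kept, signals


def best_matching_nodes(data_a, data_b, metric="euclidean", min_leaves=2):
    """Exhaustively score every node pair and return the closest pair."""
    nodes_a, sig_a = all_node_signals(data_a, min_leaves=min_leaves)
    nodes_b, sig_b = all_node_signals(data_b, min_leaves=min_leaves)
    dist = cdist(sig_a, sig_b, metric=metric)  # shape: (nodes in A, nodes in B)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return nodes_a[i], nodes_b[j], float(dist[i, j])


# Placeholder data with the same number of positions in both datasets.
rng = np.random.default_rng(1)
data_a = rng.normal(size=(15, 6))
data_b = rng.normal(size=(40, 6))

node_a, node_b, score = best_matching_nodes(data_a, data_b)
print(node_a.get_count(), node_b.get_count(), score)
```

Restricting the candidates to a band of normalized depth would just be an extra filter on the `kept` list. Matching sets of nodes rather than single nodes is where the search space really grows.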

Before I try reinventing the wheel, are there approaches for doing this which are "rigorous", "efficient", or are there other aspects I am overlooking? Thanks!

Tree sketch

distance agglomerative compare tree • 177 views
8 days ago
Mensur Dlakic ★ 27k

Ete3 has a tree compare function:

This may also be of interest:

PhyloDM has some functions for branch distance calculations:

If you calculate all-vs-all distances within a tree, reducing those matrices to 2D might give you an easy way to highlight the differences.
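
A rough sketch of that last idea, with substitutions stated up front: it uses SciPy's cophenetic distances between leaves as the within-tree all-vs-all distances and a hand-written classical MDS for the 2D reduction, rather than ETE3 or PhyloDM. The data are random placeholders and `classical_mds` is a helper written for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform


def classical_mds(dist_matrix, n_components=2):
    """Classical (Torgerson) MDS: embed points from a square distance matrix."""
    n = dist_matrix.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist_matrix ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # guard against negatives
    return eigvecs[:, order] * scale


# Placeholder data: rows are leaves, columns are signal positions.
rng = np.random.default_rng(2)
data = rng.normal(size=(25, 6))

Z = linkage(data, method="average")
leaf_dist = squareform(cophenet(Z))  # condensed cophenetic distances -> square matrix
coords = classical_mds(leaf_dist)    # one 2D point per leaf
print(coords[:3])
```

Running this once per tree gives two 2D layouts that can be plotted side by side. Because the two trees here have different leaves, the layouts are not directly overlaid without some further matching step.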

