Automated methods for comparing trees
8 days ago

I have two trees, A and B, derived from separate agglomerative hierarchical clustering runs on two datasets (see figure).

I pick a node from tree A, at some depth. I take the signal associated with the leaves of that node and aggregate it (take the mean at each position, say). This gives me a vector of signal for that node.

I repeat this for tree B, getting another signal vector from the leaves under a node of tree B.

I run some distance function over those two signal vectors to get a score.
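The three steps above can be sketched as follows. This is only a minimal illustration, assuming SciPy is used for the clustering, each dataset is an array of shape (leaves, positions) with the same number of positions in both, and the mean is the aggregate. The name `node_signals`, the random data, and the chosen node ids are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def node_signals(data, method="average"):
    """Cluster the rows of `data` and return, for every node of the tree,
    its leaf indices and the mean signal over those leaves.

    `data` is assumed to be an (n_leaves, n_positions) array.
    """
    Z = linkage(data, method=method)
    # rd=True also returns a list of all 2n-1 nodes, where nodes[i].id == i
    _, nodes = to_tree(Z, rd=True)
    leaves = [node.pre_order() for node in nodes]  # leaf ids under each node
    signals = np.array([data[idx].mean(axis=0) for idx in leaves])
    return leaves, signals

# Placeholder data standing in for the two real datasets
rng = np.random.default_rng(0)
data_a = rng.normal(size=(8, 5))
data_b = rng.normal(size=(12, 5))

leaves_a, signals_a = node_signals(data_a)
leaves_b, signals_b = node_signals(data_b)

# Pick one internal node from each tree (arbitrary ids) and score the pair
node_a, node_b = 10, 15
score = np.linalg.norm(signals_a[node_a] - signals_b[node_b])  # Euclidean
```

Node ids below the number of leaves are the leaves themselves, and the last id is the root, so the last row of each signal matrix is simply the mean over the whole dataset.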

My question is: Are there algorithms for doing this in an automated way, which optimize for the distance score?

In my sketch below, for instance, I have nodes from trees A and B that are very different in constitution. Tree B's node has many more members than tree A's node. But their aggregate signals could still be very similar under Euclidean or another distance metric.

I want some rigorous way of identifying those "best-matching" nodes, regardless of differences between their leaf content.

The combinatorics of node selection might make exhaustive testing prohibitive, so I thought restricting comparisons to nodes at the same normalized tree depth would be a helpful start.
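If only single node-to-node pairs are compared, the search may be smaller than it looks: a binary tree over n leaves has n - 1 internal nodes, so an all-pairs comparison needs on the order of n_A × n_B distances. A brute-force sketch, assuming each tree has already been reduced to one aggregate signal vector per node (one row per node); the function name `best_matching_nodes` and the toy vectors are hypothetical:

```python
import numpy as np
from scipy.spatial.distance import cdist

def best_matching_nodes(signals_a, signals_b, metric="euclidean", top=5):
    """Return the `top` closest (node_in_A, node_in_B, distance) triples.

    signals_a, signals_b: arrays of shape (n_nodes, n_positions), one
    aggregate signal vector per node.
    """
    D = cdist(signals_a, signals_b, metric=metric)  # all node pairs at once
    order = np.argsort(D, axis=None)[:top]          # smallest distances first
    rows, cols = np.unravel_index(order, D.shape)
    return [(int(i), int(j), float(D[i, j])) for i, j in zip(rows, cols)]

# Toy signal vectors, for illustration only
signals_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
signals_b = np.array([[5.0, 5.2], [0.9, 1.0], [10.0, 10.0]])
matches = best_matching_nodes(signals_a, signals_b, top=2)
```

A depth restriction could be layered on top by setting the entries of `D` for disallowed pairs to infinity before sorting. The search only becomes combinatorial if sets of nodes, rather than single nodes, are matched.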

Before I try reinventing the wheel, are there approaches for doing this that are rigorous or efficient, or are there other aspects I am overlooking? Thanks!

distance agglomerative compare tree
8 days ago
Mensur Dlakic ★ 27k

Ete3 has a tree compare function:

http://etetoolkit.org/

This may also be of interest:

https://github.com/rrnewton/PhyBin

PhyloDM has some functions for branch distance calculations:

https://github.com/aaronmussig/PhyloDM

If you calculate all-vs-all distances within a tree, reducing those matrices to 2D might give you an easy way to highlight the differences.
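One possible reading of that last suggestion, as a rough sketch: take the cophenetic (within-tree) distance between every pair of leaves and project it to 2D with classical multidimensional scaling. SciPy and MDS are assumptions here, not something the tools above prescribe, and `tree_embedding_2d` and the random data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def tree_embedding_2d(data, method="average"):
    """Embed the leaves of a hierarchical clustering in 2D using
    classical MDS on the all-vs-all cophenetic distances."""
    Z = linkage(data, method=method)
    D = squareform(cophenet(Z))              # n x n within-tree distances
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:2]         # two largest eigenvalues
    # Clip small negative eigenvalues before taking the square root
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

rng = np.random.default_rng(1)
data = rng.normal(size=(10, 4))              # placeholder dataset
coords = tree_embedding_2d(data)             # one (x, y) row per leaf
```

Plotting `coords` for each tree gives one scatter per tree. The two plots are only directly comparable point by point if the trees share the same leaves, which may not hold for two separate datasets.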