Some of you may know aout the Neighbor-joining (NJ) algorithm, which, given a set of taxa and a distance matrix, essentially does the following :
- find the best two taxa A,B to join under a common parent P
- update the distances from all taxa to P
- repeat until a binary tree is obtained
This works if we're given all the taxa at first. Each time we form a taxa cluster (i.e. a subtree), the other distances to this subtree are updated and everything goes well.
Now, I'm given a set of fixed taxa clusters and I'm asked to join them in an order as what NJ would do.
For each cluster, I have access to the set of taxa they contain (i.e. the leaves), but not to their topology.
Thus, I can't really compute the distance between two clusters like NJ, because NJ updates the distances at each joining, iteratively. Since I'm given the clusters without the topology, I don't know in which order the joins occurred.
So my question is : given a set of taxa clusters T for which I only know of the underlying taxa, and given a taxa distance-matrix, is there some way I can know of the best two clusters of T to join according to NJ ?