I'm trying to find a way to evaluate how well a phylogenetic tree 'separates' a set of sequences into different branches of an unrooted tree. Each of my sequences has one of two phenotypes, determined by a functional assay. Ideally the two phenotypes would be completely separated in the unrooted tree, but in reality a branch composed primarily of phenotype-A sequences will be contaminated with some phenotype-B sequences.
My current method is to calculate the distance between each sequence and a reference sequence, and then look for a difference between the average distance to the reference for phenotype-A sequences and for phenotype-B sequences. However, this relies on the assumption that phenotype-A and phenotype-B sequences differ in their evolutionary distance from the reference.
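A minimal sketch of the reference-based comparison described above, assuming the sequence-to-reference distances have already been computed (e.g. from the tree or a pairwise alignment). The function name and the toy data are hypothetical. It also shows where the assumption bites: if both phenotypes happen to sit at a similar distance from the reference, the two means coincide even when the phenotypes fall in completely different branches.

```python
from statistics import mean

def mean_distance_by_phenotype(dist_to_ref, phenotype):
    """Average distance to the reference sequence for each phenotype label.

    dist_to_ref: dict of sequence ID -> distance to the reference (assumed precomputed)
    phenotype:   dict of sequence ID -> phenotype label ("A" or "B")
    """
    groups = {}
    for seq_id, d in dist_to_ref.items():
        # Collect the distances belonging to each phenotype label.
        groups.setdefault(phenotype[seq_id], []).append(d)
    return {label: mean(ds) for label, ds in groups.items()}

# Hypothetical toy data, not real measurements.
dist_to_ref = {"s1": 0.10, "s2": 0.12, "s3": 0.30, "s4": 0.28}
phenotype = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

means = mean_distance_by_phenotype(dist_to_ref, phenotype)
print(means["B"] - means["A"])  # difference in mean distance to the reference
```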
I'm hoping to find a measure that doesn't rely on comparison with a reference but only on the proportion of sequences of each phenotype in each branch of the tree.
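One way to make a reference-free, proportion-based measure concrete is sketched below, under several assumptions: the unrooted tree is written as nested Python tuples with an arbitrary root (every internal node then marks one branch, i.e. one split of the leaves into two sides), there are exactly two labels "A" and "B", and "separation" for a branch is the fraction of sequences that land on the correct side if one side is called A and the other B. The score is the best branch in the tree, and significance comes from shuffling the phenotype labels across the leaves (a permutation test). All names and the toy tree are hypothetical; this is an illustration, not an established package.

```python
import random

def splits(tree):
    """Return (leaf sets of internal nodes, all leaves) for a nested-tuple tree.

    Leaves are strings; internal nodes are tuples of children. Cutting the
    branch above an internal node separates its leaf set from all other
    leaves, so each leaf set stands for one branch of the unrooted tree.
    """
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    all_leaves = walk(tree)
    # The root's leaf set is "everything vs. nothing", not a real branch.
    return [s for s in found if s != all_leaves], all_leaves

def separation(side, other, phenotype, a="A"):
    """Fraction of leaves correctly placed if one side is called phenotype a
    and the other side the other phenotype (best of both orientations)."""
    a_side = sum(phenotype[x] == a for x in side)
    a_other = sum(phenotype[x] == a for x in other)
    side_is_a = a_side + (len(other) - a_other)
    other_is_a = (len(side) - a_side) + a_other
    return max(side_is_a, other_is_a) / (len(side) + len(other))

def best_separation(tree, phenotype):
    """Highest separation score over all internal branches of the tree."""
    leaf_sets, all_leaves = splits(tree)
    return max(separation(s, all_leaves - s, phenotype) for s in leaf_sets)

def permutation_test(tree, phenotype, n_perm=999, seed=0):
    """Compare the observed score with scores after shuffling the labels."""
    rng = random.Random(seed)
    observed = best_separation(tree, phenotype)
    leaves = list(phenotype)
    labels = [phenotype[x] for x in leaves]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if best_separation(tree, dict(zip(leaves, labels))) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy tree and labels (A = phenotype-A, B = phenotype-B).
tree = ((("a1", "a2"), "a3"), (("b1", "b2"), "b3"))
clean = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
mixed = dict(clean, a3="B", b3="A")  # one contaminant in each clade

print(best_separation(tree, clean))  # 1.0
print(best_separation(tree, mixed))  # 0.8333333333333334
print(permutation_test(tree, clean))
```

Because the score uses only the single best branch, a phenotype spread over several separate clades would be understated. A whole-tree alternative that might be worth comparing is the Fitch parsimony score (the minimum number of phenotype changes needed on the tree), tested against the same label-shuffling null.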
Any ideas or references that someone can point me towards?