I have a phylogenetic problem that I’m unsure how to address. I’d like to compare trait data between two groups of species, one closely related and one distantly related, and I need to account for the difference in total branch length between the two groups.
Group A comprises 50 species in a single genus, while group B comprises 50 species across 10 distantly related families. I want to know whether a particular gene family is more or less diverse in group A than in group B (that is, more or less diverse in the genus of interest than would be expected across the rest of the tree). Specifically, I’m looking at species-specific genes in this gene family, meaning genes that occur in a single species and nowhere else across the tree.
I have a good phylogeny and, for each species in both groups, count data for the species-specific genes in this gene family. Without accounting for phylogeny, I find a significantly lower number of species-specific genes in group A (the closely related group) than in group B. However, this is to be expected given how closely related the species in group A are, so I don’t know whether the result is actually meaningful.
I’m familiar with phylogenetic ANOVAs such as phylANOVA in phytools, but I’m not sure that is the correct test to use here. I know there will be phylogenetic autocorrelation within group A, and I don’t want to normalize over the difference between groups A and B. Instead, I want to normalize by the distance within each group, to account for the total patristic distance of group B being larger than that of group A. I'm looking for something similar to what CAFE does, but for a single group of predefined genes. Any ideas on how to approach this, or feedback if I’m thinking about this incorrectly, would be greatly appreciated.
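To make the kind of within-group normalization I have in mind more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the toy tree, the counts, and the function names are made up for illustration, and it assumes that "total branch length" means the summed branch lengths of the subtree connecting a group's tips below their most recent common ancestor. It only computes a naive count per unit branch length and does nothing about autocorrelation, so it is not meant as a proper test.

```python
# Toy tree stored as child -> (parent, branch length).
# All node names, branch lengths, and counts are invented for illustration.
TREE = {
    "A1": ("genusA", 0.1),
    "A2": ("genusA", 0.1),
    "genusA": ("root", 2.0),
    "B1": ("famB", 1.5),
    "B2": ("famB", 1.5),
    "famB": ("root", 0.6),
}

# Hypothetical species-specific gene counts per tip.
COUNTS = {"A1": 2, "A2": 4, "B1": 30, "B2": 45}


def edges_to_root(tip, tree):
    """Return the edges (named by their child node) from a tip up to the root."""
    edges = set()
    node = tip
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def subtree_length(tips, tree):
    """Sum branch lengths of the subtree joining the tips below their MRCA."""
    paths = [edges_to_root(t, tree) for t in tips]
    union = set.union(*paths)
    # Edges shared by every tip lie above the MRCA and are excluded.
    shared = set.intersection(*paths)
    return sum(tree[e][1] for e in union - shared)


def genes_per_branch_length(tips, tree, counts):
    """Naive rate: total species-specific genes per unit of group branch length."""
    return sum(counts[t] for t in tips) / subtree_length(tips, tree)


if __name__ == "__main__":
    for name, tips in {"A": ["A1", "A2"], "B": ["B1", "B2"]}.items():
        print(name, subtree_length(tips, TREE),
              genes_per_branch_length(tips, TREE, COUNTS))
```

In this toy example group A has far fewer species-specific genes in absolute terms, but also far less branch length over which they could have arisen, which is exactly the effect I am trying to separate out.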