I have a set of about 6,000 groups of orthologous proteins. Each orthologous group has a representative protein from anywhere between 7 and 16 species (1 per species). I'm trying to do ancestral state reconstructions for each of these groups based on their well-established phylogenies, and then to compare each amino acid position in a species of interest to the reconstructed sequences at key ancestral nodes.
I've been trying to use PAML (codeml) for this purpose, following the method used in this blog post http://evosite3d.blogspot.com.au/2014/09/tutorial-on-ancestral-sequence.html
My problem is that when using clock = 0 (no molecular clock), PAML requires an unrooted tree. According to PAML's manual, "a rooted tree has a bifurcation at the root, while an unrooted tree has a trifurcation or multifurcation at the root."
The problem is that the mammalian tree I have (composed of eutherians and marsupials) is a well-established bifurcating tree. How can I do an ancestral state reconstruction with PAML, which requires a multifurcating tree, when the true phylogeny is bifurcating? Adding an outgroup like platypus or chicken would still give a rooted tree, since both are outgroups, and a polytomy of chicken/platypus, eutherians and marsupials would be false. I'm sure I've just deeply misunderstood something along the way. Any help would be greatly appreciated!
The short answer is to unroot the tree, which is easy to do in R.
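A tree that is strictly bifurcating when rooted will have a node with three branches once it is unrooted. Unrooting does not add a polytomy or change any relationship; it only removes the root node and merges the two branches that met there into a single branch, which leaves three lineages meeting at the basal node of the written tree. That is the "trifurcation at the root" the PAML manual refers to. Consider a three-taxon case. There is one possible unrooted tree (shaped like a 'Y') and three possible bifurcating rooted trees, one for each branch on which the root can be placed (the branch leading to the first, second, or third taxon). Remember that the placement of the root is a hypothesis.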
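To make the operation concrete, here is a minimal sketch in plain Python (not the R route mentioned above) of what unrooting does to a Newick string. The function names unroot, split_top_level and split_length are made up for this illustration, and the sketch assumes a well-formed Newick tree with no internal node labels or quoted names. It dissolves one of the root's two children so that the top level of the tree has three branches, and adds the dissolved branch's length to its sibling when lengths are present.

```python
def split_top_level(s):
    """Split a Newick child list on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def split_length(node):
    """Separate a subtree from its trailing ':length', if it has one."""
    close = node.rfind(")")
    colon = node.rfind(":")
    if colon > close:
        return node[:colon], float(node[colon + 1:])
    return node, None


def unroot(newick):
    """Return the tree with a trifurcation at the top level.

    Hypothetical helper for illustration only; assumes no internal
    node labels and no quoted taxon names.
    """
    tree = newick.strip().rstrip(";")
    children = split_top_level(tree[1:tree.rfind(")")])
    if len(children) != 2:
        return tree + ";"  # top level is not a bifurcation: nothing to do
    for idx, child in enumerate(children):
        body, length = split_length(child)
        if not body.startswith("("):
            continue  # a tip cannot be dissolved; try the other child
        sib_body, sib_length = split_length(children[1 - idx])
        if sib_length is not None:
            # The two root branches become one branch, so lengths are summed.
            sib_body += f":{sib_length + (length or 0.0):g}"
        grandchildren = split_top_level(body[1:body.rfind(")")])
        if idx == 0:
            new_children = grandchildren + [sib_body]
        else:
            new_children = [sib_body] + grandchildren
        return "(" + ",".join(new_children) + ");"
    return tree + ";"  # both children are tips: a two-taxon tree


# Eutherians and marsupials as a toy rooted tree:
print(unroot("((human,mouse),(opossum,wallaby));"))
# prints (human,mouse,(opossum,wallaby));
```

The printed tree has three branches at its top level, but every split in it (human+mouse versus opossum+wallaby) is the same as in the rooted version; no polytomy between real lineages has been introduced.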