Hi!
I'm using ProtTest to find the evolutionary model that best fits my protein alignments, and then reconstructing phylogenies with PhyML.
Is it OK if I don't use the same number of discrete gamma categories in ProtTest and PhyML (for the same alignment)? I've been using 10 categories in ProtTest and 20 in PhyML. I use only 10 in ProtTest because a run already takes about 4.5 hours.
Do you know of any literature that deals with this? I can't find a recommendation in either program's manual; they only say that the more categories, the better the approximation, but the longer the run time. I mean, how can I know whether the number of categories I'm using is OK, and whether it's fine to use a different number of categories for the same alignment in the two programs?
Thanks!
I think you frequently get better trees by improving the alignment (e.g. protein-guided codon alignment) or trying different tree-building algorithms. Changing the number of gamma categories frequently has a minor effect or none at all. I know my friends usually use something like 4-8. One way to assess the parameter yourself is to check whether the topology changes under different configurations.
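A rough way to do that topology check without extra software is to compare the sets of bipartitions (splits) of two trees and count how many differ, i.e. the Robinson-Foulds distance. The sketch below is only an illustration: the function names are made up here, the Newick strings are toy examples rather than real PhyML output, and the tiny parser assumes plain Newick with unquoted taxon names.

```python
import re

def parse_newick(text):
    """Parse plain Newick into nested tuples of leaf names.
    Branch lengths and internal labels are ignored (assumes unquoted names)."""
    tokens = [t.strip() for t in re.findall(r"[(),;]|[^(),;]+", text) if t.strip()]
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
            # skip an optional internal label / branch length such as ":0.05"
            if pos < len(tokens) and tokens[pos] not in {"(", ")", ",", ";"}:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0].strip()

    return node()

def leaf_set(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def bipartitions(tree):
    """Non-trivial splits of the unrooted tree, stored as the side
    that does not contain a fixed reference leaf."""
    everything = leaf_set(tree)
    ref = min(everything)
    splits = set()
    stack = [tree]
    while stack:
        n = stack.pop()
        if isinstance(n, str):
            continue
        stack.extend(n)
        clade = leaf_set(n)
        other = everything - clade
        if len(clade) > 1 and len(other) > 1:
            splits.add(other if ref in clade else clade)
    return splits

def rf_distance(newick_a, newick_b):
    """Number of splits found in one tree but not the other."""
    tree_a, tree_b = parse_newick(newick_a), parse_newick(newick_b)
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

# Toy trees (hypothetical): same topology with different branch lengths,
# then a genuinely different topology.
tree_4cat = "((A:0.1,B:0.2):0.05,(C:0.1,D:0.1):0.05,E:0.3);"
tree_20cat = "(E:0.3,(D:0.2,C:0.1):0.07,(B:0.2,A:0.1):0.04);"
tree_other = "((A:0.1,C:0.2):0.05,(B:0.1,D:0.1):0.05,E:0.3);"

print(rf_distance(tree_4cat, tree_20cat))
print(rf_distance(tree_4cat, tree_other))
```

A distance of 0 between the trees obtained with, say, 4 and 20 categories would suggest the extra categories are not changing the topology, only branch lengths and likelihoods.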
In theory, you can use a likelihood ratio test (LRT) to compare nested models and choose the best model(s). This is well implemented for DNA models, but I don't know whether there is a counterpart for protein evolution models.
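For reference, the LRT arithmetic itself is small. The sketch below assumes two nested models that differ by exactly one free parameter and uses a plain chi-square distribution with one degree of freedom, which is a simplification. The function names and the log-likelihood values are made up for illustration and do not come from any program output.

```python
import math

def lrt_statistic(lnL_null, lnL_alt):
    """Likelihood ratio statistic: 2 * (lnL of richer model - lnL of simpler model)."""
    return 2.0 * (lnL_alt - lnL_null)

def chi2_pvalue_1df(stat):
    """Upper-tail probability of a chi-square with 1 degree of freedom.
    For 1 df this equals erfc(sqrt(stat / 2))."""
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods for a simpler and a richer nested model.
lnL_simple = -1000.0
lnL_rich = -995.0

stat = lrt_statistic(lnL_simple, lnL_rich)
p_value = chi2_pvalue_1df(stat)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would favour the richer model. The test only makes sense when one model is a special case of the other.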
Thanks for your suggestions!
Yes, I used Expresso3DCoffee to build structure-guided alignments, and then edited them manually.
ProtTest doesn't implement LRT because protein evolution models are not nested; it uses the Akaike and Bayesian information criteria to find the best model. But I need to set the number of gamma categories before launching ProtTest (as an input parameter), and that's the problem.
I will try other tree-building algorithms and different numbers of gamma categories.
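To make the Akaike/Bayesian comparison mentioned above concrete, here is a minimal sketch of the standard formulas, AIC = 2k - 2lnL and BIC = k ln(n) - 2lnL, where k is the number of free parameters and n is the sample size. The model names, log-likelihoods, parameter counts and the choice of alignment length as n are all invented for illustration; they are not ProtTest output, and ProtTest may count parameters or sample size differently.

```python
import math

def aic(lnL, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * lnL

def bic(lnL, k, n):
    """Bayesian information criterion: lower is better."""
    return k * math.log(n) - 2 * lnL

# Hypothetical results: model name -> (log-likelihood, free parameters).
candidates = {
    "ModelA": (-12360.0, 30),
    "ModelA+G": (-12310.0, 31),
    "ModelB+G": (-12305.0, 31),
}
alignment_length = 400  # assumed sample size for BIC

for name, (lnL, k) in candidates.items():
    print(name, round(aic(lnL, k), 1), round(bic(lnL, k, alignment_length), 1))

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], alignment_length))
print("best by AIC:", best_by_aic)
print("best by BIC:", best_by_bic)
```

Because neither criterion requires the models to be nested, the same comparison could in principle be repeated for runs with different numbers of gamma categories to see whether the ranking of models changes.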