Question: Differences In The Phylogenetic Trees From Protein And Codon MSA
6.4 years ago by
I built an MSA of homologous protein sequences with MUSCLE. After trimming the alignment (removing MSA columns with >50% gaps), I built a phylogenetic tree with RAxML using 1000 bootstrap replicates and the LG+I+G substitution model. I then converted the protein MSA to a codon alignment with PAL2NAL and built a second tree with the GTR substitution model.

The resulting trees differ quite a bit. I am wondering which one is more reliable, and whether it is even proper to compare the two trees. Please let me know if there is a paper that compares these two scenarios.

6.4 years ago by
Have you actually done any statistical tests to see how they differ? There is no single answer to this problem: codon and amino acid models are quite different from one another, so we don't necessarily expect the resulting trees to be identical, although we hope they are. But it isn't enough to just say "the trees differ quite a bit"; you really want a quantitative description of how they differ. Setting up a topology test such as the AU (approximately unbiased) test will tell you whether two tree topologies differ significantly given an alignment (you'll want to read quite a bit about the AU test and how to run it properly). You can also calculate the Robinson-Foulds distance, which measures the "distance" between two tree topologies as the number of bipartitions (splits) present in one tree but not in the other.
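To make the Robinson-Foulds idea concrete, here is a minimal sketch in plain Python. It assumes both trees are Newick strings over exactly the same sequence names (which should hold when the codon alignment is derived from the protein MSA) and that labels are unquoted. The function names `tree_splits` and `robinson_foulds` are made up for this illustration. For real analyses a tested implementation from an established phylogenetics library such as DendroPy or ETE is the safer choice.

```python
from typing import FrozenSet, Set, Tuple


def tree_splits(newick: str) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return (leaf names, non-trivial splits) of a tree treated as unrooted.

    Minimal parser: branch lengths and internal labels (e.g. bootstrap
    support) are ignored; quoted labels are not supported.
    """
    stack = [[]]          # leaf names collected for each currently open clade
    clades = []           # leaf set of every closed clade
    token = ""
    after_close = False   # True while reading the label that follows ")"

    def flush() -> None:
        nonlocal token, after_close
        name = token.split(":")[0].strip()
        if name and not after_close:   # text after ")" is support, not a leaf
            stack[-1].append(name)
        token = ""
        after_close = False

    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            flush()
            stack.append([])
        elif ch == ",":
            flush()
        elif ch == ")":
            flush()
            leaves = stack.pop()
            clades.append(frozenset(leaves))
            stack[-1].extend(leaves)
            after_close = True
        else:
            token += ch
    flush()

    all_leaves = frozenset(stack[0])
    anchor = min(all_leaves)  # fixed reference leaf so each split has one form
    splits = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # keep only splits with at least two leaves on each side
        if len(side) >= 2 and len(all_leaves - side) >= 2:
            splits.add(side)
    return all_leaves, splits


def robinson_foulds(newick_a: str, newick_b: str) -> int:
    """Number of splits present in exactly one of the two trees."""
    leaves_a, splits_a = tree_splits(newick_a)
    leaves_b, splits_b = tree_splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same set of sequence names")
    return len(splits_a ^ splits_b)


if __name__ == "__main__":
    # Toy trees, not real data: the second swaps the positions of B and C.
    protein_tree = "((A,B),(C,(D,E)));"
    codon_tree = "((A,C),(B,(D,E)));"
    rf = robinson_foulds(protein_tree, codon_tree)
    max_rf = 2 * (5 - 3)  # maximum for two unrooted binary trees on 5 leaves
    print(f"RF = {rf}, normalised = {rf / max_rf:.2f}")
```

An RF of 0 means the two topologies are identical. Dividing by the maximum, 2(n - 3) for two fully resolved unrooted trees on n sequences, gives a value between 0 and 1 that is easier to compare across datasets. Note that RF counts splits without regard to bootstrap support, so conflicts in weakly supported parts of the tree weigh as much as conflicts between well-supported groups.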

What really matters when looking at the trees is how they differ in terms of major groups of sequences: which sequences have moved, and where. Without seeing the trees or some descriptive statistics, it is hard to tell how the two trees differ and why.

