Question

Phylogenetic analysis using concatenated protein sequences and different best models of sequence evolution

9.2 years ago

dago ★ 2.8k

There have been other discussions on this topic in the forum and I found them really interesting.

However, I would like to ask about one aspect that is not yet clear to me.

Let's assume that we want to perform a robust phylogenetic analysis using orthologous genes.

We identify and isolate them. We align them. Then we can use Gblocks to look for regions in each alignment that are "noisy" or poorly aligned. We remove them. Then we concatenate all alignments.

We then use PartitionFinderProtein to identify the best model of sequence evolution for each partition, where the partitions correspond to the concatenated genes.

We then obtain the best-fitting evolution model for each protein. So in theory we can now proceed to reconstruct a phylogenetic tree, selecting a specific evolution model for each region in the concatenated sequences. Now, this last step is unclear to me. How can we use RAxML or PhyML in this way? How can the evolution models be specified for each region in the concatenated alignment?

I am missing something here.

phylogeny protein • 5.0k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by dago ★ 2.8k

Ram · Answer 1 · 2015-01-24

For RAxML you define partition models in a separate file, which you point RAxML to using the option -q. The format is documented under the RAxML options section of the manual, but you basically define a data type (DNA, or an AA substitution matrix), and then specify a name and the site indices for each partition taking that model:

JTT, gene1 = 1-500
WAG, gene2 = 501-800
WAG, gene3 = 801-1000

Note that in the above you'd estimate two separate WAG models, one for gene2 and one for gene3. If you wanted gene2 and gene3 to evolve under the same WAG model, you'd make them into one partition.
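If you have many genes, writing the partition file by hand gets error-prone, so here is a minimal Python sketch that derives the ranges from the trimmed alignment lengths. It assumes you already know each gene's length after Gblocks and the model PartitionFinderProtein picked for it; the function name `build_partition_lines` and the gene list are made up for illustration, and the ranges use the start-end form from the RAxML manual.

```python
def build_partition_lines(genes):
    """Build RAxML-style partition lines.

    genes: list of (name, model, length) tuples, in the same order in
    which the alignments were concatenated. Lengths are in alignment
    columns (amino-acid sites) after trimming.
    """
    lines = []
    start = 1  # RAxML site indices are 1-based
    for name, model, length in genes:
        if length <= 0:
            raise ValueError(f"partition {name} has non-positive length")
        end = start + length - 1  # inclusive end of this partition
        lines.append(f"{model}, {name} = {start}-{end}")
        start = end + 1  # next partition begins right after this one
    return lines


# Hypothetical example matching the three genes above.
genes = [
    ("gene1", "JTT", 500),
    ("gene2", "WAG", 300),
    ("gene3", "WAG", 200),
]
partition_lines = build_partition_lines(genes)
print("\n".join(partition_lines))
```

Save the printed lines to a file (say partitions.txt, name is arbitrary) and pass it with -q, for example `raxmlHPC -s concat.phy -q partitions.txt -m PROTGAMMAWAG -n run1 -p 12345`. As far as I recall, with a partition file the matrix named in -m is overridden per partition and only the rate heterogeneity part applies, but check the manual for your RAxML version.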