Question

Calculating LRT of a phylogenetic tree (as in codeml)

8.9 years ago

spiral01 ▴ 110

I am using codeml to calculate dN/dS ratios on a data set of over 1000 genes. I was hoping to speed up the process by finding a way to calculate the LRT before running codeml, thereby avoiding having to run it on both the null tree and the alternative tree. Is anyone familiar with a way of doing this, given that the trees will be labelled in the appropriate way for codeml (e.g. #1)?

codeml paml lrt • 2.6k views

updated 8.9 years ago by Brice Sarver ★ 3.8k • written 8.9 years ago by spiral01 ▴ 110

score 2 · Accepted Answer · 2016-08-01

I think you are confusing two things. You also need more info on what models you are fitting (branch, site, or branch-site), but I'll answer generally.

LRTs are used to select between nested models. In codeml, this typically means comparing a model with a site class that allows omega > 1 against one without it; evidence in favor of the model with such a site class allows for the secondary inference of codons under positive selection. This is how you select between M7 and M8 to determine the best-fit model, or look for selection along particular branches, for example. The LRT can NOT be used to compare non-nested models.
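To make the test itself concrete, here is a minimal sketch of the arithmetic, assuming you already have the maximized log-likelihoods of both nested models. The statistic is twice the log-likelihood difference, compared against a chi-square distribution with degrees of freedom equal to the difference in free parameters (2 for M7 vs. M8). The function names `chi2_sf` and `lrt` and the log-likelihood values are made up for illustration; they are not part of codeml or PAML.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # exact tail for df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for a pair of nested models."""
    # Small negative differences can arise from optimization noise; clamp to 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods, chosen only to illustrate the calculation.
stat, p = lrt(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
print(f"2*deltaLnL = {stat:.3f}, p = {p:.4f}")
```

Both log-likelihoods are inputs here, so this only does the final comparison. For some tests (the branch-site test, for instance) the null distribution is a mixture rather than a plain chi-square, so treat the p-value from this sketch as conservative in those cases.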

In molecular phylogenetics, the LRT is often used to select between clocklike and non-clocklike trees (e.g., testing the molecular clock hypothesis) and to select between pairs of nested models of nucleotide sequence evolution (say, submodels of GTR). Note that in the second case you will probably be better off comparing sets of models using the AIC, BIC, or a decision-theoretic criterion.

What this means is that in order to test your hypothesis, you'll need the maximized likelihood of each model (null and alternative) given your tree and data in order to take the likelihood ratio and assess fit. You can't get these likelihoods without fitting the models, i.e., running codeml in the first place.
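Once both runs have finished, collecting the numbers across 1000+ genes can at least be automated. The sketch below assumes the main codeml output file contains a line of the form `lnL(ntime: 11  np: 14):  -1234.567890      +0.000000`; check this against your own output, since the exact layout may differ between PAML versions. The helper name `parse_lnl` and the sample text are hypothetical.

```python
import re

# Assumed layout of the log-likelihood line in a codeml output file.
LNL_RE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")


def parse_lnl(text):
    """Return a list of (lnL, np) tuples, one per lnL line found in the text.

    np is the number of free parameters; the difference in np between the
    alternative and null runs gives the degrees of freedom for the LRT.
    """
    results = [(float(m.group(3)), int(m.group(2))) for m in LNL_RE.finditer(text)]
    if not results:
        raise ValueError("no lnL line found; is this a codeml output file?")
    return results


# Made-up sample text standing in for the contents of an output file.
sample = "lnL(ntime: 11  np: 14):  -1234.567890      +0.000000\n"
print(parse_lnl(sample))
```

A run that fits several site models in one go would yield several tuples, in the order the models appear in the file.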

Does this answer your question?