How to interpret LRT < 0 in codeml branch-model based test?
1
2
Entering edit mode
8.4 years ago
sckinta ▴ 730

Hi,

I am running codeml on orthogroup tree to determine whether a certain lineage is under positive selection. The way I did is to set null model with fixed omega (model=2, NSsites=0, fix_omega=1) while branch-specific model by labeling branch of interest (foreground) in tree file (model=2, NSsites=0, fix_omega=0). After running codeml on both models, I was able to extract likelihood (lnL) and number of parameters (np) for each model, as well as Kn/Ks value for background branches and foreground branches in branch-specific model.

To found out the significant difference on Kn/Ks values between foreground and background branches, I calculate LRT by LRT=2*(lnL1-lnL0) and corresponding pvalues using dchisq function (df =1) in R. I noticed that in some orthogroups, LRT was reported as a negative value which leads to pvalue =0. That is definitely abnormal. How could LRT being negative happen?

Also, I am still a little bit confused about Kn/Ks value predicted for background and foreground in branch-specific model report. If I compared them and Kn/Ks is larger in foreground compared to background, may I make a comment that lineage I chose is under positive selections? The other way around, if Kn/Ks is smaller, the lineage is under negative selection.

Hope I have made my question clear. Thank you

Likelihood-Ratio-Test codeml branch-model • 7.1k views
ADD COMMENT
4
Entering edit mode
8.4 years ago
abascalfederico ★ 1.2k

Not completely sure I remember the details, but this may be of help. You are testing two models: one in which the omega parameter for the "foreground" branch is optimised (free parameter) and one in which omega is set at the value you choose (1, for example). The former is the alternative model and is compared to the latter, the null model, in which you fix omega at 1. The alternative model should always give you a higher or equal likelihood than the null model. The LRT test will tell you whether the difference is significant or not. In the PAML documentation there is information on how to do this test (number of parameters, one extra in this case, etc). See http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf

Regarding the interpretation of the results, no, this test does not necessarily mean whether there is positive selection or not. The LRT will tell you if the optimised omega is different from the omega you fix. It depends on the value of omega. Is it greater than 1?

I would start by freeing the omega for that branch and see if it is greater than one. If that is the case, then you want to test whether that difference compared to 1 is significant or not (calculating the likelihood with omega fixed at 1). If it is different, yes, you have evidence for positive selection. You can also test whether the omega of the foreground branch is significantly different from the general omega (model 2 vs model 0; I would start from this).

There are few cases in which the branch test will find positive selection. Positive selection is hard to detect and happens at particular sites (purifying selection at other sites may obscure the signal of positive selection). Look at the branch-site model, it has greatest power.

I hope this helps,

Federico

ADD COMMENT
0
Entering edit mode

Yes, I agree that 'The alternative model should always give you a higher or equal likelihood than the null model' but in my situation, some alternative model have lower likelihood than null model, which causes LRT < 0. I do not know how it is caused. I had run branch-site model and it has the same problem.

ADD REPLY
0
Entering edit mode

Are you sure you are setting the alternative and null models appropriately? It may help if you send the LnL values for each model. For example, if the alternative model has LnL of -10000 and the null of -10040, then that is perfect, the alternative model is more likely.

ADD REPLY
2
Entering edit mode

Here is the output summary I got from branchsite-based model for 8 different trees. I ran dchisq on LRT in R and get pvalue for each. You can notice that some LRT is negative, all the pvalue were estimated = 0

OrthoID    alt_np    null_np    df    alt_lnL         null_lnL        LRT           pvalue
12947      23        22         1     -2542.819296    -2544.349252    3.06E+000     0.0493860338
12407      27        26         1     -1227.731593    -1227.731574    -3.80E-005    0
11587      31        30         1     -4794.899195    -4796.434562    3.070734      0.0490328968
9264       45        44         1     -3705.245566    -3707.429461    4.36779       0.0214944157
8743       63        62         1     -6511.918167    -6517.280808    1.07E+001     0.000571137
8198       7         6          1     -2117.499108    -2129.285795    23.573374     6.25E-007
4700       75        74         1     -4677.628126    -4677.628125    -2.00E-006    0
7573       7         6          1     -2486.151302    -2536.660151    101.017698    4.60E-024
ADD REPLY
0
Entering edit mode

That's not negative in practice. Here, -3.8E-05 is basically zero (no difference between alternative and null models; p-value=1).

ADD REPLY
0
Entering edit mode

I see, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 2916 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6