Need help with codeml
4.5 years ago
rprog008 ▴ 70

Dear members,

I am running several codeml models to look for patterns of selection in a set of disease genes. I am performing the same analysis separately on 6 and on 16 mammalian species. In both cases the gene set appears to be evolving under purifying selection, but in the chi-square (likelihood ratio) test the 6-species analysis is significant (p = 0.007) while the 16-species analysis is not (p = 0.8). I am a bit worried now: is it okay to report only the 6-species result and discard the 16-species data, or should I report the 16-species result and state that these genes are evolving under weak purifying selection?

Thanks in advance

codeml • purifying selection

If the p-value is non-significant, it means you cannot reject the null hypothesis; for codeml, I believe the null hypothesis here is that the sequences are evolving neutrally (ω = 1). So you cannot reject neutral evolution, and you cannot say these genes are "evolving under weak purifying selection".

Some interesting reads:

Still Not Significant

Misuse of ‘trend’ to describe ‘almost significant’ differences in anaesthesia research


Thanks a lot for clearing my doubt and for sharing interesting articles. :)

4.5 years ago
Brice Sarver ★ 3.8k

What p-value?

If you're doing a likelihood ratio test to distinguish between models with and without a class of sites where ω > 1, the test statistic is twice the difference in log-likelihoods between the two models. For most comparisons it is compared to a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters between the models; for the M8 vs. M8a comparison the null distribution is a mixture, and a chi-square with one degree of freedom is the conservative choice. Either way, the p-value is simply providing evidence for selecting one model over the other.
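
To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The lnL values are made-up placeholders, not real results; in practice you would copy them from the lnL lines of each model's codeml output:

```python
from scipy.stats import chi2

# Hypothetical log-likelihoods copied from the codeml output (lnL lines)
lnL_M7 = -4231.58   # null model: M7 (beta), no omega > 1 class
lnL_M8 = -4225.12   # alternative: M8 (beta & omega), extra omega > 1 class

# Likelihood ratio statistic: 2 * (lnL_alt - lnL_null)
lrt = 2 * (lnL_M8 - lnL_M7)

# M7 and M8 differ by two free parameters, so df = 2
df = 2
p_value = chi2.sf(lrt, df)

print(f"2*deltaLnL = {lrt:.2f}, df = {df}, p = {p_value:.4g}")
```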

If your best-fit model is one that supports a signature of positive selection (e.g., M8 as opposed to M7), you can then look at the results to see which codons show evidence of positive selection, as determined by the Naive Empirical Bayes (NEB) and/or Bayes Empirical Bayes (BEB) approaches.
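
If it helps, here is a rough Python sketch of pulling those sites out of the main codeml result file. The exact section header and column layout can differ between PAML versions, so treat the header string, the file name, and the regular expression below as assumptions to check against your own output:

```python
import re

# Hypothetical path to the main codeml result file (the "outfile" set in codeml.ctl)
mlc_path = "mlc"

in_beb = False
beb_sites = []
with open(mlc_path) as fh:
    for line in fh:
        if "Bayes Empirical Bayes (BEB)" in line:
            in_beb = True
            continue
        if in_beb:
            # Assumed site lines look roughly like: "   123 K 0.987*"
            m = re.match(r"\s*(\d+)\s+([A-Za-z*-])\s+([01]\.\d+)", line)
            if m:
                beb_sites.append((int(m.group(1)), m.group(2), float(m.group(3))))
            elif beb_sites and not line.strip():
                break  # blank line after the site list: stop reading

# Report sites with high posterior probability of omega > 1
for site, aa, prob in beb_sites:
    if prob > 0.95:
        print(f"site {site} ({aa}): P(omega > 1) = {prob:.3f}")
```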

Be aware that the one-rate models may not be the right tests for what you're actually trying to do in some cases. From the manual:

We suggest that the M0-M3 comparison should be used as a test of variable ω among sites rather than a test of positive selection. However, the model of a single ω for all sites is probably wrong in every functional protein, so there is little point in testing.

If you're explicitly looking for other flavors of selection, perhaps across loci, you may find it helpful to explore other approaches. I recommend the Datamonkey Adaptive Evolution Server.


Thanks, Brice, for the answer. I am facing one more problem: what if the p-value of the LRT between M1a and M2a is non-significant while the p-values of the LRTs for M0 vs. M3, M7 vs. M8, and M8 vs. M8a are significant? I am sorry if this seems a naive question. Please advise.


Those sets of nested models are looking at slightly different things. The M7 vs. M8 and M8 vs. M8a comparisons are the best for identifying signatures of positive selection, with the M8 vs. M8a comparison (M8a fixes the extra site class at ω = 1) yielding fewer false positives.

Remember that you're selecting among models here; a single model's likelihood or a single comparison's p-value doesn't tell you which comparison is appropriate for what you're trying to do. I'd suggest reading the manual and the supporting papers and deciding which comparisons are best suited to your purposes.
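
Extending the single-comparison sketch above, a Python snippet like the following runs the four standard nested site-model comparisons side by side. The lnL values are again placeholders, and the degrees of freedom are the usual ones for these pairs; for M8a vs. M8, df = 1 is the conservative choice:

```python
from scipy.stats import chi2

# Hypothetical lnL values taken from separate codeml runs; replace with your own.
lnL = {
    "M0": -4310.77, "M3": -4248.91,
    "M1a": -4260.40, "M2a": -4259.85,
    "M7": -4231.58, "M8": -4225.12, "M8a": -4228.96,
}

# Standard nested comparisons and their degrees of freedom
comparisons = [
    ("M0", "M3", 4),    # variable omega among sites
    ("M1a", "M2a", 2),  # positive selection
    ("M7", "M8", 2),    # positive selection
    ("M8a", "M8", 1),   # positive selection, conservative df = 1
]

for null, alt, df in comparisons:
    lrt = 2 * (lnL[alt] - lnL[null])
    p = chi2.sf(lrt, df)
    print(f"{null} vs. {alt}: 2*deltaLnL = {lrt:.2f}, df = {df}, p = {p:.3g}")
```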
