Need help with codeml
4.5 years ago
rprog008 ▴ 70

Dear members,

I am running several codeml models to look for patterns of selection in a set of disease genes. I am performing the same analysis separately on 6 and on 16 mammalian species. In both cases the gene set appears to be evolving under purifying selection, but in the chi-square (likelihood ratio) test the 6-species analysis is significant (p = 0.007) while the 16-species analysis is not (p = 0.8). I am a bit worried now: is it okay to report only the 6-species result and discard the 16-species data, or should I report the 16-species result and state that these genes are evolving under weak purifying selection?

Thanks in advance

codeml • purifying selection

If the p-value is non-significant, it means you cannot reject the null hypothesis; for codeml, I believe the null hypothesis here is that the sequences are evolving neutrally (ω = 1). So you cannot reject neutral evolution, and you cannot say these genes are "evolving under weak purifying selection".

Some interesting reads:

Still Not Significant

Misuse of ‘trend’ to describe ‘almost significant’ differences in anaesthesia research


Thanks a lot for clearing my doubt and for sharing interesting articles. :)

4.5 years ago
Brice Sarver ★ 3.8k

What p-value?

If you're doing a likelihood ratio test to distinguish between models with and without a class of sites where ω > 1, the test statistic is twice the difference in log-likelihoods between the two models. For most comparisons it is compared to a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters between the models; for the M8 vs. M8a comparison the null distribution is a mixture, and a chi-square with one degree of freedom is the conservative choice. Either way, the p-value is simply providing evidence for selecting one model over the other.
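
To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The lnL values are made-up placeholders, not real results; in practice you would copy them from the lnL lines of each model's codeml output:

```python
from scipy.stats import chi2

# Hypothetical log-likelihoods copied from the codeml output (lnL lines)
lnL_M7 = -4231.58   # null model: M7 (beta), no omega > 1 class
lnL_M8 = -4225.12   # alternative: M8 (beta & omega), extra omega > 1 class

# Likelihood ratio statistic: 2 * (lnL_alt - lnL_null)
lrt = 2 * (lnL_M8 - lnL_M7)

# M7 and M8 differ by two free parameters, so df = 2
df = 2
p_value = chi2.sf(lrt, df)

print(f"2*deltaLnL = {lrt:.2f}, df = {df}, p = {p_value:.4g}")
```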

If your best-fit model is one that supports a signature of positive selection (e.g., M8 as opposed to M7), you can then look at the results to see which codons show evidence of positive selection, as determined by the Naive Empirical Bayes (NEB) and/or Bayes Empirical Bayes (BEB) approaches.
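
If it helps, here is a rough Python sketch of pulling those sites out of the main codeml result file. The exact section header and column layout can differ between PAML versions, so treat the header string, the file name, and the regular expression below as assumptions to check against your own output:

```python
import re

# Hypothetical path to the main codeml result file (the "outfile" set in codeml.ctl)
mlc_path = "mlc"

in_beb = False
beb_sites = []
with open(mlc_path) as fh:
    for line in fh:
        if "Bayes Empirical Bayes (BEB)" in line:
            in_beb = True
            continue
        if in_beb:
            # Assumed site lines look roughly like: "   123 K 0.987*"
            m = re.match(r"\s*(\d+)\s+([A-Za-z*-])\s+([01]\.\d+)", line)
            if m:
                beb_sites.append((int(m.group(1)), m.group(2), float(m.group(3))))
            elif beb_sites and not line.strip():
                break  # blank line after the site list: stop reading

# Report sites with high posterior probability of omega > 1
for site, aa, prob in beb_sites:
    if prob > 0.95:
        print(f"site {site} ({aa}): P(omega > 1) = {prob:.3f}")
```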

Be aware that the one-rate models may not be the right tests for what you're actually trying to do in some cases. From the manual:

We suggest that the M0-M3 comparison should be used as a test of variable ω among sites rather than a test of positive selection. However, the model of a single ω for all sites is probably wrong in every functional protein, so there is little point in testing.

If you're explicitly looking for other flavors of selection, perhaps across loci, you may find it helpful to explore other approaches. I recommend the Datamonkey Adaptive Evolution Server.


Thanks, Brice, for the answer. I am facing one more problem: what if the p-value of the LRT between M1a and M2a is non-significant while the p-values of the LRTs for M0 vs. M3, M7 vs. M8, and M8 vs. M8a are significant? I am sorry if this seems a naive question. Please advise.


Those sets of nested models are looking at slightly different things. The M7 vs. M8 and M8 vs. M8a comparisons are the best for identifying signatures of positive selection, with the M8 vs. M8a comparison (M8a fixes the extra site class at ω = 1) yielding fewer false positives.

Remember that you're selecting among models here; a single model's likelihood or a single comparison's p-value doesn't tell you which comparison is appropriate for what you're trying to do. I'd suggest reading the manual and the supporting papers and deciding which comparisons are best suited to your purposes.
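
Extending the single-comparison sketch above, a Python snippet like the following runs the four standard nested site-model comparisons side by side. The lnL values are again placeholders, and the degrees of freedom are the usual ones for these pairs; for M8a vs. M8, df = 1 is the conservative choice:

```python
from scipy.stats import chi2

# Hypothetical lnL values taken from separate codeml runs; replace with your own.
lnL = {
    "M0": -4310.77, "M3": -4248.91,
    "M1a": -4260.40, "M2a": -4259.85,
    "M7": -4231.58, "M8": -4225.12, "M8a": -4228.96,
}

# Standard nested comparisons and their degrees of freedom
comparisons = [
    ("M0", "M3", 4),    # variable omega among sites
    ("M1a", "M2a", 2),  # positive selection
    ("M7", "M8", 2),    # positive selection
    ("M8a", "M8", 1),   # positive selection, conservative df = 1
]

for null, alt, df in comparisons:
    lrt = 2 * (lnL[alt] - lnL[null])
    p = chi2.sf(lrt, df)
    print(f"{null} vs. {alt}: 2*deltaLnL = {lrt:.2f}, df = {df}, p = {p:.3g}")
```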
