How to calculate p-value from codeml likelihood ratio test results for Benjamini-Hochberg correction
1
1
Entering edit mode
5.5 years ago
memory_donk ▴ 330

Hi Biostars

I'm doing positive selection analysis on a relatively large set of orthologous genes using PAML's implementation of the branch-site model. I've got the results of my likelihood ratio test (LRT) for each gene, and according to this (1) and other papers, the branch-site model follows a 1:1 mixture of chi^2 and point mass 0 distribution, so the critical values for 5% and 1% are 2.71 and 5.41 respectively.

My problem is that I need to perform a Benjamini-Hochberg correction, and for that I need the specific p-value for each gene, not just to decide whether it passes either critical value.

How would I go about calculating the actual p-value from each LRT result and the number of degrees of freedom? I can write a script in python/perl (maybe R if necessary) if I know how to actually calculate the p-values. Any help would be greatly appreciated!

(1) Statistical properties of the branch-site test of positive selection http://www.ncbi.nlm.nih.gov/pubmed/21087944

statistics PAML codeml LRT • 4.2k views
ADD COMMENT
4
Entering edit mode
5.5 years ago
memory_donk ▴ 330

I apologise for positing this and then answering my own question, I hope I haven't broken a site rule (if so please feel free to delete this post, though I'll leave it up for now in case anyone else runs into a similar problem).

This is a classic case of RTFM, because I haven't read the whole (huge) PAML manual and missed a small blurb where they describe how to calculate p-value for this distribution.

You calculate p value with this distribution by calculating the p value using chi^2 (1 degree of freedom) then divide by 2. Yup. Divide by 2. So if you had an LRT result of 2.71 you'll get ~.10, then divide by 2 to get 0.05 (5%). This distribution actually winds up being less stringent than a normal chi^2. They actually recommend in the manual that you just use chi^2 (1 degree of freedom) rather than the mixture, however the literature that actually uses the branch-site model seems to disagree (maybe for obvious reasons).

Again I apologise for answering my own question, but I had struggled for quite a while trying to answer this with google. Hopefully someone learns from the mistake.

ADD COMMENT

Login before adding your answer.

Traffic: 1227 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6