Question

Tajima's D and Fu and Li's Neutrality Tests - P Values?!

5


12.1 years ago

User 7433 ▴ 170

Hi there,

I have used Tajima's D and Fu and Li's statistics (in the DnaSP software) to assess whether my sequencing data show evidence of deviation from neutrality.

The problem I have is that these tests do not give ACTUAL P values. So for Tajima's D, you get a D value and then P<0.05 for example...

My supervisor wants to know why I cannot show the actual P value for these tests, and I need to explain why! Can someone help me?

This is one explanation....

''A negative Tajima's D signifies an excess of low frequency polymorphisms relative to expectation, indicating population size expansion (e.g., after a bottleneck or a selective sweep) and/or purifying selection. A positive Tajima's D signifies low levels of both low and high frequency polymorphisms, indicating a decrease in population size and/or balancing selection. However, calculating a conventional "p-value" associated with any Tajima's D value that is obtained from a sample is impossible. Briefly, this is because there is no way to describe the distribution of the statistic that is independent of the true, and unknown, theta parameter (no pivot quantity exists)''

I just want to be able to explain WHY I cannot give an actual P value.

Can someone please explain?

Thank you!

statistics p-value • 19k views

updated 12.1 years ago by Casey Bergman 18k • written 12.1 years ago by User 7433 ▴ 170

score 3 · Answer 1 · 2012-03-30

Tajima's D and Fu & Li's D are both summary statistics; that is, they summarize sequence data into a single value. Like all summary statistics, the probability of the observed value must be evaluated under a null hypothesis to generate a P-value. This can be done theoretically, if there is an analytical expression to calculate the P-value, or empirically, if data can be simulated under the null hypothesis. In DnaSP, the probabilities of Tajima's D and Fu & Li's D are estimated by simulation, not by an analytical expression. Thus the P-value is based on the number of times X that a simulated data set generates a D value equal to or more extreme than the one you observe: the final P-value is X divided by the total number of simulations Y. But since these simulated P-values are approximate, they are reported as bounded above by the traditional cutoffs (e.g. P < 0.05, P < 0.01) rather than as exact values.
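To make the X over Y idea concrete, here is a minimal Python sketch of a simulation-based P-value for Tajima's D. It is not DnaSP's code. The function names (`tajimas_d`, `simulate_counts`, `monte_carlo_p`) are made up for illustration. The null model is assumed to be a standard neutral coalescent with constant population size and no recombination, with simulations conditioned on the observed number of segregating sites S. "More extreme" is taken as a larger absolute value of D, which is a simplification because the null distribution is not symmetric.

```python
import math
import random


def tajimas_d(counts, n):
    """Tajima's D from per-site allele counts (each 1..n-1) in n sequences."""
    S = len(counts)  # number of segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def simulate_counts(n, S, rng):
    """One neutral coalescent tree for n samples with S mutations dropped on it.

    Assumption: constant population size, no recombination, S held fixed.
    Returns the derived-allele count for each of the S simulated sites.
    """
    # Each active lineage is [leaves below it, length of its current branch].
    active = [[1, 0.0] for _ in range(n)]
    branches = []  # finished branches as (leaves below, length)
    while len(active) > 1:
        k = len(active)
        t = rng.expovariate(k * (k - 1) / 2.0)  # waiting time to next coalescence
        for lineage in active:
            lineage[1] += t
        i, j = rng.sample(range(k), 2)  # the two lineages that coalesce
        a, b = active[i], active[j]
        branches.append((a[0], a[1]))
        branches.append((b[0], b[1]))
        active = [lin for idx, lin in enumerate(active) if idx not in (i, j)]
        active.append([a[0] + b[0], 0.0])
    sizes = [size for size, _ in branches]
    lengths = [length for _, length in branches]
    # A mutation lands on a branch with probability proportional to its length.
    return rng.choices(sizes, weights=lengths, k=S)


def monte_carlo_p(observed_d, n, S, n_sims=1000, seed=1):
    """P = X / Y: the fraction of Y simulated D values at least as extreme."""
    rng = random.Random(seed)
    sims = [tajimas_d(simulate_counts(n, S, rng), n) for _ in range(n_sims)]
    X = sum(1 for d in sims if abs(d) >= abs(observed_d))
    return X / n_sims


if __name__ == "__main__":
    # Hypothetical data: 10 sequences, 5 segregating sites, all singletons.
    d_obs = tajimas_d([1, 1, 1, 1, 1], n=10)
    print(d_obs, monte_carlo_p(d_obs, n=10, S=5))
```

Note that with Y simulations the P-value can only take the values 0/Y, 1/Y, ..., Y/Y, and it changes slightly with the random seed. That granularity and run-to-run noise is one way to see why a simulated P-value is an approximation rather than an exact number.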