**4.7k**wrote:

I know there are a few populations genetics/molecular evolution types in the Biostar community, so hopefully someone has already run into this problem.

I've calculated Tajima's *D* statistic for a couple of DNA alignments using the R package PEGAS, which gives a p-value using a Beta distribution. It happens that the Beta distribution doesn't perfectly model the distribution of *D* under neutral evolution, so many authors simulate data under neutrality to make a null distribution. Unfortunately, all the methods for these papers tend to say about this procedure is "we used `ms`

to simulate the null...".

Does anyone know how to do this properly. My understanding was you simulated data with a matching number of segregating sites, so it you had 100 samples and 20 segregating sites you'd do

```
ms 100 1000 -s 20
```

giving 1000 simulations. However when I calculate *D* from those simulations I get a skewed distribution with a mean near to 1 (I would have though it would b 0) but a p-value pretty close to the one using the Beta distribution. It's entirely possible I messed up somewhere in the calculations, (you can see my function here) but I thought I'd make sure I'm actually simulating the distribution correctly before I pick that apart.

Yeah! Love population genetics questions!

3.3k