Question: Best Statistic To Compare Two Fst Distributions?
Asked 6.7 years ago by confusedious420 (Australia):

I am having difficulty choosing a statistic to compare two distributions of Weir & Cockerham's Fst. I know that both Fst distributions are non-normal, and I cannot assume they are independent: one is a comparison between populations A and B, and the other is between populations A and C, so both involve data from population A.

I was wondering whether a Wilcoxon signed-rank test or Kolmogorov-Smirnov test would be appropriate for testing whether the two distributions significantly differ?

If not, could anyone suggest a better statistic? The samples are too large (millions of markers) for me to do this via bootstrapping.
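For reference, here is a minimal sketch of the paired option mentioned above, the Wilcoxon signed-rank test, written in plain Python with a normal approximation (reasonable for large numbers of markers). It assumes both Fst lists are ordered by the same SNPs, so each pair of values refers to one marker. The function name and the example values are made up for illustration. Note that it handles the pairing by SNP but still treats the SNPs themselves as independent of one another, and it applies no continuity correction.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.

    Returns (w_plus, z, two_sided_p). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p

# Hypothetical per-SNP Fst values, same SNP order in both lists.
fst_ab = [0.12, 0.03, 0.25, 0.08, 0.40]
fst_ac = [0.10, 0.05, 0.15, 0.08, 0.22]
w_plus, z, p = wilcoxon_signed_rank(fst_ab, fst_ac)
```

The two-sample Kolmogorov-Smirnov test, by contrast, assumes the two samples are independent, which is the assumption in doubt here.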

fst statistics • 3.7k views

In my experience, most of the tests out there will reject the null hypothesis (same distribution). Can you tell us more about the biological question you are trying to get at? Maybe there is another way to think about it: perhaps you don't need to test whether the distributions are different, but rather test for a correlation?

In brief, I have produced three pairwise Fst distributions for ~1 million SNPs between three human population samples. I would like to know whether the distributions are different from each other in a statistical sense, particularly between A vs. B and A vs. C. These two distributions look slightly different when eyeballing plots, but I would like to be able to quantify the difference. Would a test for correlation be more appropriate? Would there be one you would suggest?
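One possible reading of the correlation suggestion is a Spearman rank correlation between the per-SNP Fst values of the two comparisons, which does not require normality. The sketch below is plain Python and assumes the two lists are in the same SNP order. The function names and example values are illustrative only. Keep in mind that a correlation measures whether the per-SNP values move together, not whether the two distributions have the same shape or location.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_b)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-SNP Fst values, same SNP order in both lists.
fst_ab = [0.12, 0.03, 0.25, 0.08, 0.40]
fst_ac = [0.10, 0.05, 0.15, 0.09, 0.22]
rho = spearman(fst_ab, fst_ac)
```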


Honestly, with so many data points, it would be weird if you did not see a difference. The interesting question, I believe, is: how big is the difference? It might be worth looking into Bayesian statistics for this. An example of comparing two distributions can be found at [http://www.indiana.edu/~kruschke/BEST/]. Yes, this is computationally more demanding than a frequentist test, but it has much better power and the interpretation is more straightforward (in my opinion).
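As a rough starting point for the "how big" question, the per-SNP differences can be summarized directly. The sketch below is not the Bayesian BEST approach linked above, just a simpler descriptive summary in plain Python. It assumes both lists are in the same SNP order, and the function name, dictionary keys, and example values are invented for illustration.

```python
import statistics

def paired_effect_sizes(fst_ab, fst_ac):
    """Summarize per-SNP differences (A-B minus A-C) between two Fst lists."""
    diffs = [ab - ac for ab, ac in zip(fst_ab, fst_ac)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return {
        # Average and typical shift in Fst between the two comparisons.
        "mean_diff": mean_diff,
        "median_diff": statistics.median(diffs),
        # Mean difference in units of its own standard deviation.
        "standardized_diff": mean_diff / sd_diff if sd_diff > 0 else float("nan"),
        # Share of SNPs with strictly higher Fst in A-B than in A-C.
        "frac_ab_higher": sum(d > 0 for d in diffs) / len(diffs),
    }

# Hypothetical per-SNP Fst values, same SNP order in both lists.
summary = paired_effect_sizes(
    [0.10, 0.20, 0.30, 0.40],
    [0.05, 0.25, 0.10, 0.40],
)
```

With millions of markers these summaries will be estimated very precisely, so the magnitudes themselves, rather than a p-value, carry the interpretation.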