Question: Multiple hypothesis testing, Fst outliers
2.0 years ago by
chrisbala10 wrote:

Hi All,

We are trying to identify fst outliers from a comparison of two populations. We've settled on using pFst from the GPAT++ package.

From my reading, such studies sometimes use Bonferroni- or Benjamini-Hochberg-corrected p-values. In other cases, I've seen arbitrary cutoffs like 10^-8. I'm wondering if anyone has suggestions about best practices?
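For reference, here is a minimal sketch of the two corrections mentioned above, in plain Python. The function names and the example p-values are made up for illustration; nothing here comes from GPAT++ or vcflib.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-site p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

Bonferroni controls the chance of even one false positive, so it is the stricter of the two; Benjamini-Hochberg controls the expected fraction of false positives among the sites called significant.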

I also notice that GPAT has a permutation test to derive empirical p-values. That seems desirable, but it is not totally clear from the documentation what it is doing. It seems that with pFst itself, we could shuffle population assignments to get a null distribution of pFst, but I wonder if this is redundant with what pFst is already doing...
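To make the label-shuffling idea concrete, here is a rough sketch for a single site. It is not what GPAT++ does internally: the statistic is a plain absolute allele-frequency difference rather than pFst, and the function names, the 0/1 allele coding, and the (hits + 1) / (n_perm + 1) estimator are all assumptions for illustration.

```python
import random


def freq_diff(alleles, labels):
    """Absolute difference in alt-allele frequency between populations 0 and 1."""
    pop0 = [a for a, lab in zip(alleles, labels) if lab == 0]
    pop1 = [a for a, lab in zip(alleles, labels) if lab == 1]
    return abs(sum(pop0) / len(pop0) - sum(pop1) / len(pop1))


def permutation_pvalue(alleles, labels, n_perm=999, seed=0):
    """Empirical p-value from shuffling population labels at one site."""
    rng = random.Random(seed)
    observed = freq_diff(alleles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if freq_diff(alleles, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 allele copies per population, 1 = alt allele.
labels = [0] * 20 + [1] * 20
fixed_difference = [1] * 20 + [0] * 20   # populations fixed for different alleles
no_difference = [1, 0] * 20              # same frequency in both populations

p_fixed = permutation_pvalue(fixed_difference, labels)
p_none = permutation_pvalue(no_difference, labels)
```

With 999 permutations the smallest p-value this can return is 0.001, which matters once a multiple-testing correction is applied on top.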

Thanks for any feedback/suggestions you might have!


2.0 years ago by
Zev.Kronenberg (United States) wrote:


I'm happy to hear people are using the GPAT code within VCFLIB. While it is unpublished, there is a working manuscript in the VCF GitHub, where I've tried to describe in better detail what pFst and the other GPAT tools do. pFst is terribly named (I don't take responsibility): it is a likelihood ratio test that measures the difference in allele frequency, weighting allele counts by the genotype likelihoods. wcFst is Weir and Cockerham's (1984) Fst estimator. I've provided two different types of permutation you can do with wcFst: permuting single data points and permuting smoothed windows. One of these workflows is described in the working manuscript.
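As a rough illustration of the general idea (a likelihood ratio test on allele frequency between two groups), here is a simplified sketch. It is not the pFst implementation: it uses hard allele counts with a binomial likelihood and does no genotype-likelihood weighting, and the function names and example counts are made up.

```python
import math


def binom_loglik(k, n, p):
    """Log-likelihood of k alt alleles out of n at frequency p (constant term dropped)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll


def allele_freq_lrt(k1, n1, k2, n2):
    """LRT of one shared allele frequency (null) vs. one per group (alternative)."""
    p_pooled = (k1 + k2) / (n1 + n2)
    null = binom_loglik(k1, n1, p_pooled) + binom_loglik(k2, n2, p_pooled)
    alt = binom_loglik(k1, n1, k1 / n1) + binom_loglik(k2, n2, k2 / n2)
    stat = max(0.0, 2 * (alt - null))
    # Chi-square survival function with 1 degree of freedom.
    pval = math.erfc(math.sqrt(stat / 2))
    return stat, pval


# Hypothetical counts: alt alleles / total alleles in each group.
stat_same, p_same = allele_freq_lrt(10, 40, 10, 40)
stat_diff, p_diff = allele_freq_lrt(30, 40, 5, 40)
```

The p-value relies on the usual asymptotic chi-square approximation, so it will be less reliable for small samples or rare alleles.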

pFst is a statistical test and provides a proper p-value. Here are the assumptions of the test if you're using it for trait mapping:

  1. There is no genotypic stratification between your genotypic groups.
  2. Neighboring variants are independent (almost all tests out there violate this).

Happy to clarify any points or answer other questions.

24 months ago by
chrisbala10 wrote:

Thanks Zev,

That is helpful, and we'll play around with the different workflows.

Just to clarify further and get your opinion: what would you suggest for identifying "outliers"? It seems that in your Science paper (pigeons), you estimated Fst and just highlighted the top 1% (seems reasonable). But with pFst, how would you suggest dealing with multiple testing? For the VAAST analysis, you used some genome-wide significance threshold. (VAAST seems interesting... might need to try that too!) Anyway, just curious if you or anyone has thoughts on best practices.

(Added note: based on the VAAST docs, it seems like the permutation test might help here, because with many permutations one can get the p-value into the realm of Bonferroni-corrected thresholds? Or am I missing the point here?) Chris
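A quick back-of-the-envelope check on how many permutations that would take. This assumes the empirical p-value is estimated as (hits + 1) / (N + 1), so the smallest reachable value is 1 / (N + 1); that estimator, the function name, and the test counts below are assumptions for illustration, not something taken from the VAAST or GPAT docs.

```python
import math
from fractions import Fraction


def min_permutations(n_tests, alpha="0.05"):
    """Fewest permutations N such that 1 / (N + 1) <= alpha / n_tests."""
    # Exact rational arithmetic avoids floating-point rounding in the ceiling.
    needed = Fraction(n_tests) / Fraction(alpha)
    return math.ceil(needed) - 1


# Hypothetical numbers of tested sites.
perms_small = min_permutations(100)          # 1999
perms_genome = min_permutations(1_000_000)   # 19999999
```

So for a million tested sites at alpha = 0.05, each site would need on the order of 20 million permutations before a Bonferroni-level p-value is even possible.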


23 months ago by
Zev.Kronenberg wrote:

It really depends on the study. In the case of Ephb2 (head crest), the allele was recessive, so we were able to reach significance. If I remember correctly, pFst withstood a multiple-testing correction.