Question

Multiple hypothesis testing, Fst outliers

7.1 years ago

chrisbala10 ▴ 10

Hi All,

We are trying to identify fst outliers from a comparison of two populations. We've settled on using pFst from the GPAT++ package.

From my reading, such studies sometimes use Bonferroni- or Benjamini-Hochberg-corrected p-values. In other cases, I've seen arbitrary cutoffs like 10^-8. Does anyone have suggestions about best practices?
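For readers comparing the two corrections named here, this is a minimal sketch in plain Python. It is not part of GPAT++; the function names and the example p-values are made up for illustration, and it assumes one p-value per variant.

```python
def bonferroni(pvals, alpha=0.05):
    """Flag p-values that pass the Bonferroni threshold alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Flag p-values declared significant by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Everything ranked at or below k_max is a discovery.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            keep[idx] = True
    return keep


# Hypothetical p-values for ten variants.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))
```

Bonferroni controls the chance of any false positive, while Benjamini-Hochberg controls the expected fraction of false positives among the hits, so it is usually the less conservative of the two.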

I also notice that GPAT has a permutation test to derive empirical p-values, which seems desirable, but it is not totally clear from the documentation what it is doing. It seems that for pFst itself we could shuffle population assignments to get a null distribution of pFst, but I wonder if this is redundant with what pFst is already doing...
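To make the label-shuffling idea concrete, here is a rough sketch of a permutation test for a single variant. It does not reproduce what GPAT's permutation tool does; the statistic (absolute allele-frequency difference) is a simple stand-in for pFst, and all names are hypothetical.

```python
import random


def freq_diff(alt_counts, labels):
    """Absolute difference in alternate-allele frequency between populations 0 and 1.

    alt_counts[i] is the number of alternate alleles (0, 1 or 2) carried by
    diploid individual i; labels[i] is that individual's population (0 or 1).
    """
    totals = {0: 0, 1: 0}
    sizes = {0: 0, 1: 0}
    for count, pop in zip(alt_counts, labels):
        totals[pop] += count
        sizes[pop] += 1
    freq0 = totals[0] / (2 * sizes[0])
    freq1 = totals[1] / (2 * sizes[1])
    return abs(freq0 - freq1)


def permutation_pvalue(alt_counts, labels, n_perm=999, seed=0):
    """Empirical p-value from shuffling population labels n_perm times."""
    rng = random.Random(seed)
    observed = freq_diff(alt_counts, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved, only membership changes
        if freq_diff(alt_counts, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: ten individuals per population, fixed difference.
labels = [0] * 10 + [1] * 10
fixed_difference = [2] * 10 + [0] * 10
print(permutation_pvalue(fixed_difference, labels))
```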

Thanks for any feedback/suggestions you might have!

Chris

gpat fst pfst fdr population genomics • 3.2k views

score 1 · Answer 1 · 2017-03-24

Chris,

I'm happy to hear people are using the GPAT code within VCFLIB. While unpublished, there is a working manuscript in the VCFLIB GitHub repository, where I've tried to describe in better detail what pFst and the other GPAT tools do. pFst is terribly named (I don't take responsibility), because it is a likelihood ratio test that measures the difference in allele frequency, weighting allele counts by the genotype likelihoods. wcFST is Weir and Cockerham's (1984) FST estimator. I've provided two different types of permutation you can do with wcFST: permuting single data points and permuting smoothed windows. I describe one of these workflows in the working manuscript.
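As an illustration of the general idea of a likelihood ratio test on allele frequencies, here is a simplified sketch. It is not the pFst implementation: it uses hard allele counts with a binomial model and ignores the genotype-likelihood weighting described above, and the function names are hypothetical.

```python
import math


def binom_loglik(alt, total, freq):
    """Binomial log-likelihood of `alt` alternate alleles out of `total`,
    dropping the constant binomial coefficient (it cancels in the ratio)."""
    ll = 0.0
    if alt > 0:
        ll += alt * math.log(freq)
    if total - alt > 0:
        ll += (total - alt) * math.log(1.0 - freq)
    return ll


def allele_freq_lrt(alt_a, total_a, alt_b, total_b):
    """Likelihood ratio test: one shared allele frequency vs. one per group."""
    pooled = (alt_a + alt_b) / (total_a + total_b)
    # Null model: both groups share the pooled frequency.
    null_ll = binom_loglik(alt_a, total_a, pooled) + binom_loglik(alt_b, total_b, pooled)
    # Alternative model: each group has its own maximum-likelihood frequency.
    alt_ll = (binom_loglik(alt_a, total_a, alt_a / total_a)
              + binom_loglik(alt_b, total_b, alt_b / total_b))
    stat = max(0.0, 2.0 * (alt_ll - null_ll))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue


# Hypothetical counts: 30/40 alternate alleles in group A, 5/40 in group B.
print(allele_freq_lrt(30, 40, 5, 40))
```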

pFst is a statistical test and provides a proper p-value. Here are the assumptions of the test if you're using it for trait mapping:

There is no genotypic stratification between your genotypic groups.
Neighboring variants are independent (almost all tests out there violate this).

Happy to clarify any points or answer other questions.

score 0 · Answer 2 · 2017-03-27

Thanks Zev,

That is helpful, and we'll play around with the different workflows.

Just to clarify further and get your opinion: what would you suggest for identifying "outliers"? It seems that in your Science paper (pigeons), you estimated Fst and just highlighted the top 1% (seems reasonable). But with pFst, how would you suggest dealing with multiple testing? For the VAAST analysis, you used a genome-wide significance threshold. (VAAST seems interesting... might need to try that too!) Anyway, just curious if you or anyone has thoughts on best practices.
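For reference, the "top 1%" style of outlier call is just an empirical cutoff on the ranked statistic. A minimal sketch, with a hypothetical function name and made-up values:

```python
def top_fraction_outliers(fst_values, fraction=0.01):
    """Return the indices of the highest `fraction` of values (at least one)."""
    n_top = max(1, round(len(fst_values) * fraction))
    # Rank variant indices from highest to lowest statistic.
    ranked = sorted(range(len(fst_values)), key=lambda i: fst_values[i], reverse=True)
    return sorted(ranked[:n_top])


# Hypothetical per-variant Fst estimates for 200 variants.
fst_values = [i / 1000 for i in range(200)]
print(top_fraction_outliers(fst_values))
```

Note that this always returns hits, even when nothing is under selection, so it ranks candidates rather than testing them.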

(Added note: based on the VAAST docs, it seems the permutation test might help here, because with many permutations the p-value can get into the realm of Bonferroni-corrected thresholds? Or am I missing the point here?)

Chris
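A quick back-of-the-envelope check on that note: with the usual add-one estimator, the smallest empirical p-value from N permutations is 1 / (N + 1), so reaching a Bonferroni threshold of alpha / m needs roughly m / alpha permutations per test. The sketch below assumes that estimator (not necessarily what VAAST or GPAT use), and the function name is hypothetical.

```python
import math
from fractions import Fraction


def min_permutations(n_tests, alpha=0.05):
    """Smallest N such that 1 / (N + 1) <= alpha / n_tests.

    Exact rational arithmetic avoids floating-point rounding at the boundary.
    """
    bound = Fraction(n_tests) / Fraction(str(alpha))  # this is m / alpha
    return math.ceil(bound) - 1


# Hypothetical genome scan with one million tested variants.
print(min_permutations(1_000_000))
```

For a million variants at alpha = 0.05 that is on the order of twenty million permutations for each variant you want to push below the threshold, which is why permutation p-values and Bonferroni cutoffs are expensive to combine.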