Multiple hypothesis testing, Fst outliers
4.7 years ago
chrisbala10 ▴ 10

Hi All,

We are trying to identify Fst outliers from a comparison of two populations. We've settled on using pFst from the GPAT++ package.

From my reading, in such studies folks sometimes use Bonferroni- or Benjamini-Hochberg-corrected p-values. In other cases, I've seen arbitrary cutoffs like 10^-8. I'm wondering if anyone has suggestions about best practices?
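
For reference, the two corrections mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration and not part of GPAT++; the function names and the p-values are made up for the example.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-site p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate, so it usually retains more sites at the same nominal level.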

I also notice that GPAT has a permutation test to derive empirical p-values, and that seems desirable, but it is not totally clear from the documentation what it is doing. It seems like with pFst itself, we could shuffle population assignments to get a null distribution of pFst, but I wonder if this is redundant with what pFst is already doing...
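
To make the label-shuffling idea concrete, here is a rough sketch of a generic permutation test at a single site. It uses the absolute allele-frequency difference as the statistic rather than pFst, and it is only an assumption about the general approach, not a description of what GPAT's permutation tool does. All names and the toy data are hypothetical.

```python
import random


def freq_diff(alleles, labels):
    """Absolute difference in alternate-allele frequency between groups 0 and 1."""
    a = [x for x, g in zip(alleles, labels) if g == 0]
    b = [x for x, g in zip(alleles, labels) if g == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_pvalue(alleles, labels, n_perm=999, seed=0):
    """Empirical p-value from shuffling population labels."""
    rng = random.Random(seed)
    observed = freq_diff(alleles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling
        if freq_diff(alleles, shuffled) >= observed:
            hits += 1
    # Add-one convention: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)


# Toy data: 20 chromosomes per population, 1 = alternate allele.
alleles = [1] * 18 + [0] * 2 + [1] * 4 + [0] * 16
labels = [0] * 20 + [1] * 20
p = permutation_pvalue(alleles, labels)
```

With this convention the smallest p-value you can report is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.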

Thanks for any feedback/suggestions you might have!

Chris

gpat fst pfst fdr population genomics • 2.5k views
4.7 years ago

Chris,

I'm happy to hear people are using GPAT code within VCFLIB. While unpublished, there is a working manuscript in the VCF github. I've tried to describe, in more detail, what pFst and other GPAT tools do. pFst is terribly named (I don't take responsibility), because it is a likelihood ratio test that measures the difference in allele frequency, weighting allele counts with the genotype likelihood. wcFST is Weir and Cockerham's (1984) FST estimator. I've provided two different types of permutations you can do with wcFST: permuting single data points and permuting smoothed windows. I describe one of these workflows in the working manuscript.
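
To illustrate the idea of a likelihood ratio test on allele frequency, here is a simplified sketch in Python. It assumes hard allele counts and a binomial model, and it does not weight counts by genotype likelihoods, so it is not pFst's actual implementation. The function names are hypothetical, and the statistic is compared to a chi-square distribution with one degree of freedom.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood, omitting the constant combinatorial term."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll


def allele_freq_lrt(alt1, n1, alt2, n2):
    """LRT of one shared allele frequency versus one frequency per group."""
    p_pooled = (alt1 + alt2) / (n1 + n2)
    p1, p2 = alt1 / n1, alt2 / n2
    null = binom_loglik(alt1, n1, p_pooled) + binom_loglik(alt2, n2, p_pooled)
    alt = binom_loglik(alt1, n1, p1) + binom_loglik(alt2, n2, p2)
    stat = max(0.0, 2 * (alt - null))
    # Chi-square survival function for 1 degree of freedom.
    pvalue = math.erfc(math.sqrt(stat / 2))
    return stat, pvalue


# Toy counts: 18/20 alternate alleles in one group, 4/20 in the other.
stat, pvalue = allele_freq_lrt(18, 20, 4, 20)
```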

pFst is a statistical test and provides a proper p-value. Here are the assumptions of the test if you're using it for trait mapping:

  1. There is no genotypic stratification between your genotypic groups.
  2. Neighboring variants are independent (almost all tests out there violate this).

Happy to clarify any points or answer other questions.

4.7 years ago
chrisbala10 ▴ 10

Thanks Zev,

That is helpful, and we'll play around with the different workflows.

Just to clarify further and get your opinion: what would you suggest for identifying "outliers"? It seems like in your Science paper (pigeons), you estimated Fst and just highlighted the top 1% (seems reasonable). But with pFst, how would you suggest dealing with multiple testing? For the VAAST analysis, you used some genome-wide significance threshold. (VAAST seems interesting... might need to try that too!) Anyway, just curious if you or anyone has thoughts on best practices.

(Added note: based on the VAAST docs, it seems like the permutation test might help here, because with many permutations one can get the p-value into the realm of Bonferroni-corrected thresholds? Or am I missing the point here?)

Chris
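
As a back-of-the-envelope check on the permutation count, here is a small sketch. It assumes the common add-one convention for empirical p-values, under which the smallest attainable value with N permutations is 1 / (N + 1); whether VAAST or GPAT uses this convention is not something I have confirmed. The function name is hypothetical.

```python
import math
from fractions import Fraction


def min_permutations(n_tests, alpha=0.05):
    """Smallest N such that 1 / (N + 1) <= alpha / n_tests."""
    # Fraction(str(alpha)) avoids floating-point rounding in the division.
    threshold = Fraction(str(alpha)) / n_tests
    return math.ceil(1 / threshold) - 1


# One million tested sites at a family-wise alpha of 0.05.
needed = min_permutations(1_000_000)
```

For a million sites this works out to roughly 20 million permutations per site, which is why permutation p-values are often only refined for the sites that already look promising.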


It really depends on the study. In the case of the Ephb2 (head crest), the allele was recessive, so we were able to reach significance. If I remember correctly, pFst withstood a multiple test correction.
