Question: GWAS - Permutation Testing?
9.5 years ago by
Ireland/United Kingdom
I have conducted a search for statistical epistasis using a simple dosage model:

Y ~ A + B + AB

where Y is the phenotype (in this case, gene expression values) and A and B are vectors of genotype information for ~500 samples. I wish to determine a significance threshold using permutation testing in order to correct for multiple testing.

So far, I have recalculated the p-values for the interaction term (AB) for 100 permutations (I permuted the phenotype values), and I am unsure how to proceed in order to derive a false discovery rate (FDR).
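One common way to turn phenotype permutations into an FDR estimate is the plug-in estimator familiar from SAM-style analyses. For a p-value cutoff t, divide the average number of permuted p-values at or below t (per permutation) by the number of observed p-values at or below t. The sketch below assumes each permutation re-tests exactly the same set of SNP pairs as the real data. The function names and toy numbers are hypothetical, and this is not necessarily what the original poster ended up doing.

```python
# Sketch of a permutation-based FDR estimate (plug-in estimator).
# Assumed inputs (hypothetical names):
#   observed_p : interaction p-values from the real, unpermuted phenotype
#   permuted_p : one list per permutation, holding the interaction p-values
#                for the same SNP pairs after shuffling the phenotype

def permutation_fdr(observed_p, permuted_p, threshold):
    """FDR(t) ~ mean #permuted p <= t per permutation / #observed p <= t."""
    n_called = sum(1 for p in observed_p if p <= threshold)
    if n_called == 0:
        return 0.0  # nothing called at this cutoff
    # number of "discoveries" in each permutation, i.e. under the null
    null_counts = [sum(1 for p in perm if p <= threshold) for perm in permuted_p]
    expected_false = sum(null_counts) / len(null_counts)
    return min(1.0, expected_false / n_called)


def threshold_for_fdr(observed_p, permuted_p, target_fdr):
    """Largest observed p-value whose estimated FDR is <= target_fdr (or None)."""
    best = None
    for t in sorted(set(observed_p)):
        if permutation_fdr(observed_p, permuted_p, t) <= target_fdr:
            best = t
    return best


# Toy inputs only (2 permutations of 6 tests), not real results
observed = [0.001, 0.002, 0.01, 0.2, 0.5, 0.8]
permuted = [
    [0.3, 0.6, 0.05, 0.9, 0.4, 0.7],
    [0.002, 0.5, 0.6, 0.8, 0.1, 0.95],
]
print(permutation_fdr(observed, permuted, 0.01))
print(threshold_for_fdr(observed, permuted, 0.2))
```

The estimated FDR is not guaranteed to be monotone in t (in the toy data it is 0.25 at t = 0.002 but about 0.17 at t = 0.01). For that reason the helper scans all observed cutoffs and keeps the largest one that passes.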

Any suggestions?

Thanks, D.

Couldn't you just use the FDR method of Benjamini & Hochberg (1995) in R, p.adjust(p, method="fdr")? I think that should also be valid for permutation p-values. Any concerns?
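To show what the Benjamini & Hochberg step-up adjustment actually computes, here is a small dependency-free Python sketch. It is intended to follow the same recipe as R's p.adjust(p, method="fdr"): sort the p-values, scale each by m / rank, then take a running minimum from the largest p-value down and cap at 1. The function name is hypothetical and the inputs are toy values.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original input order.

    adjusted p_(i) = min over j >= i of p_(j) * m / j, capped at 1,
    where p_(1) <= ... <= p_(m) are the sorted p-values.
    """
    m = len(pvals)
    # indices of the p-values from largest to smallest
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps every value at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values only, not real results
p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))
```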

@Michael: For an additive genetic model with genotypes AA, AB and BB, one assumes that each A (or B) allele has an incremental effect on the phenotype, such that AA > AB > BB. This is intuitively similar to treating the A or B allele as a drug given at increasing dosage. In this case the genotypes are ordinal, not categorical. For a test of epistasis, you are looking for deviation from an additive model by fitting an interaction term; I think it's standard to check only the interaction term.
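Here is a minimal sketch of the dosage coding described above. It assumes genotypes are stored as two-letter strings and that one "coded" (counted) allele is chosen per SNP. Each genotype becomes 0, 1 or 2 copies of that allele, and the interaction column of Y ~ A + B + AB is simply the product of the two dosages. The helper names and example genotypes are hypothetical.

```python
def dosage(genotype, coded_allele):
    """Number of copies (0, 1 or 2) of coded_allele in a two-letter genotype."""
    return sum(1 for allele in genotype if allele == coded_allele)


def design_row(geno_a, geno_b, coded_a, coded_b):
    """One row of the design matrix for Y ~ A + B + AB (intercept first)."""
    a = dosage(geno_a, coded_a)
    b = dosage(geno_b, coded_b)
    return [1, a, b, a * b]  # last column is the interaction (epistasis) term


# Example: SNP 1 counts allele "G", SNP 2 counts allele "T"
print(design_row("AG", "CC", "G", "T"))  # one G, no T
print(design_row("GG", "CT", "G", "T"))  # two G, one T
```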

Some things I don't quite understand:
1. How did you compute your p-values?
2. Genotypes are categorical data, so how does a dosage model apply?
3. Why did you compute p-values only for the interaction term?
4. Why so few permutations? Given 500! possible permutations of 500 samples, I would have expected more to get a reliable estimate.
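The concern about too few permutations can be made concrete. With B permutations, an empirical p-value that counts the observed data as one of the permutations, (1 + #null statistics at least as extreme) / (B + 1), can never fall below 1 / (B + 1). So 100 permutations cannot resolve anything smaller than about 0.0099. The sketch assumes a test statistic where larger means more extreme; the function name is hypothetical.

```python
def permutation_pvalue(observed_stat, null_stats):
    """Empirical p-value (1 + #null >= observed) / (B + 1)."""
    b = len(null_stats)
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (1 + exceed) / (b + 1)


# Smallest p-value reachable with B permutations
for b in (100, 1000, 10000):
    print(b, 1 / (b + 1))
```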

9.5 years ago by
David Quigley
San Francisco
A recent study in PLoS Genetics (Liu 2011) may provide some guidance, at least as a warning about how tricky this analysis really is. A large chunk of that paper describes the various nasty sources of false positives the authors discovered. They used a Bonferroni correction.
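For scale, a Bonferroni correction over an exhaustive pairwise scan divides alpha by the number of SNP pairs, n(n - 1) / 2, which grows quadratically. The sketch below uses a hypothetical 1,000 SNPs (not the numbers from Liu 2011), giving 499,500 tests and a per-test threshold of roughly 1e-7. If several expression traits are each scanned, the number of tests multiplies further.

```python
from math import comb


def bonferroni_pairwise_threshold(n_snps, alpha=0.05):
    """Per-test threshold when every unordered SNP pair is tested once."""
    n_tests = comb(n_snps, 2)  # n(n-1)/2 pairs
    return n_tests, alpha / n_tests


# Hypothetical panel size, for illustration only
n_tests, thr = bonferroni_pairwise_threshold(1000)
print(n_tests, thr)
```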

Your results are going to have a complicated covariance structure, since INTERACT(A,B) and INTERACT(A,C) share SNP A and will therefore be dependent. Have a look at a recent paper from Wing Wong's group (Ma et al., Genetic Epidemiology 2010), which addresses this issue and proposes an adaptive permutation method. I haven't tried to implement their method myself.
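The sketch below is not the adaptive method of Ma et al. Instead it shows the simpler, well-known single-step min-P (Westfall-Young style) permutation adjustment, which also preserves the dependence between tests. Every SNP pair is re-tested on the same shuffled phenotype, and only the smallest p-value from each permutation is kept. This controls the family-wise error rate rather than the FDR. The function name and toy numbers are hypothetical.

```python
def minp_adjusted(observed_p, permuted_p):
    """Single-step min-P adjusted p-values.

    permuted_p holds one list of p-values per permutation (all SNP pairs
    tested on the same shuffled phenotype). For each observed p-value, the
    adjusted value is the fraction of permutations whose smallest p-value
    is <= it, counting the observed data as one extra permutation.
    """
    min_ps = [min(perm) for perm in permuted_p]
    b = len(min_ps)
    return [(1 + sum(1 for m in min_ps if m <= p)) / (b + 1) for p in observed_p]


# Toy inputs only: 2 observed tests, 3 permutations
observed = [0.001, 0.3]
permuted = [[0.01, 0.5], [0.2, 0.0005], [0.4, 0.6]]
print(minp_adjusted(observed, permuted))
```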

3.6 years ago by
Hi Darren, this paper might help you: enter link description here. It explains how the FDR is calculated with permutations. :)

3.6 years ago by
Once you have a vector of p-values, you can compute the FDR using, for example, Python's statsmodels.

If that helps.
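A minimal example, assuming statsmodels is installed. It uses multipletests from statsmodels.stats.multitest with method="fdr_bh" (Benjamini & Hochberg). The function returns the reject flags, the adjusted p-values and two corrected alpha levels; the last two are ignored here. The p-values are toy values, not results from this thread.

```python
from statsmodels.stats.multitest import multipletests

pvals = [0.01, 0.04, 0.03, 0.005]  # toy p-values only
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject)      # which tests pass at FDR 0.05
print(p_adjusted)  # BH-adjusted p-values
```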

