Question

GWAS - Permutation Testing?

4

12.9 years ago

Darren J. Fitzpatrick ★ 1.1k

Hi,

I have conducted a search for statistical epistasis using a simple dosage model:

Y ~ A + B + AB

where Y is the phenotype (in this case, gene expression values) and A and B are vectors of genotype information for ~500 samples. I wish to determine a significance threshold using permutation testing in order to correct for multiple testing.

To date, I have recalculated the p-values for the interaction term (AB) for 100 permutations (I permuted the phenotype values) and am unsure how to proceed in order to derive a false discovery rate (FDR).

Any suggestions?

Thanks, D.

gwas snp multiple • 8.7k views

ADD COMMENT • link updated 7.0 years ago by Bioaln ▴ 360 • written 12.9 years ago by Darren J. Fitzpatrick ★ 1.1k

1

Couldn't you just use the FDR method of Benjamini & Hochberg (1995) in R: p.adjust(p, method="fdr")? I think that should also be valid for permutation p-values. Any concerns, anyone?

ADD REPLY • link 12.9 years ago by Michael 54k

1

@Michael: For an additive genetic model with genotypes AA, AB, BB, one assumes that each A or B allele has an incremental effect on the phenotype, so that the phenotype is ordered across the genotypes (e.g. AA > AB > BB). This is intuitively similar to treating each A or B allele as a drug with increasing dosage. In this case the genotypes are ordinal, not categorical. For a test of epistasis, you are looking for deviation from an additive model and trying to fit an interaction term; I think it's standard to check only the interaction term.
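As a rough illustration of the dosage idea, here is a minimal sketch that codes each genotype as 0, 1 or 2 copies of one allele and fits Y ~ A + B + AB by ordinary least squares, returning the t-statistic of the interaction term. The function name `interaction_t_stat` and the simulated data are assumptions made for this example, not the original poster's code.

```python
import numpy as np

def interaction_t_stat(y, a, b):
    """t-statistic of the a*b term in y ~ 1 + a + b + a*b (ordinary least squares)."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Design matrix: intercept, two additive dosage terms, and their product.
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3] / np.sqrt(cov[3, 3])

# Simulated (hypothetical) data: dosage = number of copies of one allele.
rng = np.random.default_rng(0)
n = 500
a = rng.integers(0, 3, size=n)
b = rng.integers(0, 3, size=n)
y_additive = 0.5 * a + 0.3 * b + rng.normal(size=n)
y_epistatic = y_additive + 0.8 * a * b   # adds a deviation from additivity

t_add = interaction_t_stat(y_additive, a, b)
t_epi = interaction_t_stat(y_epistatic, a, b)
print(f"purely additive: t = {t_add:.2f}; with interaction: t = {t_epi:.2f}")
```

Only the interaction coefficient is tested here, since the main effects are in the model just to absorb the additive signal.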

ADD REPLY • link 12.9 years ago by David Quigley 11k

0

Some things I don't quite understand: How did you compute your p-values? Genotypes are categorical data, so how does a dosage model apply? Why did you compute p-values only on the interaction term? Why so few permutations? Given 500! possible permutations of 500 samples, I would have expected more to get a reliable estimate.
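To make the point about the number of permutations concrete: with the usual "add one" empirical p-value, the smallest p-value you can ever report from N permutations is 1/(N+1), so 100 permutations cannot go below roughly 0.0099. A small sketch follows; `empirical_p` is a name made up for this example, and the two-sided comparison on absolute values is an assumption.

```python
import numpy as np

def empirical_p(observed, null_stats):
    """Two-sided empirical p-value with the +1 correction (never exactly 0)."""
    null_stats = np.asarray(null_stats, dtype=float)
    exceed = np.sum(np.abs(null_stats) >= abs(observed))
    return (exceed + 1) / (null_stats.size + 1)

# Smallest achievable p-value for a given number of permutations.
for n_perm in (100, 10_000, 1_000_000):
    print(n_perm, 1 / (n_perm + 1))

# Even an absurdly large observed statistic is capped at 1/101 with 100 permutations.
rng = np.random.default_rng(1)
null_stats = rng.normal(size=100)
p_extreme = empirical_p(50.0, null_stats)
print(p_extreme)
```

That floor is far above any genome-wide threshold for a single test, which is why per-test permutation p-values need many more permutations than 100.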

ADD REPLY • link 12.9 years ago by Michael 54k

score 1 · Answer 1 · 2011-05-22

A recent study in PLoS Genetics (Liu 2011) may provide some guidance, at least as a warning about how tricky this analysis really is. A large chunk of that paper describes the various nasty sources of false positives the authors discovered. They used a Bonferroni correction.

Your results are going to have complicated covariance, since INTERACT(A,B) and INTERACT(A,C) will be dependent. Have a look at a recent paper from Wing Wong's group (Ma et al. Genetic Epidemiology 2010) which addresses this issue and proposes an adaptive permutation method. I haven't tried to implement their method myself.
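One generic way to get a threshold that respects the dependence between tests is a max-statistic permutation: permute the phenotype, recompute every test, and keep only the largest statistic per permutation. To be clear, this is the plain max-statistic approach, not the adaptive method from Ma et al., and the statistic used below (absolute correlation with the product term) is only a stand-in for a proper interaction test. All names are assumptions for this sketch.

```python
import numpy as np

def max_stat_threshold(y, stat_fn, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the per-permutation maximum statistic."""
    rng = np.random.default_rng(seed)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Permuting y breaks the link to all genotypes at once, so the
        # correlation between tests is preserved within each permutation.
        max_null[i] = np.max(stat_fn(rng.permutation(y)))
    return np.quantile(max_null, 1 - alpha)

# Hypothetical data: 20 SNPs, all pairwise product terms (190 tests).
rng = np.random.default_rng(2)
n, n_snps = 500, 20
G = rng.integers(0, 3, size=(n, n_snps)).astype(float)
pairs = [(i, j) for i in range(n_snps) for j in range(i + 1, n_snps)]
Z = np.column_stack([G[:, i] * G[:, j] for i, j in pairs])
Zc = (Z - Z.mean(axis=0)) / Z.std(axis=0)

def abs_corr(y_vec):
    """Absolute Pearson correlation of y with every product term."""
    yc = (y_vec - y_vec.mean()) / y_vec.std()
    return np.abs(Zc.T @ yc) / len(y_vec)

y = rng.normal(size=n)                      # phenotype with no real signal
threshold = max_stat_threshold(y, abs_corr)
n_hits = int(np.sum(abs_corr(y) > threshold))
print(threshold, n_hits)
```

This controls the family-wise error rate rather than the FDR, so it is closer in spirit to the Bonferroni correction mentioned above, but without assuming independent tests.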

score 0 · Answer 2 · 2017-04-20

0

7.0 years ago

hellocita ▴ 40

Hi Darren, a paper might help you: enter link description here. It explains how the FDR is calculated from permutations :)
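For what it's worth, one common way to turn permutation results into an FDR estimate (not necessarily the one used in the linked paper) is: pick a p-value threshold, count how many tests pass it in the real data, and divide the average number passing per permutation by that count. A minimal sketch, with made-up names and simulated p-values:

```python
import numpy as np

def permutation_fdr(observed_p, permuted_p, threshold):
    """Estimated FDR at a p-value threshold.

    observed_p : p-values from the real data, shape (n_tests,)
    permuted_p : p-values from permuted data, shape (n_perm, n_tests)
    """
    observed_p = np.asarray(observed_p)
    permuted_p = np.asarray(permuted_p)
    n_called = np.sum(observed_p <= threshold)
    if n_called == 0:
        return 0.0   # convention chosen here: nothing called, nothing false
    # Average number of "discoveries" per permutation = expected false positives.
    expected_false = np.mean(np.sum(permuted_p <= threshold, axis=1))
    return min(1.0, expected_false / n_called)

# Hypothetical example: 9,900 null tests plus 100 strong signals, 100 permutations.
rng = np.random.default_rng(3)
observed = np.concatenate([rng.uniform(size=9_900),
                           rng.uniform(0, 1e-6, size=100)])
permuted = rng.uniform(size=(100, 10_000))

fdr = permutation_fdr(observed, permuted, threshold=1e-3)
print(fdr)
```

You can then scan a grid of thresholds and keep the largest one whose estimated FDR stays below your target, e.g. 5%.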

score 0 · Answer 3 · 2017-04-27

0

7.0 years ago

Bioaln ▴ 360

Once a p-value vector is obtained, you can compute the FDR using e.g. Python's statsmodels: http://www.statsmodels.org/devel/stats.html#multiple-tests-and-multiple-comparison-procedures

If that helps.
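A short sketch of what that looks like, assuming statsmodels is installed; the p-values are made up for illustration. `method="fdr_bh"` is the Benjamini-Hochberg procedure, i.e. the same adjustment as R's p.adjust(p, method="fdr").

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values, e.g. one per tested SNP pair.
pvals = np.array([0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
                  0.0278, 0.0298, 0.0344, 0.0459, 0.3240])

# reject: boolean mask of tests called significant at FDR 5%
# qvals:  Benjamini-Hochberg adjusted p-values
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

for p, q, r in zip(pvals, qvals, reject):
    print(f"p = {p:.4f}  adjusted = {q:.4f}  significant = {r}")
```

Note that this treats the p-values as given; it does not use the permutations themselves.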