Question

Deflated P-values (GWAS)

4.4 years ago by solanum

Hi

I performed genome-wide associations (linear mixed model) between 640,000 SNPs and transformed phenotypic traits of 160 individuals. After correcting for population structure and kinship, all resulting Q-Q plots looked more or less like this:

[Figure: Q-Q plot of the GWAS p-values]

What could possibly cause deflated p-value distributions in GWAS? Changing the number of PCs and adding different random effects did not substantially improve the model.

Many thanks in advance

Tags: SNP, gwas, p-value
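
One common way to put a number on the deflation visible in a Q-Q plot is the genomic inflation factor, lambda_GC: the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). Values well below 1 indicate deflation. The following is a minimal Python sketch, assuming two-sided p-values from 1-df tests; the function names are hypothetical and do not come from any GWAS package.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df: square of the 75th normal percentile (~0.4549).
CHI2_1_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value into the equivalent 1-df chi-square statistic."""
    p = min(max(p, 1e-300), 1.0)  # guard against p == 0, which has no finite quantile
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; squaring removes the sign
    return z * z


def genomic_inflation(pvalues):
    """lambda_GC: median observed chi-square divided by its null median.
    Around 1 means well calibrated, above 1 inflated, below 1 deflated."""
    return median([p_to_chi2(p) for p in pvalues]) / CHI2_1_MEDIAN


def qq_points(pvalues):
    """(expected, observed) pairs on the -log10 scale for a uniform Q-Q plot."""
    m = len(pvalues)
    return [(-math.log10((i + 0.5) / m), -math.log10(p))
            for i, p in enumerate(sorted(pvalues))]


# Usage with made-up p-values: evenly spread values give lambda near 1,
# while pushing every p-value towards 1 (here via a square root) gives lambda < 1.
uniform = [(i + 0.5) / 1000 for i in range(1000)]
deflated = [math.sqrt(p) for p in uniform]
print(round(genomic_inflation(uniform), 3), round(genomic_inflation(deflated), 3))
```

lambda_GC summarizes the bulk of the distribution (the median), so it complements rather than replaces looking at the tail of the Q-Q plot.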

Just a comment: what happens if you apply FDR correction to your p-values? Could you also plot your p-values not as a Q-Q plot but as a histogram from 0 to 1 (e.g., with 100 breaks)?

4.4 years ago by German.M.Demidov
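
Both checks suggested in this comment are easy to do directly. Below is a minimal Python sketch of the Benjamini-Hochberg FDR adjustment (the same step-up procedure as R's p.adjust(method = "BH")) and of binning p-values into 100 equal-width bins on [0, 1] for the histogram. The function names are hypothetical, and the sketch assumes p-values in [0, 1] with no missing values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def histogram_counts(pvalues, n_bins=100):
    """Number of p-values in each of n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        counts[min(int(p * n_bins), n_bins - 1)] += 1  # p == 1.0 goes in the last bin
    return counts


# Usage with made-up p-values. Under the null the histogram should be roughly flat;
# true signal shows up as an excess in the leftmost bin(s).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
print(histogram_counts([(i + 0.5) / 1000 for i in range(1000)])[:5])
```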

I updated the figure. The FDR-corrected p-values are far from significant, as there is no excess of p-values below 0.05.

4.4 years ago by solanum

Well, first: your histogram looks fairly uniform, so I wouldn't expect any significant number of "true positives"; otherwise you would see a small increase in the leftmost column. I do not like the "bump" in the second column from the left. It looks as if all the p-values from the left part that could have been significant "migrated" there due to some regularization. Another thing: if you have one or two outlier samples (with large variability), they will cause an over-estimation of the residual variance in your linear model, and then all the effects will be "hidden".

I am not a specialist in SNP-array analysis; these are just general considerations that may be totally useless.

4.4 years ago by German.M.Demidov
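
To illustrate the point about outliers: in an ordinary least-squares fit, a single outlying phenotype value inflates the residual variance estimate, which widens the standard error of the SNP effect and shrinks its test statistic. The Python sketch below uses simple linear regression without covariates or random effects (so it is not the linear mixed model from the question) on made-up data; the genotype coding, the effect size of 0.3, and the outlier shift of 40 are assumptions for illustration only.

```python
import math
import random


def ols_snp_test(g, y):
    """Fit y ~ a + b*g by least squares; return (b, residual variance, t, approx p).
    The p-value uses a normal approximation to the t distribution (fine for ~158 df)."""
    n = len(g)
    mg, my = sum(g) / n, sum(y) / n
    sgg = sum((gi - mg) ** 2 for gi in g)
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    b = sgy / sgg
    a = my - b * mg
    rss = sum((yi - a - b * gi) ** 2 for gi, yi in zip(g, y))
    sigma2 = rss / (n - 2)                  # residual variance estimate
    t = b / math.sqrt(sigma2 / sgg)         # effect divided by its standard error
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided, normal approximation
    return b, sigma2, t, p


rng = random.Random(1)
n = 160
genotype = [i % 3 for i in range(n)]                             # hypothetical 0/1/2 dosages
phenotype = [0.3 * gi + rng.gauss(0.0, 1.0) for gi in genotype]  # hypothetical true effect 0.3

clean = ols_snp_test(genotype, phenotype)

# Make one heterozygous individual (index 1, genotype 1) an extreme outlier.
with_outlier = list(phenotype)
with_outlier[1] += 40.0
noisy = ols_snp_test(genotype, with_outlier)

print("clean:   sigma2=%.2f t=%.2f p=%.2g" % clean[1:])
print("outlier: sigma2=%.2f t=%.2f p=%.2g" % noisy[1:])
```

The outlier is placed on a heterozygote, whose genotype is close to the mean, so the estimated effect barely moves; the drop in the test statistic then comes almost entirely from the inflated residual variance.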

Thank you for sharing your thoughts. In this example I wouldn't expect any significant SNPs either, but I would expect the p-values to follow the expected (uniform) distribution more closely. The "bump" you mentioned also occurs in associations with other traits. What do you mean by regularization?

4.4 years ago by solanum

The term I used is not actually correct (regularization means something else), but that does not matter. The point is that you are working with discrete data (presence or absence of a SNP) and, I guess, applying some sort of linear model to it. Such models are normally obtained via continuous approximations to the discrete data. For example, a proportion is modelled with the normal distribution, which is fine for a proportion of 0.5 among 10,000 samples, but when the proportion is 0.01 or the sample size is 160, you can expect trouble. The bump may be a consequence of this. It may also appear if you have outliers, or if you use a regularized model (a process I cannot describe in a comment). Hidden dependencies may also affect the shape of your histogram (and there are dependencies).

4.4 years ago by German.M.Demidov
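
The arithmetic behind the point about small proportions: with 160 diploid individuals there are 320 allele copies, so a SNP with minor allele frequency 0.01 is expected to carry only about 3.2 copies of the minor allele, which leaves normal approximations very little to work with. Below is a minimal Python sketch for computing the minor allele frequency and minor allele count per SNP, assuming genotypes coded as 0/1/2 copies of one allele with no missing calls. The function names and the cut-off of 10 copies are hypothetical illustration values, not recommendations from this thread.

```python
def allele_summary(dosages):
    """Return (minor allele frequency, minor allele count) for one SNP coded 0/1/2."""
    total = 2 * len(dosages)       # two allele copies per diploid individual
    alt = sum(dosages)             # copies of the coded allele
    minor = min(alt, total - alt)  # whichever allele is rarer
    return minor / total, minor


def low_count_snps(snps, min_mac=10):
    """Names of SNPs whose minor allele count is below min_mac (an arbitrary cut-off)."""
    return [name for name, dosages in snps.items()
            if allele_summary(dosages)[1] < min_mac]


# Usage with two made-up SNPs in 160 individuals.
snps = {
    "rare": [1, 1, 1] + [0] * 157,          # 3 minor-allele copies, MAF ~0.009
    "common": [i % 3 for i in range(160)],  # MAF close to 0.5
}
print(2 * 160 * 0.01)  # expected minor-allele copies at MAF 0.01 with 160 individuals
print(allele_summary(snps["rare"]), low_count_snps(snps))
```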

score 2 · Answer 1 · 2019-12-04

4.4 years ago by chrchang523

If you computed PCs from the same set of variants that you included in the main GWAS, that results in p-value deflation. It's better to compute PCs from an LD-pruned subset of them.

Also, 160 samples is small enough that an asymptotically valid p-value formula used by the software package may be noticeably off.
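
A rough idea of what LD pruning before PCA does: walk along the SNPs in genomic order and drop any SNP that is too strongly correlated (r^2 above a threshold) with a SNP already kept nearby. The Python sketch below is a simplified greedy variant with a window counted in SNPs; it is not identical to tools such as PLINK's --indep-pairwise, and the window of 50 SNPs and r^2 threshold of 0.2 are illustrative assumptions.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # monomorphic SNP: treated as uncorrelated (filter these out beforehand)
    return cov * cov / (va * vb)


def ld_prune(snps, window=50, r2_max=0.2):
    """Greedy pruning over SNPs in genomic order: keep SNP i only if its r^2 with every
    already-kept SNP among the previous `window` positions is at most r2_max.
    Returns the indices of the kept SNPs."""
    kept = []
    for i, g in enumerate(snps):
        recent = [j for j in kept if i - j < window]
        if all(r_squared(g, snps[j]) <= r2_max for j in recent):
            kept.append(i)
    return kept


# Usage with made-up genotypes: SNP 1 duplicates SNP 0 (r^2 = 1) and is pruned,
# while SNP 2 is only weakly correlated with SNP 0 and is kept.
a = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
b = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
print(ld_prune([a, list(a), b]))
```

The kept indices would then define the SNP subset used for the PCA, while the association scan itself still uses all SNPs.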

Thank you very much for your answer. I used an LD-pruned SNP set, not the whole set of SNPs, to compute the PCs, so this should not be an issue.

I use the score test to calculate the p-values, which (as far as I know) relies on an asymptotically valid p-value formula. So this could indeed pose a problem.

4.4 years ago by solanum
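
One way to check whether asymptotic score-test p-values are off at this sample size is to compare them with a permutation p-value for the same statistic. The Python sketch below uses the score test for a simple linear model without covariates or random effects, where the statistic reduces to n times the squared genotype-phenotype correlation and is referred to a chi-square distribution with 1 df. This is an assumption for illustration: a mixed-model score test such as the one in the question also accounts for kinship and PCs, and shuffling phenotypes ignores relatedness, so this is only a rough calibration check on made-up data.

```python
import math
import random


def score_statistic(g, y):
    """Score statistic for b = 0 in y ~ a + b*g with Gaussian errors and no covariates.
    Using the maximum-likelihood null variance, this equals n * r^2 with r = corr(g, y)."""
    n = len(g)
    mg, my = sum(g) / n, sum(y) / n
    sgg = sum((gi - mg) ** 2 for gi in g)
    syy = sum((yi - my) ** 2 for yi in y)
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    return n * sgy * sgy / (sgg * syy)


def chi2_1_sf(t):
    """Asymptotic p-value: P(X > t) for X ~ chi-square(1), i.e. erfc(sqrt(t / 2))."""
    return math.erfc(math.sqrt(t / 2.0))


def permutation_pvalue(g, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for the same statistic (phenotypes shuffled)."""
    rng = random.Random(seed)
    observed = score_statistic(g, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_statistic(g, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps the p-value above 0


# Usage on made-up data: a rare SNP (two heterozygous carriers) and a null phenotype.
rng = random.Random(42)
g = [1, 1] + [0] * 158
y = [rng.gauss(0.0, 1.0) for _ in range(160)]
stat = score_statistic(g, y)
print("asymptotic p =", chi2_1_sf(stat))
print("permutation p =", permutation_pvalue(g, y))
```

Repeating this comparison over many SNPs (for example, binned by minor allele frequency) would show whether the asymptotic p-values are systematically larger than the permutation ones, which would be consistent with deflation coming from the p-value formula.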