Question

Phenotype data normalization prior to GWAS

17 months ago

antmantras ▴ 80

Hi all.

Is it necessary, before conducting a GWAS (MLMM approach, multi-locus mixed model), to transform quantitative phenotypes so that they follow a normal distribution?

After applying the Shapiro-Wilk test to each phenotype, I have observed that none of the four traits studied follows a normal distribution. See, for example, the histogram of the quantity of compound "A".

Histogram of a quantitative trait "A"

I have performed GWAS with the phenotypes un-normalized and the QQ-plots obtained for each one of them are:

qqplots

Except for compound B (if I have to choose one), the plots seem OK to me. If prior normalization is required, which one should I use? I have read about quantile normalization and the rank-based inverse normal transformation, which seems to be more popular. Thanks in advance.

normalization gwas phenotype • 1.4k views

score 2 · Answer 1 · 2022-11-23

The core assumption of linear model statistics is normality of the sampling distribution of the parameter estimates. This is guaranteed when the residuals are normal; but it also holds asymptotically (as N -> infinity) when the observations are independent, by the Central Limit Theorem. As GWAS typically have very large values of N, it should not generally matter whether the residuals are normal or not.
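The Central Limit Theorem argument above can be checked with a small simulation. The following is a minimal sketch, not part of any GWAS package: it assumes a single variant with no true effect, strongly skewed (exponential) residuals, and a plain least-squares fit without the kinship random effect an MLMM would include. The function names, sample size, allele frequency and seed are all illustrative assumptions. It only checks the nominal 5% level; it says nothing about genome-wide thresholds or rare variants, where the approximation is weaker.

```python
import math
import random


def slope_z(x, y):
    """Return the Wald statistic (slope / standard error) of a simple linear fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


def false_positive_rate(n_samples=1000, n_reps=1000, maf=0.3, seed=1):
    """Fraction of null simulations with |z| > 1.96 when residuals are exponential."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Dosage 0/1/2: two independent allele draws (Hardy-Weinberg assumption).
        x = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_samples)]
        # Phenotype is independent of genotype and heavily right-skewed.
        y = [rng.expovariate(1.0) for _ in range(n_samples)]
        if abs(slope_z(x, y)) > 1.96:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    print(false_positive_rate())
```

If the argument holds, the printed rate should sit close to 0.05 even though the residuals are far from normal.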

For other instances of regression, a larger issue is non-linearity between response and predictors; but since a GWAS genotype has only three states (AA=0, AB=1, BB=2), it's rare to observe a deviation from linearity (there are not that many dominance effects).
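With only three genotype states, a departure from linearity amounts to the heterozygote mean not lying midway between the two homozygote means. A minimal sketch of that descriptive check follows; the function name is hypothetical, and this is a raw difference of group means, not a formal test and not adjusted for covariates or relatedness.

```python
def dominance_deviation(dosage, phenotype):
    """Heterozygote mean minus the midpoint of the two homozygote means.

    A value near zero is consistent with an additive (linear) effect.
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(dosage, phenotype):
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")
    means = {g: sum(values) / len(values) for g, values in groups.items()}
    return means[1] - (means[0] + means[2]) / 2


if __name__ == "__main__":
    # Purely additive toy data: each extra B allele adds 1 to the phenotype.
    print(dominance_deviation([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 3, 3]))
    # Fully dominant toy data: AB looks like BB.
    print(dominance_deviation([0, 0, 1, 1, 2, 2], [1, 1, 3, 3, 3, 3]))
```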

Finally, the distribution you are showing is not merely non-normal; it appears to be zero-inflated. There is no monotonic transformation that can convert a zero-inflated distribution into a normal distribution, because all the zeros are mapped to the same value and the spike remains. The approach here would therefore have to be to use a GLMM in place of an LMM, and explicitly model the relationship between variant dosage and (a) the probability that y = 0, and (b) the conditional distribution of y given y != 0.
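The two-part idea in (a) and (b) can be sketched on simulated data. This is only an illustration under loud assumptions: it uses fixed effects only (no kinship or population-structure random effect, so it is not a GLMM), it assumes the non-zero values are log-normal, and every function name, effect size and seed is made up for the example. Part (a) is fitted as the probability of a non-zero value, which is the complement of the probability of zero, so the sign of the dosage coefficient is flipped relative to a model of P(y = 0).

```python
import math
import random


def fit_logistic(x, z, n_iter=25):
    """Newton-Raphson fit of logit P(z = 1) = a + b * x; returns (a, b)."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += zi - p              # score for the intercept
            g1 += (zi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_linear(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def simulate(n=5000, maf=0.3, seed=7):
    """Zero-inflated toy phenotype: dosage shifts both P(non-zero) and log(y)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        g = (rng.random() < maf) + (rng.random() < maf)
        p_nonzero = 1.0 / (1.0 + math.exp(-(-0.5 + 0.8 * g)))
        if rng.random() < p_nonzero:
            value = math.exp(1.0 + 0.3 * g + rng.gauss(0.0, 0.5))
        else:
            value = 0.0
        x.append(g)
        y.append(value)
    return x, y


def fit_two_part(x, y):
    """Fit (a) logistic model for y > 0 and (b) linear model for log(y) given y > 0."""
    nonzero = [1 if yi > 0 else 0 for yi in y]
    part_a = fit_logistic(x, nonzero)
    x_pos = [xi for xi, yi in zip(x, y) if yi > 0]
    log_y = [math.log(yi) for yi in y if yi > 0]
    part_b = fit_linear(x_pos, log_y)
    return part_a, part_b


if __name__ == "__main__":
    dosage, phenotype = simulate()
    print(fit_two_part(dosage, phenotype))
```

In a real analysis each part would also need the relatedness correction that the mixed model provides, and the two dosage effects would be tested separately or jointly.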