Phenotype data normalization prior to GWAS
17 months ago
antmantras

Hi all.

Before conducting a GWAS with MLMM (multi-locus mixed model), is it necessary to normalize quantitative phenotypes so that they follow a normal distribution?

After applying the Shapiro-Wilk test to each phenotype, I observed that none of the four traits studied follows a normal distribution. See, for example, the histogram of the quantity of compound "A".
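A per-trait Shapiro-Wilk check might look like the minimal sketch below. It uses `scipy.stats.shapiro`; the function name `shapiro_per_trait` and the simulated traits are hypothetical stand-ins, not the poster's data or pipeline. Note that the test's power grows with sample size, so at GWAS-scale N it rejects almost any real trait, and a rejection alone does not say whether the association tests are affected.

```python
import numpy as np
from scipy import stats


def shapiro_per_trait(phenotypes, alpha=0.05):
    """Run Shapiro-Wilk on each trait.

    Returns {trait_name: (W, p_value, looks_normal)}, where looks_normal
    means p_value >= alpha (normality not rejected).
    """
    results = {}
    for name, values in phenotypes.items():
        x = np.asarray(values, dtype=float)
        x = x[~np.isnan(x)]  # ignore individuals with a missing phenotype
        w, p = stats.shapiro(x)
        results[name] = (float(w), float(p), bool(p >= alpha))
    return results


# Hypothetical simulated traits (not real data): one Gaussian,
# one right-skewed like the histogram of compound "A".
rng = np.random.default_rng(0)
traits = {
    "gaussian_trait": rng.normal(loc=10.0, scale=2.0, size=300),
    "skewed_trait": rng.gamma(shape=1.2, scale=3.0, size=300),
}
for name, (w, p, ok) in shapiro_per_trait(traits).items():
    print(f"{name}: W={w:.3f}, p={p:.2e}, normality not rejected: {ok}")
```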

[Figure: histogram of quantitative trait "A"]

I performed the GWAS on the un-normalized phenotypes, and the QQ-plots obtained for each of them are:

[Figure: GWAS QQ-plots for each of the four traits]
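For reference, the points of a GWAS QQ-plot and the usual genomic inflation factor (lambda_GC) can be computed roughly as in this sketch. The helper names are hypothetical, and the null p-values are simulated, not taken from the poster's results; plotting `exp_q` against `obs_q` (for example with matplotlib) gives the familiar QQ-plot, where points on the diagonal indicate calibrated tests.

```python
import numpy as np
from scipy import stats


def qq_points(pvalues):
    """Expected vs observed -log10(p), ordered from most to least significant."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)  # uniform quantiles under the null
    observed = -np.log10(p)
    return expected, observed


def genomic_inflation(pvalues):
    """lambda_GC: median 1-df chi-square statistic over its null median (about 0.455)."""
    chi2_stats = stats.chi2.isf(np.asarray(pvalues, dtype=float), df=1)
    return float(np.median(chi2_stats) / stats.chi2.ppf(0.5, df=1))


# Hypothetical null p-values, uniform on (0, 1): the QQ points should follow
# the diagonal and lambda_GC should be close to 1.
rng = np.random.default_rng(42)
null_p = rng.uniform(size=100_000)
exp_q, obs_q = qq_points(null_p)
print(f"lambda_GC = {genomic_inflation(null_p):.3f}")
```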

Except for compound B (if I had to pick one), the plots look OK to me. If normalization is required beforehand, which method should I use? I have read about quantile normalization and the rank-based inverse normal transformation, which seems to be more popular. Thanks in advance.
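The rank-based inverse normal transformation (RINT) mentioned above replaces each value by a standard-normal quantile of its rank. Below is a minimal sketch assuming Blom's offset c = 3/8 (c = 1/2 is another common choice); the function name is hypothetical. Because ranks cannot break ties, identical values (for example a pile of exact zeros) stay a single tied value after the transform.

```python
import numpy as np
from scipy import stats


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation (RINT).

    Each non-missing value is replaced by Phi^-1((rank - c) / (n - 2c + 1)).
    Tied values receive their average rank, so they stay tied after the transform.
    Missing values (NaN) are kept as NaN.
    """
    x = np.asarray(values, dtype=float)
    out = np.full(x.shape, np.nan)
    observed = ~np.isnan(x)
    n = int(observed.sum())
    ranks = stats.rankdata(x[observed])  # default 'average' method handles ties
    out[observed] = stats.norm.ppf((ranks - c) / (n - 2 * c + 1))
    return out


# Hypothetical right-skewed trait, transformed and re-tested for normality.
rng = np.random.default_rng(1)
skewed = rng.gamma(shape=1.2, scale=3.0, size=500)
transformed = rank_inverse_normal(skewed)
print(stats.shapiro(skewed)[1], stats.shapiro(transformed)[1])
```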

17 months ago
LChart

The core assumption behind linear-model test statistics is that the parameter estimates are normally distributed, with the estimated standard error describing their spread. This is guaranteed when the residuals are normal, but it also holds asymptotically (as N -> infinity) for independent observations, by the Central Limit Theorem. Since GWAS sample sizes are very large, it generally should not matter whether the residuals are normal.
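The Central Limit Theorem argument can be checked by simulation: generate a strongly skewed trait independently of genotype and see how often a nominal 5% test rejects. This is a simplified sketch, assuming a plain OLS Wald test with no kinship random effects and a common variant (MAF 0.3, a hypothetical choice); it does not cover rare variants, where convergence to normality is slower.

```python
import numpy as np


def null_rejection_rate(n_samples, n_reps=2000, maf=0.3, seed=1):
    """Fraction of null simulations in which a 5% two-sided Wald test rejects.

    Genotypes and a skewed trait are simulated independently, so every
    rejection is a false positive; a rate near 0.05 means the test is calibrated.
    """
    rng = np.random.default_rng(seed)
    z_crit = 1.959963984540054  # two-sided 5% standard-normal critical value
    rejections = 0
    for _ in range(n_reps):
        g = rng.binomial(2, maf, size=n_samples).astype(float)  # additive dosage 0/1/2
        y = rng.exponential(scale=1.0, size=n_samples)  # skewed trait, no SNP effect
        gc = g - g.mean()
        yc = y - y.mean()
        sxx = gc @ gc
        beta = (gc @ yc) / sxx  # OLS slope
        resid = yc - beta * gc
        se = np.sqrt((resid @ resid) / (n_samples - 2) / sxx)
        rejections += int(abs(beta / se) > z_crit)
    return rejections / n_reps


for n in (50, 5000):
    print(n, null_rejection_rate(n))
```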

In other regression settings, a bigger concern is non-linearity between the response and the predictors; but since a SNP in a GWAS has only three states (AA=0, AB=1, BB=2), departures from linearity are rare (there are not that many dominance effects).
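Whether the additive 0/1/2 coding is adequate for a given SNP can be checked by comparing it with a genotypic model that gives each genotype its own mean; the extra degree of freedom is the dominance deviation. A minimal OLS sketch (no kinship term, hypothetical helper names and simulated data):

```python
import numpy as np
from scipy import stats


def residual_ss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)


def dominance_f_test(g, y):
    """1-df F-test of the genotypic model against the additive (0/1/2) model.

    Additive:  y ~ 1 + dosage                (linear in dosage)
    Genotypic: y ~ 1 + [g == 1] + [g == 2]  (free mean per genotype)
    """
    g = np.asarray(g)
    y = np.asarray(y, dtype=float)
    n = y.size
    ones = np.ones(n)
    x_add = np.column_stack([ones, g.astype(float)])
    x_gen = np.column_stack([ones, (g == 1).astype(float), (g == 2).astype(float)])
    rss_add, rss_gen = residual_ss(x_add, y), residual_ss(x_gen, y)
    df_resid = n - 3
    f_stat = (rss_add - rss_gen) / (rss_gen / df_resid)
    return f_stat, float(stats.f.sf(f_stat, 1, df_resid))


# Hypothetical data: a purely additive SNP and a fully dominant one.
rng = np.random.default_rng(7)
g = rng.binomial(2, 0.3, size=3000)
y_additive = 0.5 * g + rng.normal(size=g.size)
y_dominant = 1.0 * (g >= 1) + rng.normal(size=g.size)
print(dominance_f_test(g, y_additive))
print(dominance_f_test(g, y_dominant))
```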

Finally, the distribution you are showing is not merely non-normal; it appears to be zero-inflated. No monotonic transformation can convert a zero-inflated distribution into a normal one, so the approach here would have to be a GLMM in place of an LMM, explicitly modelling the relationship between variant dosage and (a) the probability of 0, and (b) the conditional distribution of y given y != 0.
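The two-part structure described above, (a) dosage vs the probability of a zero and (b) dosage vs the non-zero values, can be sketched with fixed effects only. This is a simplification, not the GLMM the answer recommends: it omits the kinship random effect, assumes log-normal non-zero values, and uses hypothetical helper names and simulated parameters.

```python
import numpy as np


def logistic_fit(X, z, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (z - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_part_dosage_effects(g, y):
    """Fixed-effect two-part (hurdle) sketch; no kinship/random effects.

    Part (a): logistic model for P(y == 0) on dosage.
    Part (b): linear model for log(y) on dosage, among individuals with y > 0.
    Returns the two dosage slopes.
    """
    g = np.asarray(g, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(g), g])
    beta_zero = logistic_fit(X, (y == 0).astype(float))
    positive = y > 0
    beta_pos, *_ = np.linalg.lstsq(X[positive], np.log(y[positive]), rcond=None)
    return beta_zero[1], beta_pos[1]


# Hypothetical zero-inflated trait: each allele raises the log-odds of an exact
# zero by 0.8 and, when non-zero, raises log(y) by 0.3.
rng = np.random.default_rng(3)
n = 5000
g = rng.binomial(2, 0.3, size=n)
p_zero = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * g)))
is_zero = rng.uniform(size=n) < p_zero
y = np.where(is_zero, 0.0, np.exp(0.5 + 0.3 * g + rng.normal(scale=0.5, size=n)))
print(two_part_dosage_effects(g, y))
```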


Thanks for your help!

Although it looks like there are many zero values, those are actually values just above zero (0.2, 0.5, 0.3, etc.). There are only 3 zero values in the whole set of phenotypes. Would your approach still be necessary? If so, could you recommend a tool to perform this analysis?
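Whether the spike in a histogram is a true point mass or a cluster of small positive values can be checked directly. Exact zeros are tied, so no monotonic transformation can spread them out, whereas distinct small positive values keep distinct ranks. A small sketch; the function name and the `small_threshold` cut-off are hypothetical and should be chosen for the trait's units.

```python
import numpy as np


def zero_summary(values, small_threshold=1.0):
    """Count exact zeros versus small positive values in one phenotype."""
    y = np.asarray(values, dtype=float)
    y = y[~np.isnan(y)]  # ignore missing phenotypes
    n_zero = int(np.sum(y == 0.0))
    n_small = int(np.sum((y > 0.0) & (y < small_threshold)))
    return {
        "n": int(y.size),
        "n_exact_zero": n_zero,
        "frac_exact_zero": n_zero / y.size,
        "n_small_positive": n_small,
    }


print(zero_summary([0.2, 0.5, 0.3, 0.0, 1.7, 4.2, np.nan]))
```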
