9 weeks ago
curious ▴ 720
I understand that a rank-based inverse normal transformation of a non-normal quantitative phenotype makes the trait closer to normal for linear regression, and that this is common practice.
But sometimes I read about people first regressing the quantitative phenotype on the covariates, taking the residuals, applying a rank-based inverse normal transformation to those residuals, and then running the GWAS on the result.
Why is this done?
IMO this is poor practice. The one case where it is plausibly justifiable is if the covariates affect the phenotype on the observed (i.e., non-normal) scale but not so much on the transformed (i.e., normal) scale, so the transformation has to occur after the regression. However, the r^2 of the covariates is typically fairly low in the first place, so it's really hard to justify pre-transformation covariate regression.
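For concreteness, here is a minimal sketch of the two pipelines being compared: transforming the phenotype directly versus residualizing on covariates first and then transforming. Everything in it is an assumption for illustration only: the helper names `rank_int` and `residualize` are hypothetical, the Blom offset of 3/8 is just one common choice, and the phenotype and covariates are simulated.

```python
import numpy as np
from scipy.stats import norm, rankdata


def rank_int(x, c=3.0 / 8.0):
    """Rank-based inverse normal transform.

    c = 3/8 is the Blom offset (an assumed default; other offsets are in use).
    Ties receive their average rank.
    """
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x, method="average")
    # Map ranks into (0, 1), then through the standard normal quantile function.
    return norm.ppf((ranks - c) / (len(x) - 2.0 * c + 1.0))


def residualize(y, covariates):
    """Residuals from an ordinary least squares fit of y on covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Simulated, skewed phenotype with three covariates (purely illustrative).
rng = np.random.default_rng(0)
n = 1000
covariates = rng.normal(size=(n, 3))
y = np.exp(0.5 * covariates[:, 0] + rng.normal(size=n))

# Pipeline A: transform the phenotype; covariates then go into the GWAS model.
direct = rank_int(y)

# Pipeline B: regress out covariates on the observed scale, then transform the residuals.
residuals = residualize(y, covariates)
two_stage = rank_int(residuals)
```

Either vector would then be used as the response in the per-variant regression. Note that in pipeline B the transformed residuals are no longer guaranteed to be uncorrelated with the covariates, because the transform is nonlinear.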
At the same time, this approach is conservative in the sense that, if there are correlations between covariates and variants, the maximum proportion of the variance will be apportioned to the covariate as opposed to the variant. In GWAS the degrees of freedom are typically enormous, so the fact that 15 or 25 or 50 of them have been used (out of 750,000) won't matter much for p-values (but it definitely will in differential expression, where DoF are much lower).
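To put rough numbers on the degrees-of-freedom point, here is a small sketch that holds a test statistic fixed and changes only the residual degrees of freedom. The t-statistic of 5, the sample sizes, and the covariate count of 25 are all hypothetical, and `two_sided_p` is a helper defined here. In a real analysis the statistic itself would also change when covariates are added, so this isolates just the effect of the reference distribution.

```python
from scipy.stats import t


def two_sided_p(t_stat, df):
    """Two-sided p-value for a t-statistic with df residual degrees of freedom."""
    return 2.0 * t.sf(abs(t_stat), df)


t_stat = 5.0  # hypothetical test statistic, held fixed across scenarios

results = {}
for n in (750_000, 100):       # GWAS-sized sample vs. small expression-study-sized sample
    for k in (0, 25):          # number of covariates in the model
        df = n - k - 2         # minus the intercept and the tested predictor
        results[(n, k)] = two_sided_p(t_stat, df)
        print(f"n={n:>7}  covariates={k:>2}  df={df:>7}  p={results[(n, k)]:.3e}")
```

With a GWAS-sized sample the two p-values are essentially indistinguishable, while with the small sample the heavier tails at lower DoF make the p-value noticeably larger.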