Question: Inverse normal transformation
rednalf70 wrote:

My phenotype is not normally distributed. I tried several transformations but none of them seems to improve the normality. I am now interested in an inverse normal transformation using R.

Is something like the following correct, x being my phenotype?

```
qx <- qnorm((rank(x)-0.5)/sum(x))
```

It is based on this paper:

Yang, Jian, et al. "FTO genotype is associated with phenotypic variability of body mass index." Nature 490.7419 (2012): 267.

Histogram before transformation available here: https://imgur.com/a/BwednJB

• min: 10
• max: 750
• median: 54.75
• mean: 86.18217
• variance: 8428.881
• standard deviation: 91.80894
modified 2.4 years ago • written 2.4 years ago by rednalf70

Please show a histogram of your phenotype before any transformation. Also provide min, max, median, mean, variance, and standard deviation.

Thank you for your comment - the information has been added to the initial post.


I see - thanks so much. So it's currently a negative binomial or Poisson-like distribution, akin to RNA-seq count data. Have you considered a variance stabilising transformation, or a regularised log (as in DESeq2)? You could also just fit the model as a negative binomial using `glm.nb` (from the MASS package).

Thank you. I have considered log, square root, cube root and Johnson transformation for now. I will have a look at DESeq2. What do you think about the inverse normal transformation and the script based on Yang et al.'s paper?

Where did you find that formula? I looked in the paper and they mentioned that they tried the un-transformed, the logged, and then the inverse normal transformed phenotype. I could not see a formula, though.

I found it in the Supplementary Information, page 18 (https://media.nature.com/original/nature-assets/nature/journal/v490/n7419/extref/nature11401-s1.pdf)

Kevin Blighe65k wrote:

Okay, yes, here is the page: Note that they are not transforming the original variable. What they do is the following (for height and weight):

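1. build a linear regression model `lm(height ~ age + I(age^2))` (in an R formula, `age^2` has to be wrapped in `I()` to be treated as a squared term)
2. extract residuals from model with `residuals()`
3. transform residuals by inverse norm function `y <- qnorm((rank(x, na.last="keep") - 0.5) / sum(!is.na(x)))`, where `x` is the vector of residuals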
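As a rough sketch of steps 1 to 3, something like the following should work. The data are simulated purely for illustration (the variable values and the helper name `inverse_normal` are my own assumptions, not taken from the paper), and `na.action = na.exclude` is used so that the residual vector keeps the same length as the input when phenotypes are missing.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not from the paper)
n <- 200
age <- runif(n, 20, 70)
height <- 150 + 0.8 * age - 0.008 * age^2 + rexp(n, rate = 0.2)  # skewed noise
height[c(5, 17)] <- NA  # a couple of missing phenotypes

# Step 1: regress the phenotype on age and age squared.
# na.exclude pads the residuals with NA for rows dropped from the fit.
fit <- lm(height ~ age + I(age^2), na.action = na.exclude)

# Step 2: extract residuals (length n, NA where height was missing)
res <- residuals(fit)

# Step 3: rank-based inverse normal transformation.
# The denominator is the number of non-missing values, not sum(x).
inverse_normal <- function(x) {
  qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
}
y <- inverse_normal(res)

summary(y)
```

Note that the denominator is the count of non-missing observations, so the argument to `qnorm()` always lies strictly between 0 and 1. Dividing by `sum(x)` would not guarantee that.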

The transformed residuals (squared) are then used in your association test, as follows:

```
glm(y^2 ~ SNP)
```
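To show what that call returns, here is a minimal sketch on simulated data. Both `SNP` (an additive 0/1/2 genotype coding) and `y` (standing in for the transformed residuals) are made up for illustration, so the numbers themselves mean nothing.

```r
set.seed(2)

# Hypothetical stand-ins: y plays the role of the transformed residuals,
# SNP is an additive genotype coded 0/1/2
n <- 500
SNP <- rbinom(n, size = 2, prob = 0.3)
y <- rnorm(n)

# The default family is gaussian, so this is an ordinary linear
# regression of the squared values on genotype
fit <- glm(y^2 ~ SNP)

# Estimate, standard error, t value and p-value for the SNP effect
snp_row <- coef(summary(fit))["SNP", ]
print(snp_row)
```

The left-hand side of an R formula is evaluated arithmetically, so `y^2` here really is the square of `y`; `I()` is only needed for powers on the right-hand side.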

Does that help?

Kevin

Yes a lot, thank you very much for your (fast) help!