Question

Inverse normal transformation

6.0 years ago

rednalf ▴ 90

My phenotype is not normally distributed. I tried several transformations but none of them seems to improve the normality. I am now interested in an inverse normal transformation using R.

Is something like the following correct, x being my phenotype?

qx <- qnorm((rank(x)-0.5)/sum(x))

It is based on this paper:

Yang, Jian, et al. "FTO genotype is associated with phenotypic variability of body mass index." Nature 490.7419 (2012): 267.

Histogram before transformation available here: https://imgur.com/a/BwednJB

More information about the raw data:

min: 10
max: 750
median: 54.75
mean: 86.18217
variance: 8428.881
standard deviation: 91.80894

R transformation inverse statistics • 8.1k views

ADD COMMENT • link 6.0 years ago by rednalf ▴ 90

Please show a histogram of your phenotype before any transformation. Also provide min, max, median, mean, variance, and standard deviation.

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

Thank you for your comment - the information has been added to the initial post.

ADD REPLY • link 6.0 years ago by rednalf ▴ 90

I see - thanks so much. So it currently looks like a negative binomial or Poisson-like distribution, akin to RNA-seq count data. Have you considered a variance stabilising transformation, or a regularised log (as in DESeq2)? You could also just fit the model as a negative binomial using glm.nb (from the MASS package).
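For reference, a minimal sketch of the glm.nb route on simulated data. Everything here (the variable names counts and SNP, the effect size, the dispersion) is invented for illustration, and it assumes the phenotype is an integer count, which the posted summary (median 54.75) suggests may not hold for the real data.

```r
library(MASS)  # provides glm.nb(); a recommended package normally installed with R

set.seed(3)
n <- 300
SNP <- rbinom(n, 2, 0.3)                 # hypothetical genotype coded 0/1/2
# Simulated overdispersed counts; mean and dispersion are made-up values
counts <- rnbinom(n, mu = exp(4 + 0.2 * SNP), size = 1.5)

# Negative binomial regression of the count phenotype on genotype
fit_nb <- glm.nb(counts ~ SNP)

print(coef(summary(fit_nb)))             # coefficients are on the log scale
print(fit_nb$theta)                      # estimated dispersion parameter
```

If the phenotype is continuous rather than a count, this model is not appropriate as written and a transformation-based approach may be the better fit.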

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

Thank you. I have considered log, square root, cube root and Johnson transformation for now. I will have a look at DESeq2. What do you think about the inverse normal transformation and the script based on Yang et al.'s paper?

ADD REPLY • link 6.0 years ago by rednalf ▴ 90

Where did you find that formula? I looked in the paper and they mention that they tried un-transformed, logged, and then inverse normal transformed data. I could not see a formula, though.

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

I found it in the Supplementary Information, page 18 (https://media.nature.com/original/nature-assets/nature/journal/v490/n7419/extref/nature11401-s1.pdf)

ADD REPLY • link 6.0 years ago by rednalf ▴ 90

score 5 · Accepted Answer · 2018-05-03

6.0 years ago

Kevin Blighe 87k

Okay, yes, here is the page:

Note that they are not transforming the original variable. What they do is the following (for height and weight):

build a linear regression model lm(height ~ age + I(age^2))
extract residuals from model with residuals()
transform residuals by inverse norm function y <- qnorm((rank(x, na.last="keep") - 0.5) / sum(!is.na(x)))
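To make the transformation step concrete, here is a small sketch of it as a function, next to the version from the question. The helper name inverse_normal and the toy values are mine, not from the paper.

```r
# Rank-based inverse normal transformation (hypothetical helper name)
inverse_normal <- function(x) {
  n <- sum(!is.na(x))                # number of non-missing values
  r <- rank(x, na.last = "keep")     # ties get average ranks; NA stays NA
  qnorm((r - 0.5) / n)               # map ranks to (0, 1), then to normal quantiles
}

x <- c(10, 54.75, 750, NA, 86.2, 23) # toy values, not the real phenotype
y <- inverse_normal(x)
print(round(y, 3))

# Dividing by sum(x), as in the question, makes every probability far too
# small, so all transformed values end up negative.
y_wrong <- qnorm((rank(x, na.last = "keep") - 0.5) / sum(x, na.rm = TRUE))
print(round(y_wrong, 3))
```

The denominator has to be the number of non-missing observations, so that (rank - 0.5) / n is spread evenly over (0, 1); sum(x) is the sum of the phenotype values and has no such property.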

The transformed residuals (squared) are then used in your association test, as follows:

glm(y^2 ~ SNP)
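Putting the steps together, a minimal end-to-end sketch on simulated data. The data frame dat, its columns and all numeric values are made up for illustration; this is my reading of the steps, not the authors' actual script. Note that inside an R formula the quadratic term has to be wrapped as I(age^2), otherwise ^ is interpreted as a formula operator.

```r
set.seed(1)
n <- 500
# Simulated stand-ins for the real data; names and values are invented
dat <- data.frame(
  age = runif(n, 18, 80),
  SNP = rbinom(n, 2, 0.3)            # additive genotype coding 0/1/2
)
dat$height <- 170 + 0.3 * dat$age - 0.004 * dat$age^2 + rnorm(n, sd = 7)

# 1. adjust the phenotype for age and age squared
fit <- lm(height ~ age + I(age^2), data = dat)

# 2. extract the residuals
x <- residuals(fit)

# 3. inverse normal transformation of the residuals
dat$y <- qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))

# 4. regress the squared transformed residuals on genotype
assoc <- glm(y^2 ~ SNP, data = dat)
print(coef(summary(assoc)))
```

With real data containing missing values, passing na.action = na.exclude to lm() should keep the residual vector aligned with the rows of the data frame.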

Does that help?

Kevin

ADD COMMENT • link 6.0 years ago by Kevin Blighe 87k

Yes a lot, thank you very much for your (fast) help!

ADD REPLY • link 6.0 years ago by rednalf ▴ 90

No problem. Note that they also split the analysis in two by gender, and include only individuals >18 years of age.
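A rough sketch of how that stratification could look, again on simulated data. The column names (age, sex, SNP, height), the helper fit_one_sex and the choice to redo the adjustment and transformation within each group are my assumptions, not details confirmed from the paper.

```r
set.seed(2)
n <- 400
# Simulated data; all names and values are invented for illustration
dat <- data.frame(
  age = runif(n, 10, 80),
  sex = sample(c("F", "M"), n, replace = TRUE),
  SNP = rbinom(n, 2, 0.3)
)
dat$height <- 160 + 8 * (dat$sex == "M") + 0.2 * dat$age + rnorm(n, sd = 7)

# Keep adults only
adults <- dat[dat$age > 18, ]

# Adjust, transform and test within one group (hypothetical helper)
fit_one_sex <- function(d) {
  x <- residuals(lm(height ~ age + I(age^2), data = d))
  d$y <- qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
  coef(summary(glm(y^2 ~ SNP, data = d)))["SNP", ]
}

# One result per sex: estimate, standard error, t value and p-value for SNP
results <- lapply(split(adults, adults$sex), fit_one_sex)
print(results)
```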

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k