Question: Inverse normal transformation
rednalf wrote:

My phenotype is not normally distributed. I tried several transformations but none of them seems to improve the normality. I am now interested in an inverse normal transformation using R.

Is something like the following correct, x being my phenotype?

qx <- qnorm((rank(x)-0.5)/sum(x))

It is based on this paper:

Yang, Jian, et al. "FTO genotype is associated with phenotypic variability of body mass index." Nature 490.7419 (2012): 267.

Histogram before transformation available here: https://imgur.com/a/BwednJB

More information about the raw data:

  • min: 10
  • max: 750
  • median: 54.75
  • mean: 86.18217
  • variance: 8428.881
  • standard deviation: 91.80894
modified 19 months ago • written 19 months ago by rednalf

Please show a histogram of your phenotype before any transformation. Also provide min, max, median, mean, variance, and standard deviation.

written 19 months ago by Kevin Blighe

Thank you for your comment - the information has been added to the initial post.

written 19 months ago by rednalf

I see, thanks so much. So it currently looks like a negative binomial or Poisson-like distribution, akin to RNA-seq count data. Have you considered a variance-stabilising transformation, or a regularised log (as in DESeq2)? You could also just fit the model as a negative binomial using glm.nb() from the MASS package.

written 19 months ago by Kevin Blighe

Thank you. I have considered log, square root, cube root and Johnson transformation for now. I will have a look at DESeq2. What do you think about the inverse normal transformation and the script based on Yang et al.'s paper?

written 19 months ago by rednalf

Where did you find that formula? I looked in the paper and they mention that they tried un-transformed, logged, and then inverse normal transformed data. I could not see a formula, though.

written 19 months ago by Kevin Blighe

I found it in the Supplementary Information, page 18 (https://media.nature.com/original/nature-assets/nature/journal/v490/n7419/extref/nature11401-s1.pdf)

written 19 months ago by rednalf

Kevin Blighe wrote:

Okay, yes, here is the page:

[Screenshot: Supplementary Information, page 18]

Note that they are not transforming the original variable. What they do is the following (for height and weight):

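  1. build a linear regression model lm(height ~ age + I(age^2)); the I() is needed because a bare age^2 inside an R formula is not treated as arithmetic squaring
  2. extract residuals from the model with residuals()
  3. transform the residuals, x, with the inverse normal function y <- qnorm((rank(x, na.last="keep") - 0.5) / sum(!is.na(x)))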
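Step 3 can be wrapped in a small helper. This is only a sketch: the function name inverse_normal and the simulated toy phenotype are assumptions for illustration, not taken from the paper. Note that the denominator is the number of non-missing values, sum(!is.na(x)), and not sum(x) (the total of the phenotype values); with the count, the argument passed to qnorm() is spread evenly across the interval between 0 and 1.

```r
# Rank-based inverse normal transformation (hypothetical helper name).
inverse_normal <- function(x) {
  n <- sum(!is.na(x))                          # count of non-missing values
  # na.last = "keep" leaves NA in place instead of giving it a rank
  qnorm((rank(x, na.last = "keep") - 0.5) / n)
}

set.seed(1)
x <- c(rexp(200, rate = 1 / 80) + 10, NA)      # skewed toy phenotype plus one NA
y <- inverse_normal(x)

hist(y)                                        # should look approximately normal
```

The transformation only depends on ranks, so the order of the observations is preserved and missing values stay missing.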

The transformed residuals (squared) are then used in your association test, as follows:

glm(y^2 ~ SNP)
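
Put together, the steps might look like the sketch below. The data are simulated and the variable names (age, snp, height) are assumptions for illustration; this is not the authors' code, only one way to chain the steps described above.

```r
set.seed(42)
n      <- 500
age    <- runif(n, 19, 80)
snp    <- rbinom(n, 2, 0.3)                    # assumed additive coding 0/1/2
height <- 150 + 0.8 * age - 0.008 * age^2 + rnorm(n, sd = 6)

# 1. regress the phenotype on age and age squared
fit <- lm(height ~ age + I(age^2))

# 2. extract the residuals
r <- residuals(fit)

# 3. inverse normal transformation of the residuals
y <- qnorm((rank(r, na.last = "keep") - 0.5) / sum(!is.na(r)))

# 4. association test on the squared transformed residuals
assoc <- glm(I(y^2) ~ snp)
summary(assoc)$coefficients
```

Because the toy genotype is simulated independently of the phenotype, no association is built into this example.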

Does that help?

Kevin

modified 19 months ago • written 19 months ago by Kevin Blighe

Yes a lot, thank you very much for your (fast) help!

written 19 months ago by rednalf

No problem. Note that they also segregate the analysis into two based on gender (and only those >18 years of age)
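
A rough sketch of that stratification, assuming a data frame with hypothetical columns sex, age, height and snp (simulated here, not real data):

```r
set.seed(7)
d <- data.frame(
  sex    = sample(c("F", "M"), 400, replace = TRUE),
  age    = runif(400, 10, 80),
  height = rnorm(400, 170, 8),
  snp    = rbinom(400, 2, 0.3)
)

adults <- subset(d, age > 18)                  # keep only those older than 18

# run regression, transformation and association test separately per sex
fits <- lapply(split(adults, adults$sex), function(g) {
  r <- residuals(lm(height ~ age + I(age^2), data = g))
  y <- qnorm((rank(r, na.last = "keep") - 0.5) / sum(!is.na(r)))
  glm(I(y^2) ~ snp, data = g)
})

lapply(fits, function(f) summary(f)$coefficients)
```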

written 19 months ago by Kevin Blighe