Question

Normality Assumption for Linear Regression

0

5.1 years ago

nbotha1994 • 0

How strict is linear regression on the assumption of normality?

Even after transforming my data with the optimal normalization method (identified using the bestNormalize package in R), the data are still not normal or close to normal. Can I still run a linear regression on data that are not normally distributed?

Thanks

data transformation quantitative trait association • 1.6k views

1

Lots of data will never have normal residuals because the data are inherently not normally distributed. You should not model such data with a linear regression, but rather with a generalised linear model from a family appropriate for the data. To help you choose an appropriate model, we would need to know what the data are.

5.1 years ago by i.sudbery 19k

score 5 · Accepted Answer · 2019-03-22

5.1 years ago

Kevin Blighe 87k

The assumption of normality relates to the residuals and not to the data itself. Personally, I would not use packages like bestNormalize to tell me how the data look. Also remember that only fabricated data will follow a perfectly normal distribution. You could start by plotting a histogram of your data and providing metrics such as min, max, covariance, standard deviation, mean, median, interquartile range, etc.
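As a rough illustration of the kind of summary that helps others judge a distribution, here is a minimal sketch using only the Python standard library. The `summarize` helper and the `trait` values are hypothetical, made up for this example, and it covers only some of the metrics (no covariance, no histogram).

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }


# Hypothetical trait values, invented purely for illustration.
trait = [2.1, 2.4, 2.2, 3.9, 2.8, 2.5, 7.3, 2.6, 3.1, 2.9]

summary = summarize(trait)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A mean well above the median, as in this made-up sample, is one quick hint of right skew in the raw values. It says nothing yet about the residuals of a fitted model.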

0

Hi, thank you for your response.

I don't think I understand what residuals are. Are they the data after transformation, or the data after linear regression has been performed?

5.1 years ago by nbotha1994 • 0

3

The residual at a given point is the difference between the actual (observed) value and the value estimated by the model. Residuals therefore only exist once the regression has been fitted.
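To make this concrete, here is a minimal sketch in plain Python that fits a simple least-squares line and computes the residuals as observed minus fitted, which is the usual sign convention. The `fit_line` helper and the `x` and `y` values are hypothetical, chosen only for illustration. It is these residuals, not the raw data, that the normality assumption concerns.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the sum of cross-deviations over the sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical predictor and response values, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(x, y)

# Fitted value at each point, then residual = observed - fitted.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"x={xi}  observed={yi:.2f}  fitted={fi:.2f}  residual={ri:+.2f}")
```

With an intercept in the model, least-squares residuals sum to zero up to floating-point error, so it is their shape (for example in a histogram or Q-Q plot) rather than their mean that is worth inspecting.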

5.1 years ago by Jean-Karim Heriche 27k