Question: Normality Assumption for Linear Regression
nbotha1994 wrote (4 weeks ago):

How strict is linear regression on the assumption of normality?

Even after transforming my data with the optimal normalizing transformation (identified with the bestNormalize package in R), my data are still not normal, or even close to normal. Can I still fit a linear regression to data that are not normally distributed?



Lots of data will never have normally distributed residuals, because the data are inherently not normally distributed. You should not model such data with a linear regression; instead, use a generalised linear model with a family appropriate to the data. To help you choose an appropriate model, we would need to know what the data are.

Reply written 4 weeks ago by i.sudbery
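As a minimal illustration of the generalised-linear-model suggestion, assuming (purely for the example, since the thread never says what the data are) that the outcome is a count, here is a Poisson GLM with a log link fitted by iteratively reweighted least squares in plain Python. In R the equivalent call would be `glm(y ~ x, family = poisson)`; the helper name `fit_poisson_glm` and the example data are hypothetical.

```python
import math

def fit_poisson_glm(x, y, n_iter=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by IRLS (Poisson family, log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(n_iter):
        # Linear predictor and fitted means under the current coefficients
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response z and weights w (for the canonical log link, w = mu)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on [1, x]: solve the 2x2 normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical example: means that follow exactly exp(0.5 + 0.3 * x)
xs = list(range(10))
ys = [math.exp(0.5 + 0.3 * xi) for xi in xs]
b0, b1 = fit_poisson_glm(xs, ys)
print(f"intercept={b0:.4f}, slope={b1:.4f}")  # close to 0.5 and 0.3
```

The point is not this particular family: the model's assumptions are stated about the response distribution (here Poisson), so the residuals are no longer expected to be normal.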
Kevin Blighe (Guy's Hospital, London) wrote (4 weeks ago):

The assumption of normality relates to the residuals, not to the data itself. Personally, I would not use packages like bestNormalize to tell me how the data look. Also remember that only simulated data will be perfectly normal. You could start by plotting a histogram of your data and reporting metrics such as min, max, covariance, standard deviation, mean, median, interquartile range, etc.
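A minimal sketch of that descriptive first look, using only Python's standard library (in R, `summary()`, `sd()`, `IQR()` and `hist()` cover the same ground). The data values and helper names are hypothetical, and because a single variable has no covariance, the sketch reports the sample variance instead.

```python
import statistics

def describe(values):
    """Return basic summary metrics for one numeric variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance
        "sdev": statistics.stdev(values),         # sample standard deviation
        "iqr": q3 - q1,
    }

def text_histogram(values, bins=5, width=40):
    """Print a crude histogram: one row of '#' per equal-width bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # guard against all-equal values
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left = lo + i * step
        print(f"{left:8.2f} | " + "#" * round(width * c / peak))

data = [2.1, 2.5, 2.7, 3.0, 3.2, 3.3, 3.9, 4.4, 5.8, 9.7]  # hypothetical, right-skewed
for name, value in describe(data).items():
    print(f"{name:>8}: {value:.3f}")
text_histogram(data)
```

A mean well above the median, as in this made-up sample, is a quick sign of right skew; whether that matters for the regression still depends on the residuals.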


Hi, thank you for your response.

I don't think I understand what residuals are. Are they the data after transformation, or the data after the linear regression has been performed?

Reply written 24 days ago by nbotha1994

The residual at a given point is the difference between the actual (observed) value and the value estimated by the model.

Reply written 24 days ago by Jean-Karim Heriche
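To make that definition concrete, here is a minimal plain-Python sketch with hypothetical data and helper names (in R, `residuals(lm(y ~ x))` returns the same quantities). It fits a straight line by ordinary least squares and computes each residual as observed minus fitted, so the residuals only exist after the regression has been run.

```python
import statistics

def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(x, y, a, b):
    """Residual = observed value minus the value the model predicts at that point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data: x is strongly right-skewed, so y inherits that skew,
# but the noise around the straight line is small and roughly symmetric.
x = [1, 1, 1, 2, 2, 3, 5, 8, 13, 21]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [2 + 0.5 * xi + e for xi, e in zip(x, noise)]

a, b = ols_fit(x, y)
res = residuals(x, y, a, b)
print(f"fitted line: y = {a:.3f} + {b:.3f} * x")
print("residuals:", [round(r, 3) for r in res])
```

Here `y` itself is skewed because `x` is, yet the residuals are small and roughly symmetric; a histogram or Q-Q plot of `res`, not of `y`, is the usual way to check the normality assumption.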