Question: Normality Assumption for Linear Regression
7 months ago by nbotha1994:

How strict is linear regression on the assumption of normality?

Even after transforming my data with the optimal normalization method (identified using the bestNormalize package in R), my data is still not normal or near normal. Can I still run a linear regression on data that is not normally distributed?



Lots of data will never have normal residuals because it is inherently not normally distributed. You should not model such data with a linear regression, but rather with a generalised linear model from a family appropriate for the data. To help you choose an appropriate model, we would need to know what the data is.
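To make the idea of a generalised linear model concrete, here is a minimal sketch of one common case: a Poisson regression with a log link, fitted by Newton's method. It assumes, purely for illustration, that the response is a count and that there is a single predictor; nothing in the question says what the data actually is. The function name `fit_poisson` and the toy numbers are hypothetical, and the loop has no safeguards against divergence.

```python
import math


def fit_poisson(x, y, iterations=25, tol=1e-10):
    """Fit log(mean of y) = a + b * x by Newton's method (Poisson GLM, log link).

    Assumes the mean of y is positive so the starting value is defined.
    """
    a = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b = 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Gradient of the Poisson log-likelihood with respect to a and b.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the negated Hessian (the Fisher information matrix).
        h_aa = sum(mu)
        h_ab = sum(xi * mi for xi, mi in zip(x, mu))
        h_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Solve the 2x2 system H * step = gradient in closed form.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


# Hypothetical toy counts that double with each unit of x.
a, b = fit_poisson([0, 1, 2, 3], [1, 2, 4, 8])
print(f"intercept={a:.4f}, slope={b:.4f}")
```

In practice a statistics package would do this fitting; in R, for example, `glm(y ~ x, family = poisson)` fits the same kind of model. Which family is appropriate depends on the data.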

Reply written 7 months ago by i.sudbery
7 months ago by Kevin Blighe:

The assumption of normality relates to the residuals and not to the data itself. I, personally, would not use packages like bestNormalize to tell me how the data looks. Also remember that only fabricated data will have a perfectly normal distribution. You could start by plotting a histogram of your data and providing metrics like min, max, covariance, standard deviation, mean, median, interquartile range, etc.
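As a rough illustration of most of the summary metrics listed above, here is a minimal Python sketch using only the standard library. The function name `describe` and the sample values are made up for the example and are not the original poster's data.

```python
import statistics


def describe(values):
    """Return basic summary statistics for a sequence of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3),
    # using the module's default "exclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }


# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 5, 7, 9, 11]
for name, value in describe(sample).items():
    print(f"{name}: {value}")
```

A mean well away from the median, or a long tail in the histogram, is a quick hint that the variable is skewed.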


Hi, thank you for your response.

I don't think I understand what residuals are. Are they the data after transformation, or the data after linear regression has been performed?

Reply written 6 months ago by nbotha1994

The residual at a given point is the difference between the value estimated by the model and the actual value.
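In other words, residuals only exist once a model has been fitted, and there is one per observation. The sketch below fits a straight line by ordinary least squares and computes the residuals, using the common sign convention of observed minus fitted (the sign does not matter when judging normality). The helper names `fit_line` and `residuals` and the toy numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed value minus fitted value, one residual per data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
intercept, slope = fit_line(x, y)
print(residuals(x, y, intercept, slope))
```

It is the distribution of these residuals, not of the raw `y` values, that one would inspect with a histogram or Q-Q plot when checking the normality assumption.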

Reply written 6 months ago by Jean-Karim Heriche