What should I do when error residuals are not normally distributed in a linear mixed-effect model?
6.1 years ago
ybbarnatan ▴ 10

Hello all! I'm trying to analyze some experimental data about animal behaviour and would need some help or advice regarding which non-parametric test I should use.

The variables I have are:

- Response variable: "Vueltasmin", continuous (takes both positive and negative values)
- Explanatory variable: "Condicion", a factor with 6 levels
- Random-effect variable: "Bicho", as the same animal performing the behavioural task was measured more than once

As I have a random-effect variable, I chose a mixed model. Then, when checking the normality and homoscedasticity assumptions, the Shapiro-Wilk test showed no normality, and the QQ plots revealed neither patterns nor outliers in my data. So the question is: which non-parametric test would be optimal in this case, knowing that I would like to perform certain a posteriori comparisons (and not all-against-all comparisons): red vs grey; red vs black; red vs light blue; black vs grey.
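(For reference, once a suitable model is settled on, planned comparisons like these can be specified as custom contrasts, for example with the emmeans package. A minimal sketch, in which the level names "red", "grey", "black", "light blue" and their order are only placeholders for the actual levels of Condicion:)

library(lme4)
library(emmeans)

m1  <- lmer(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)
emm <- emmeans(m1, ~ Condicion)

# One coefficient vector per planned comparison; the positions follow
# levels(Datos$Condicion), assumed here to start with red, grey, black, light blue.
planned <- list(
  "red - grey"       = c(1, -1,  0,  0, 0, 0),
  "red - black"      = c(1,  0, -1,  0, 0, 0),
  "red - light blue" = c(1,  0,  0, -1, 0, 0),
  "black - grey"     = c(0, -1,  1,  0, 0, 0)
)
contrast(emm, planned, adjust = "holm")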

My database has lots of zero responses in some conditions. I've read that for Student's t-tests lacking normality because of lots of zeros it's OK to turn a blind eye on the lack of normality (Srivastava, 1958; Sullivan & D'Agostino, 1992)... is there something similar for mixed models?

DATA PLOT: [bar graph of Vueltasmin by Condicion, one colour per condition]

Here is some information that might be useful. I'd like to thank everyone in advance!

DATABASE: composed of 174 observations (29 individuals, each tested in 6 different situations or tasks, each task shown as one colour in the bar graph; the repeated measures on the same individual are the reason for the random-effect variable). "Bicho" identifies the individual, "Condicion" is the explanatory variable and "Vueltasmin" is the response variable. "Datos" is the name of my database.

CODE

library(lme4)

Datos$Condicion  <- as.factor(Datos$Condicion)
Datos$Vueltasmin <- as.numeric(Datos$Vueltasmin)

## My model: Vueltasmin ~ Condicion + (1 | Bicho)
m1 <- lmer(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)

# Checking assumptions BEFORE looking at the stats:
e1   <- resid(m1, type = "pearson")   # Pearson residuals
pre1 <- predict(m1)                   # fitted (predicted) values

windows()   # opens a new plotting device (Windows only)
par(mfrow = c(1, 2))
plot(pre1, e1,
     xlab = "Fitted values", ylab = "Pearson residuals",
     main = "Residuals vs fitted values", cex.main = 0.8)
abline(h = 0)
qqnorm(e1, cex.main = 0.9)   # QQ plot of the residuals
qqline(e1)
par(mfrow = c(1, 1))

shapiro.test(e1)
# Shapiro-Wilk: the residuals are NOT normal!

[Shapiro-Wilk output and histogram of the residuals showing non-normality]


You'll likely want to post this on Cross Validated instead of here. While many of us use mixed-effect models on occasion, I don't know that there are many people here comfortable giving advice on this particular issue.


Hi @Devon Ryan, thanks! I've already done that and had no luck; nobody answered my question, so I've been looking for other statistics forums.


N.B., I've changed your mentions of GLM to mixed model or mixed-effect model. You're not using a GLM.


I thought they meant the same, so thanks for the correction!

6.1 years ago

Strictly speaking, non-normality of the residuals is an indication of an inadequate model. It means that the errors the model makes are not consistent across variables and observations (i.e. the errors are not random).
The first step should be to look at your data. What kind of distribution would fit your data? Are there outliers? If you have lots of 0s, this is probably why your data are not normally distributed. A usual remedy is to transform the variables to make them closer to normally distributed, but some people argue against this and recommend using a more appropriate method instead (i.e. a generalized linear mixed model).
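A minimal sketch of the transformation route, illustrative only: since Vueltasmin has negative values and many exact zeros, a plain log() is not possible, so a signed transformation such as the inverse hyperbolic sine is shown as one possibility; whether it is sensible for these data is a separate question.

library(lme4)

# Refit on a transformed response and re-check the residuals on the new scale.
m_trans <- lmer(asinh(Vueltasmin) ~ Condicion + (1 | Bicho), data = Datos)
e_trans <- resid(m_trans, type = "pearson")
qqnorm(e_trans); qqline(e_trans)
shapiro.test(e_trans)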
Some links that can help:
- A practical guide to mixed models in R
- Checking assumptions in mixed models
- Robustness of linear mixed models

6.1 years ago
ybbarnatan ▴ 10

Hi Jean-Karim Heriche, thanks for the reply. The first thing I do after running a model, before looking at p-values, is to check the assumptions and look for outliers, both in X and Y, with boxplots and Cook's distance. Regarding outliers, there are none; regarding the assumptions, I meet homoscedasticity but not normality.
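(For reference, this kind of influence check on a fitted lmer model is often done with the influence.ME package; a minimal sketch, assuming the model m1 from the original post:)

library(influence.ME)

# Influence of each individual (level of the random effect) on the fixed effects
infl <- influence(m1, group = "Bicho")
cooks.distance(infl)        # Cook's distance, one value per individual
plot(infl, which = "cook")  # visual check for influential individuals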

As Devon Ryan pointed out, I may have said I used a GLM, but in fact I used mixed-effect models (I thought they were the same thing). Hence my question: what should I do when I'm not using linear regression or ANOVA but mixed-effect models, and still get a lack of normality? I believe the lack of normality comes only from the large number of zero responses, and I want to know whether there is some test I can do when I don't have normality (like I would do a Kruskal-Wallis as a non-parametric ANOVA), or whether there is some publication or paper that supports going on with the current analysis, as I cited cases where it's OK to turn a blind eye on the lack of normality due to a floor effect when performing t-tests.
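(One rank-based option sometimes suggested for factorial designs with repeated measures is the aligned rank transform, implemented in the ARTool package. A minimal sketch, not a recommendation; whether it suits these data would need checking:)

library(ARTool)

# ARTool requires the independent variables to be factors.
Datos$Condicion <- as.factor(Datos$Condicion)
Datos$Bicho     <- as.factor(Datos$Bicho)

m_art <- art(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)
anova(m_art)                   # ANOVA on the aligned-rank-transformed response
art.con(m_art, "Condicion")    # follow-up contrasts between conditions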

I haven't tried transforming the dependent variable yet, as it's the last thing I want to do. I would have to interpret, for example, log(Y) instead of Y, and that would lack biological meaning to me... so I'm trying to explore whether there is something else I can do before transforming it. Any thoughts? Thanks!

P.S.: thanks for the links, I will look at them thoroughly.


Please do not create an answer when replying to an answer or a comment. Use the appropriate button ('Add comment' or 'Add reply'). This keeps threads organized.
I did get that you were using a linear mixed model, and my answer applies in this case.
The difficulty in interpretability is why some people are against using variable transformations in linear models (e.g. var(log(Y)) != log(var(Y)), and a coefficient alpha being significant when fitting log(Y) doesn't imply that exp(alpha) is significant). The alternative, as I wrote, is to use a generalized linear mixed model, i.e. a model that allows the errors to follow distributions other than normal. The point is that you're violating the assumption of normality of the residuals, so the model is inadequate to explain your data. Depending on what your purpose is, you could decide the model is good enough. Under some circumstances, for large samples, you can probably get away with a departure from normality, but small sample sizes require meeting the distribution assumption. I suggest you take this up with a statistician near you so that you can discuss in detail what your goals are and the specifics of your data.
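(A minimal sketch of the GLMM route with the glmmTMB package; the family below is only a placeholder to show the mechanics. A Tweedie family accommodates exact zeros but requires a non-negative response, so it would not fit Vueltasmin as-is, which also takes negative values; the appropriate family has to be chosen from the actual data:)

library(glmmTMB)

# Same fixed and random structure as before, with a non-Gaussian error family.
m_glmm <- glmmTMB(Vueltasmin ~ Condicion + (1 | Bicho),
                  family = tweedie(link = "log"),   # placeholder family
                  data   = Datos)
summary(m_glmm)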
