What should I do when error residuals are not normally distributed in a linear mixed-effect model?
6.1 years ago
ybbarnatan ▴ 10

Hello all! I'm trying to analyze some experimental data about animal behaviour and would need some help or advice regarding which non-parametric test I should use.

The variables I have are:

- Response variable: "Vueltasmin", continuous (takes both positive and negative values)
- Explanatory variable: "Condicion", a factor with 6 levels
- Random-effect variable: "Bicho", as the same animal performing the behavioural task was measured more than once

As I have a random-effect variable, I chose a mixed model. Then, when checking the normality and homoscedasticity assumptions, the Shapiro-Wilk test showed no normality, and the QQ plots revealed neither patterns nor outliers in my data. So the question is: which non-parametric test would be optimal in this case, knowing that I would like to perform certain a posteriori comparisons (and not all-against-all comparisons): red vs grey; red vs black; red vs light blue; black vs grey.
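(For reference, once a suitable model is settled on, planned comparisons like these can be specified as custom contrasts, for example with the emmeans package. A minimal sketch, in which the level names "red", "grey", "black", "light blue" and their order are only placeholders for the actual levels of Condicion:)

library(lme4)
library(emmeans)

m1  <- lmer(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)
emm <- emmeans(m1, ~ Condicion)

# One coefficient vector per planned comparison; the positions follow
# levels(Datos$Condicion), assumed here to start with red, grey, black, light blue.
planned <- list(
  "red - grey"       = c(1, -1,  0,  0, 0, 0),
  "red - black"      = c(1,  0, -1,  0, 0, 0),
  "red - light blue" = c(1,  0,  0, -1, 0, 0),
  "black - grey"     = c(0, -1,  1,  0, 0, 0)
)
contrast(emm, planned, adjust = "holm")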

My database has lots of zero responses in some conditions. I've read that for Student's t-tests lacking normality because of lots of zeros it's OK to turn a blind eye on the lack of normality (Srivastava, 1958; Sullivan & D'Agostino, 1992)... is there something similar for mixed models?

DATA PLOT: [bar graph of Vueltasmin by Condicion, one colour per condition]

Here is some information that might be useful. I'd like to thank everyone in advance!

DATABASE: composed of 174 observations (29 individuals, each tested in 6 different situations or tasks, each task shown as one colour in the bar graph; the repeated measures on the same individual are the reason for the random-effect variable). "Bicho" identifies the individual, "Condicion" is the explanatory variable and "Vueltasmin" is the response variable. "Datos" is the name of my database.

CODE

library(lme4)

Datos$Condicion  <- as.factor(Datos$Condicion)
Datos$Vueltasmin <- as.numeric(Datos$Vueltasmin)

## My model: Vueltasmin ~ Condicion + (1 | Bicho)
m1 <- lmer(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)

# Checking assumptions BEFORE looking at the stats:
e1   <- resid(m1, type = "pearson")   # Pearson residuals
pre1 <- predict(m1)                   # fitted (predicted) values

windows()   # opens a new plotting device (Windows only)
par(mfrow = c(1, 2))
plot(pre1, e1,
     xlab = "Fitted values", ylab = "Pearson residuals",
     main = "Residuals vs fitted values", cex.main = 0.8)
abline(h = 0)
qqnorm(e1, cex.main = 0.9)   # QQ plot of the residuals
qqline(e1)
par(mfrow = c(1, 1))

shapiro.test(e1)
# Shapiro-Wilk: the residuals are NOT normal!

[Shapiro-Wilk output and histogram of the residuals showing non-normality]


You'll likely want to post this on Cross Validated instead of here. While many of us use mixed-effect models on occasion, I don't know that there are many people here comfortable giving advice on this particular issue.


Hi @Devon Ryan, thanks! I've already done that and had no luck; nobody answered my question, so I've been looking for other statistics forums.


N.B., I've changed your mentions of GLM to mixed model or mixed-effect model. You're not using a GLM.


I thought they meant the same, so thanks for the correction!

6.1 years ago

Strictly speaking, non-normality of the residuals is an indication of an inadequate model. It means that the errors the model makes are not consistent across variables and observations (i.e. the errors are not random).
The first step should be to look at your data. What kind of distribution would fit your data? Are there outliers? If you have lots of 0s, this is probably why your data are not normally distributed. A usual remedy is to transform the variables to make them closer to normally distributed, but some people argue against this and recommend using a more appropriate method instead (i.e. a generalized linear mixed model).
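A minimal sketch of the transformation route, illustrative only: since Vueltasmin has negative values and many exact zeros, a plain log() is not possible, so a signed transformation such as the inverse hyperbolic sine is shown as one possibility; whether it is sensible for these data is a separate question.

library(lme4)

# Refit on a transformed response and re-check the residuals on the new scale.
m_trans <- lmer(asinh(Vueltasmin) ~ Condicion + (1 | Bicho), data = Datos)
e_trans <- resid(m_trans, type = "pearson")
qqnorm(e_trans); qqline(e_trans)
shapiro.test(e_trans)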
Some links that can help:
- A practical guide to mixed models in R
- Checking assumptions in mixed models
- Robustness of linear mixed models

6.1 years ago
ybbarnatan ▴ 10

Hi Jean-Karim Heriche, thanks for the reply. The first thing I do after running a model, before looking at p-values, is to check the assumptions and look for outliers, both in X and Y, with boxplots and Cook's distance. Regarding outliers, there are none; regarding the assumptions, I meet homoscedasticity but not normality.
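(For reference, this kind of influence check on a fitted lmer model is often done with the influence.ME package; a minimal sketch, assuming the model m1 from the original post:)

library(influence.ME)

# Influence of each individual (level of the random effect) on the fixed effects
infl <- influence(m1, group = "Bicho")
cooks.distance(infl)        # Cook's distance, one value per individual
plot(infl, which = "cook")  # visual check for influential individuals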

As Devon Ryan pointed out, I may have said I used a GLM, but in fact I used mixed-effect models (I thought they were the same thing). Hence my question: what should I do when I'm not using linear regression or ANOVA but mixed-effect models, and still get a lack of normality? I believe the lack of normality comes only from the large number of zero responses, and I want to know whether there is some test I can do when I don't have normality (like I would do a Kruskal-Wallis as a non-parametric ANOVA), or whether there is some publication or paper that supports going on with the current analysis, as I cited cases where it's OK to turn a blind eye on the lack of normality due to a floor effect when performing t-tests.
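(One rank-based option sometimes suggested for factorial designs with repeated measures is the aligned rank transform, implemented in the ARTool package. A minimal sketch, not a recommendation; whether it suits these data would need checking:)

library(ARTool)

# ARTool requires the independent variables to be factors.
Datos$Condicion <- as.factor(Datos$Condicion)
Datos$Bicho     <- as.factor(Datos$Bicho)

m_art <- art(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)
anova(m_art)                   # ANOVA on the aligned-rank-transformed response
art.con(m_art, "Condicion")    # follow-up contrasts between conditions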

I haven't tried transforming the dependent variable yet, as it's the last thing I want to do. I would have to interpret, for example, log(Y) instead of Y, and that would lack biological meaning to me... so I'm trying to explore whether there is something else I can do before transforming it. Any thoughts? Thanks!

P.S.: thanks for the links, I will look at them thoroughly.


Please do not create an answer when replying to an answer or a comment. Use the appropriate button ('Add comment' or 'Add reply'). This keeps threads organized.
I did get that you were using a linear mixed model, and my answer applies in this case.
The difficulty in interpretability is why some people are against using variable transformations in linear models (e.g. var(log(Y)) != log(var(Y)), and a coefficient alpha being significant when fitting log(Y) doesn't imply that exp(alpha) is significant). The alternative, as I wrote, is to use a generalized linear mixed model, i.e. a model that allows the errors to follow distributions other than normal. The point is that you're violating the assumption of normality of the residuals, so the model is inadequate to explain your data. Depending on what your purpose is, you could decide the model is good enough. Under some circumstances, for large samples, you can probably get away with a departure from normality, but small sample sizes require meeting the distribution assumption. I suggest you take this up with a statistician near you so that you can discuss in detail what your goals are and the specifics of your data.
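(A minimal sketch of the GLMM route with the glmmTMB package; the family below is only a placeholder to show the mechanics. A Tweedie family accommodates exact zeros but requires a non-negative response, so it would not fit Vueltasmin as-is, which also takes negative values; the appropriate family has to be chosen from the actual data:)

library(glmmTMB)

# Same fixed and random structure as before, with a non-Gaussian error family.
m_glmm <- glmmTMB(Vueltasmin ~ Condicion + (1 | Bicho),
                  family = tweedie(link = "log"),   # placeholder family
                  data   = Datos)
summary(m_glmm)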
