Question: What should I do when error residuals are not normally distributed in a linear mixed-effect model?
ybbarnatan0 wrote:

Hello all!!! I'm trying to analyze some experimental data on animal behaviour and would like some help or advice on which non-parametric test I should use.

The variables I have are:
- Response variable: "Vueltasmin", a continuous variable (with both positive and negative values).
- Explanatory variable: "Condicion", a factor with 6 levels.
- Random-effect variable: "Bicho", because the same animal performing a behavioural task was measured more than once.

As I have a random-effect variable, I chose a mixed model. When checking the normality and homoscedasticity assumptions, the Shapiro-Wilk test indicated that the residuals were not normal, while the Q-Q plots showed no patterns or outliers in my data. So my question is: which non-parametric test would be best in this case, given that I would like to perform certain a posteriori comparisons (not all-against-all comparisons): red vs grey, red vs black, red vs light blue, and black vs grey?
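Whatever test is used in the end, the four planned comparisons can be written as linear contrasts of the fixed effects of the mixed model itself. Below is a minimal sketch under stated assumptions: the data are simulated and hypothetical, the two conditions not named in the question are called `cond5` and `cond6` for illustration, and the random-intercept model is fitted with `nlme::lme` (shipped with R) instead of `lme4::lmer`. The p-values use a large-sample Wald (normal) approximation with a Holm correction across the four comparisons.

```r
library(nlme)  # recommended package distributed with R; stands in for lme4 here

set.seed(42)
# Hypothetical data: cond5 and cond6 stand for the two conditions not named in the question
condition_levels <- c("red", "grey", "black", "lightblue", "cond5", "cond6")
Datos <- expand.grid(Bicho = factor(1:29),
                     Condicion = factor(condition_levels, levels = condition_levels))
animal_effect <- rnorm(29, sd = 0.5)
Datos$Vueltasmin <- 0.3 * as.integer(Datos$Condicion) +
  animal_effect[as.integer(Datos$Bicho)] + rnorm(nrow(Datos))

# Cell-means coding (no intercept): one fixed effect per condition, in level order
fit <- lme(Vueltasmin ~ 0 + Condicion, random = ~ 1 | Bicho, data = Datos)
beta <- fixef(fit)
V <- fit$varFix  # approximate covariance matrix of the fixed-effect estimates

# Each row of K is one planned comparison (difference of two condition means)
K <- rbind(
  "red - grey"      = c(1, -1,  0,  0, 0, 0),
  "red - black"     = c(1,  0, -1,  0, 0, 0),
  "red - lightblue" = c(1,  0,  0, -1, 0, 0),
  "black - grey"    = c(0, -1,  1,  0, 0, 0)
)
est <- drop(K %*% beta)
se  <- sqrt(diag(K %*% V %*% t(K)))
z   <- est / se
p   <- 2 * pnorm(-abs(z))  # Wald test, large-sample normal approximation
res <- data.frame(estimate = est, se = se, z = z,
                  p = p, p_holm = p.adjust(p, method = "holm"))
print(res)
```

With cell-means coding (`0 + Condicion`) each coefficient is a condition mean, so each row of `K` simply subtracts one mean from another. The same `K %*% fixef(...)` arithmetic should in principle carry over to an `lmer` fit through lme4's `fixef()` and `vcov()`.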

My data set has lots of zero responses in some conditions. I've read that for Student's t-tests it's OK to turn a blind eye to a lack of normality caused by lots of zeros (Srivastava, 1958; Sullivan & D'Agostino, 1992). Is there something similar for mixed models? Here is some information that might be useful. Thanks to everyone in advance!

DATABASE: 174 observations (29 individuals, each tested in 6 different situations or tasks; each situation is represented by one colour in the bar graph, and because every individual was tested repeatedly, the individual is the random-effect variable). "Bicho" identifies the individual, "Condicion" is the explanatory variable, and "Vueltasmin" is the response variable. "Datos" is the name of my data set.
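To make the layout concrete, here is a sketch that builds a hypothetical data frame with the same shape (one row per animal and condition) and checks that the design is complete, i.e. that every animal appears exactly once in every condition. The level names `cond5`/`cond6` and the response values are invented; only the 29 × 6 = 174 structure comes from the description above.

```r
set.seed(1)
# Hypothetical level names; cond5 and cond6 are placeholders for the unnamed conditions
condition_levels <- c("red", "grey", "black", "lightblue", "cond5", "cond6")
Datos <- expand.grid(Bicho = factor(1:29),
                     Condicion = factor(condition_levels, levels = condition_levels))
Datos$Vueltasmin <- round(rnorm(nrow(Datos)), 2)  # invented response values

# Long format: one row per animal x condition
print(nrow(Datos))

# Complete design check: every animal appears exactly once in every condition
tab <- table(Datos$Bicho, Datos$Condicion)
print(all(tab == 1))
```

A complete, balanced layout like this is what the rank-based repeated-measures tests (e.g. the Friedman test) require, so the check is worth running on the real data before choosing a test.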

CODE

```r
library(lme4)  # provides lmer()

# Convert the columns inside the data frame, not free-standing copies
Datos$Condicion <- as.factor(Datos$Condicion)
Datos$Vueltasmin <- as.numeric(Datos$Vueltasmin)

## My model should be: Vueltasmin = Condicion + (1 | Bicho)
m1 <- lmer(Vueltasmin ~ Condicion + (1 | Bicho), data = Datos)

# Checking assumptions BEFORE looking at the stats:
e1 <- resid(m1, type = "pearson")  # Pearson residuals
pre1 <- predict(m1)                # predicted values

dev.new()  # portable replacement for windows()
par(mfrow = c(1, 2))
plot(pre1, e1, xlab = "Predicted", ylab = "Pearson residuals",
     main = "Residuals vs predicted", cex.main = 0.8)
abline(h = 0)
qqnorm(e1, cex.main = 0.9)  # Q-Q plot
qqline(e1)
par(mfrow = c(1, 1))
shapiro.test(e1)
# SHAPIRO-WILK: NO NORMALITY!!!
```
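Since the suspected cause is the large number of zeros in some conditions, one way to see where the non-normality comes from is to look at the share of zeros and at the residuals separately for each condition. This is a sketch on simulated data: the floor effect in the `grey` and `black` conditions, the level names `cond5`/`cond6`, and the use of `nlme::lme` in place of `lmer` are all assumptions for illustration.

```r
library(nlme)  # recommended package distributed with R

set.seed(7)
condition_levels <- c("red", "grey", "black", "lightblue", "cond5", "cond6")
Datos <- expand.grid(Bicho = factor(1:29),
                     Condicion = factor(condition_levels, levels = condition_levels))
# Hypothetical response: animal-specific offset plus noise
Datos$Vueltasmin <- round(1 + rnorm(29, sd = 0.5)[as.integer(Datos$Bicho)] +
                            rnorm(nrow(Datos)), 2)
# Hypothetical floor effect: most animals do not respond in two conditions
floor_rows <- Datos$Condicion %in% c("grey", "black") & runif(nrow(Datos)) < 0.7
Datos$Vueltasmin[floor_rows] <- 0

fit <- lme(Vueltasmin ~ Condicion, random = ~ 1 | Bicho, data = Datos)
Datos$resid <- residuals(fit, type = "pearson")

# Share of exact zeros per condition
zero_share <- tapply(Datos$Vueltasmin == 0, Datos$Condicion, mean)
print(round(zero_share, 2))

# Residuals split by condition: if the zero-heavy conditions look very different
# from the others, that is where the departure from normality comes from
boxplot(resid ~ Condicion, data = Datos, ylab = "Pearson residuals")
abline(h = 0, lty = 2)
```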

You'll likely want to post this on Cross Validated instead. While many of us use mixed-effect models on occasion, I doubt there are many people here comfortable giving advice on this particular issue.

Hi @Devon Ryan, thanks! I've already done that with no luck: nobody answered my question, so I've been looking at other statistics forums.


N.B., I've changed your mentions of GLM to mixed model or mixed-effect model. You're not using a GLM.

I thought they meant the same thing, so thanks for the correction!

Jean-Karim Heriche wrote:

Strictly speaking, non-normality of the residuals is an indication of an inadequate model. It means that the errors the model makes do not follow the distribution the model assumes (i.e. they do not behave like independent draws from a single normal distribution).
The first step should be to look at your data. What kind of distribution would fit your data? Are there outliers? If you have lots of zeros, this is probably why your data are not normally distributed. A common remedy is to transform the variables so that they are closer to normally distributed, but some people argue against this and recommend a more appropriate method instead (e.g. a generalized linear mixed model).
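The transformation remedy can be tried by refitting the same model to a transformed response and comparing the residual diagnostics. Because "Vueltasmin" takes negative values, `log()` cannot be applied directly; the signed square root used below is only a hypothetical choice, as are the simulated heavy-tailed data, the level names `cond5`/`cond6`, and the use of `nlme::lme` in place of `lmer`. Estimates from the transformed model are on the transformed scale.

```r
library(nlme)  # recommended package distributed with R

set.seed(3)
condition_levels <- c("red", "grey", "black", "lightblue", "cond5", "cond6")
Datos <- expand.grid(Bicho = factor(1:29),
                     Condicion = factor(condition_levels, levels = condition_levels))
# Hypothetical response: animal offset plus heavy-tailed noise, positive and negative values
Datos$Vueltasmin <- rnorm(29)[as.integer(Datos$Bicho)] + rt(nrow(Datos), df = 3)

# Hypothetical transform: signed square root, defined for negative values (unlike log)
signed_sqrt <- function(y) sign(y) * sqrt(abs(y))
Datos$Vueltasmin_tr <- signed_sqrt(Datos$Vueltasmin)

fit_raw <- lme(Vueltasmin ~ Condicion, random = ~ 1 | Bicho, data = Datos)
fit_tr  <- lme(Vueltasmin_tr ~ Condicion, random = ~ 1 | Bicho, data = Datos)

# Shapiro-Wilk p-values of the Pearson residuals before and after transforming
p_raw <- shapiro.test(residuals(fit_raw, type = "pearson"))$p.value
p_tr  <- shapiro.test(residuals(fit_tr, type = "pearson"))$p.value
print(c(raw = p_raw, transformed = p_tr))
```

Comparing Q-Q plots of the two sets of residuals (`qqnorm()`/`qqline()`, as in the question's code) is usually more informative than the two p-values alone.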
A practical guide to mixed models in R
Checking assumptions in mixed models.
Robustness of linear mixed models.

ybbarnatan0 wrote:

Hi Jean-Karim Heriche, thanks for the reply. After running a model, the first thing I do before looking at p-values is check the assumptions and look for outliers, in both X and Y, with boxplots and Cook's distance. There are no outliers, and as for the assumptions, homoscedasticity holds but normality does not.

As Devon Ryan pointed out, I said I used a GLM, but in fact I used mixed-effect models (I thought they were the same thing). Hence my question: what should I do when I'm not using linear regression or ANOVA but a mixed-effect model, and the residuals are still not normal? I believe the lack of normality comes only from the large number of zero responses. I want to know whether there is a test I can use without normality (as I would use Kruskal-Wallis as a non-parametric ANOVA), or whether some publication or paper supports continuing with the current analysis; as I cited above, there are cases where it's OK to turn a blind eye to non-normality caused by a floor effect when performing t-tests.
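For a complete design like this one (each of the 29 animals measured once in each of the 6 conditions), the rank-based counterpart of a one-way repeated-measures analysis, playing the role Kruskal-Wallis plays for independent groups, is the Friedman test; planned pairwise follow-ups can use paired Wilcoxon signed-rank tests with a multiplicity correction. A minimal base-R sketch on simulated data with the hypothetical level names `cond5`/`cond6`. Note that rank tests compare ranks rather than the mean differences the mixed model estimates, and that the signed-rank test discards pairs whose difference is exactly zero, which matters with many zero responses.

```r
set.seed(11)
condition_levels <- c("red", "grey", "black", "lightblue", "cond5", "cond6")
Datos <- expand.grid(Bicho = factor(1:29),
                     Condicion = factor(condition_levels, levels = condition_levels))
# Hypothetical response values
Datos$Vueltasmin <- round(rnorm(nrow(Datos), mean = 0.2 * as.integer(Datos$Condicion)), 2)

# Friedman test: ranks the 6 conditions within each animal
ft <- friedman.test(Vueltasmin ~ Condicion | Bicho, data = Datos)
print(ft)

# Animals x conditions matrix, so that values are paired by animal, not by row order
wide <- tapply(Datos$Vueltasmin, list(Datos$Bicho, Datos$Condicion), identity)

# The four planned comparisons from the question
pairs <- list(c("red", "grey"), c("red", "black"),
              c("red", "lightblue"), c("black", "grey"))
p_raw <- sapply(pairs, function(pr) {
  # exact = FALSE: normal approximation, which is used anyway with ties or zero differences
  wilcox.test(wide[, pr[1]], wide[, pr[2]], paired = TRUE, exact = FALSE)$p.value
})
names(p_raw) <- sapply(pairs, paste, collapse = " vs ")
p_holm <- p.adjust(p_raw, method = "holm")
print(round(cbind(p_raw, p_holm), 4))
```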

I haven't tried transforming the dependent variable yet, as it's the last thing I want to do: I would have to interpret, for example, log(Y) instead of Y, and that would lack biological meaning for me. So I'm trying to find out whether there is something else I can do before transforming it. Any thoughts? Thanks!!!

P.S.: thanks for the links, I will look at them thoroughly.