3
$\begingroup$

I'm dealing with longitudinal data, and to take into account the dependence of observations within each cluster, I shall rely on a generalized linear mixed model. I have a continuous response variable, and I'd like to fit a Gaussian mixed model. However, a density plot of the response (even after a log transformation) does not look normal. It has two local maxima (the second of which is also the global maximum).

Is it appropriate to work with a Gaussian model?

$\endgroup$
2
  • $\begingroup$ it's not the response that has to be normal, but the response conditionally on the regressors, i.e. the residuals should be normal. $\endgroup$
    – utobi
    Commented Jan 4, 2023 at 11:22
  • 1
    $\begingroup$ You are right, stupid point. Indeed, what I should care about is the conditional distribution, not the marginal one. Basically, I should fit the model and then test for the normality of the residuals, right? $\endgroup$
    – Maximilian
    Commented Jan 4, 2023 at 11:27

3 Answers 3

5
$\begingroup$

It is unfortunately a common misunderstanding that Linear Mixed-Effects (LME) models, like any classical Linear Model (LM), assume that the response is normally distributed with suitable parameters. The truth is that LM(E) assume that the response is normal with suitable parameters conditionally on the covariates.
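As a rough illustration of this point, here is a minimal simulation sketch in Python. The data, the binary covariate `x`, the shift of 6 units, and the helper `excess_kurtosis` are all assumptions made up for this example; nothing here comes from the asker's data. It uses an ordinary linear model rather than a mixed model, just to show that a bimodal response can coexist with normal-looking residuals.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: a binary covariate x shifts the mean of y by 6 units,
# while the noise around each group mean is standard normal.
n = 2000
x = [random.randint(0, 1) for _ in range(n)]
y = [6.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Least-squares fit of y ~ x. With a single binary covariate, the fitted
# values are simply the two group means.
mean0 = statistics.fmean([yi for xi, yi in zip(x, y) if xi == 0])
mean1 = statistics.fmean([yi for xi, yi in zip(x, y) if xi == 1])
residuals = [yi - (mean1 if xi == 1 else mean0) for xi, yi in zip(x, y)]


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normal data."""
    m = statistics.fmean(values)
    s2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / s2 ** 2 - 3.0


# The response is a two-component mixture (bimodal), so its excess kurtosis
# is strongly negative; the residuals should be close to 0.
print("excess kurtosis of y:        ", round(excess_kurtosis(y), 2))
print("excess kurtosis of residuals:", round(excess_kurtosis(residuals), 2))
```

Excess kurtosis is only one crude summary; a histogram or QQ-plot of `y` and of `residuals` would show the same contrast more directly.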

Reading David's answer made me recall that there is a subtle but important difference between the residuals of an LM and those of an LME. This difference is due to the presence of random effects. To check the residuals of an LME, one thus has to decide first what to do with the random effects. Two alternatives are possible:

(1) marginal residuals

(2) conditional residuals

Since the random effects are mere random variables, we could integrate them out of the model and then compute the implied residuals; residuals computed this way are called marginal residuals.

On the other hand, random effects are also parameters, albeit random ones. In some contexts, an estimate of the random effects is also of interest. Given such an estimate, it is possible to consider residuals for the model that are obtained conditionally on it; these are called conditional residuals. For a full account of these issues see Pinheiro and Bates (2004) "Mixed-Effects Models in S and S-PLUS", Springer.
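In symbols, the two kinds of residuals can be sketched as follows. The notation is an assumption introduced here for illustration (it is the usual LME notation, not something fixed by the question): $y_i$ is the response vector of cluster $i$, $X_i$ and $Z_i$ are the fixed- and random-effects design matrices, $b_i$ and $\varepsilon_i$ are independent, and hats denote estimates or predictions.

$$
y_i = X_i\beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \sigma^2 I)
$$

$$
\text{marginal residuals:}\quad \hat{\xi}_i = y_i - X_i\hat{\beta}
$$

$$
\text{conditional residuals:}\quad \hat{\varepsilon}_i = y_i - X_i\hat{\beta} - Z_i\hat{b}_i
$$

Under this model the marginal residuals estimate $Z_i b_i + \varepsilon_i$, whose covariance is $Z_i D Z_i^\top + \sigma^2 I$, so they are correlated within a cluster, whereas the conditional residuals estimate $\varepsilon_i$ alone.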

From the point of view of assumption verification (if that's ever useful, see the Side Note), this means that you should never check if the distribution of the response is normal-looking (e.g. by histograms, normality tests, etc.). You should instead look at the distribution of the residuals of that model.
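To make the two residual types concrete, here is a small Python sketch for the simplest possible case: a balanced random-intercept model with no covariates. Everything in it is an assumption for illustration: the simulated data, the true values `mu`, `sd_b`, `sd_e`, and the use of ANOVA (method-of-moments) variance components in place of the REML/ML fit a real mixed-model package would perform.

```python
import random
import statistics

random.seed(1)

# Hypothetical balanced design: y_ij = mu + b_i + e_ij
n_subjects, n_obs = 200, 5
mu, sd_b, sd_e = 10.0, 2.0, 1.0  # assumed true values, simulation only

data = []
for _ in range(n_subjects):
    b = random.gauss(0.0, sd_b)  # subject-specific random intercept
    data.append([mu + b + random.gauss(0.0, sd_e) for _ in range(n_obs)])

# Fixed-effect estimate: in a balanced design this is the grand mean.
mu_hat = statistics.fmean([v for row in data for v in row])
subject_means = [statistics.fmean(row) for row in data]

# ANOVA estimates of the variance components (a stand-in for REML).
msw = sum((v - m) ** 2 for row, m in zip(data, subject_means) for v in row)
msw /= n_subjects * (n_obs - 1)
msb = n_obs * sum((m - mu_hat) ** 2 for m in subject_means) / (n_subjects - 1)
var_e = msw
var_b = max(0.0, (msb - msw) / n_obs)

# Predicted random intercepts: subject-mean deviations shrunk towards zero.
shrink = var_b / (var_b + var_e / n_obs)
b_hat = [shrink * (m - mu_hat) for m in subject_means]

# Marginal residuals ignore the random effects; conditional ones subtract them.
marginal = [v - mu_hat for row in data for v in row]
conditional = [v - mu_hat - b for row, b in zip(data, b_hat) for v in row]

print("variance of marginal residuals:   ", statistics.pvariance(marginal))
print("variance of conditional residuals:", statistics.pvariance(conditional))
```

The marginal residuals carry both the between-subject and the within-subject variation, while the conditional residuals mostly reflect the within-subject noise, which is why the two can look quite different in diagnostic plots.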

Side Note. Some statisticians would argue that checking the normality of the residuals is not useful at all. You can find many threads on this here on this site, e.g. here

$\endgroup$
9
  • 1
    $\begingroup$ Extracting the residuals of the model and doing either QQ-plot hunting for heavy tails or strong skewness + (eventually) a normality test. But again, I would not worry too much about normality but perhaps about heteroscedasticity; check the link in my post for further details. hope this helps $\endgroup$
    – utobi
    Commented Jan 4, 2023 at 11:55
  • 1
    $\begingroup$ Still a good point, a Gaussian model is homoskedastic, checking for homoskedasticity is crucial, and perhaps a graphical visualization by using the QQ-plot is useful. Thanks a lot. $\endgroup$
    – Maximilian
    Commented Jan 4, 2023 at 11:58
  • 1
    $\begingroup$ see here $\endgroup$
    – utobi
    Commented Jan 4, 2023 at 12:14
  • 1
    $\begingroup$ @Maximilian I agree with you. From a frequentist perspective it's not easy to see them as parameters, since they are random. However, if you look at it through the eyes of a Bayesian, then they all become unknown parameters. $\endgroup$
    – utobi
    Commented Jan 5, 2023 at 21:48
  • 1
    $\begingroup$ But even in the frequentist approach it's not uncommon to see random parameters estimated. This is one example. Another one is factor analysis, in particular the estimation of factor scores. These are also random parameters estimated from data. $\endgroup$
    – utobi
    Commented Jan 5, 2023 at 21:53
3
$\begingroup$

There are some issues when incorporating mixed effects, as you have two sources of residual variation, stemming from your level 1 and level 2 effects. It's been a while since I looked into this, but if I recall correctly, there is a debate going on regarding the appropriateness of different types of residuals. I think Santos Nobre and da Motta Singer (2007) give a good overview of the challenges in modelling and also show the most commonly used methods.

If you're working in R, I suggest looking into the HLMdiag package. I remember finding it particularly helpful when diagnosing mixed models. For a Bayesian approach, I looked into DHARMa, which might also be worth checking out if it applies to your use case.

$\endgroup$
3
  • $\begingroup$ That's a good point. Indeed, as pointed out in another comment, we have two distinct sources of variability: the one explained by the random effects and the one explained by the fixed effects. Both the random effects and the residuals are assumed to be normally distributed. Does this imply that I should also check for the normality of the random effects? Thanks for the references, I'm going to consult them. $\endgroup$
    – Maximilian
    Commented Jan 4, 2023 at 12:42
  • 1
    $\begingroup$ If I remember correctly, normality of random effects is an assumption for LMMs, so I would check them, too. $\endgroup$
    – David
    Commented Jan 4, 2023 at 13:16
  • $\begingroup$ nice to point out conditional vs marginal LME residuals; I completely forgot about that issue (+1). $\endgroup$
    – utobi
    Commented Jan 5, 2023 at 10:02
1
$\begingroup$

If you have longitudinal data it might be a better idea to plot the response (y) as lines on the time axis (x). Then you can think about what model to use. You might prefer something different from a Gaussian mixed model, such as a GEE. What's the difference? Here

There are also other approaches that might be useful, but I don't have enough information on your problem to tell more.

$\endgroup$
1
  • $\begingroup$ In your opinion, can I fit a Gaussian model and then test for the normality of the residuals to see if the conditional distribution is normal? $\endgroup$
    – Maximilian
    Commented Jan 4, 2023 at 11:42
