1
$\begingroup$

I am moving from StackOverflow looking for statistical aspects of the code.

My work is based on an lifestyle intervention made up 3 groups (1, 2 and 3) with measurements at 2 points separated by 12 months (pre- and post-intervention). The sample size is around 50 participants/group, and the response variables are genes, just in case.It is proven which covariates have largely affect the outcome in this studies, such as age or sex.

My understanding of linear mixed models is limited, and constrained to their utility of the general aspects.

Referring to package choice, nlme package allows heterocedasticity, and more flexible covariance structures in the random effects compared to lme4. This is way I picked nlme over lme4.

Having decided this I could possibly have:

  1. lme(gene ~ time:group + age + sex, random= ~ time|id , method="REML", data=tmp, control = lmeControl(opt = "optim"), na.action = na.omit)
  2. lme(gene ~ time:group + group + age + sex, random= ~ time|id , method="REML", data=tmp, control = lmeControl(opt = "optim"), na.action = na.omit)
  3. lme(gene ~ time:group + group + time + age + sex, random= ~ time|id , method="REML", data=tmp, control = lmeControl(opt = "optim"), na.action = na.omit)
  4. lme(gene ~ time:group + time + age + sex, random= ~ time|id , method="REML", data=tmp, control = lmeControl(opt = "optim"), na.action = na.omit)

I don't really understand the difference in including the time effect alone or not. If understood it correctly the grup_int alone captures intra-individual variance (through the intercept). Not sure if time alone (model4) means anything or at least anything contributing the model. I checked model nº 2 and 3 don't change at least in the interaction terms.

Thanks in advance

UPDATE

The results obtained by the model2 for the time:grup_int interaction are like this: enter image description here

Does it mean that the interaction term time:grup_int1 is not significant compared to the other 2 groups? How is this possible when time:grup_int2 is significant? I guess this is not the meaning becaus this would be a overall p-value as a simple ANOVA,but I would get a single value. Which is the meaning then?

$\endgroup$
2

1 Answer 1

1
$\begingroup$

lme4 is often a popular choice because it will warn you more when your models don't fit well. In your case, you have only two time points per individual, but you are fitting a model with both a random intercept and a random slope. You don't have enough data to fit a random slope. Your random effect term should instead be random= ~ 1|id. Now you are fitting an interaction term that is only between fixed effects (time:group). The standard rules for interactions apply, you need to include both time and group as separate variables (gene ~ time:group + group + time).

Edit

Here's a worked example.

> library(lme4)
> library(nlme)
> data(sleepstudy)
> 
> ## filter data so only 2 observations per participant
> head(sleepstudy[sleepstudy$Days<2,])
   Reaction Days Subject
1  249.5600    0     308
2  258.7047    1     308
11 222.7339    0     309
12 205.2658    1     309
21 199.0539    0     310
22 194.3322    1     310
> 
> ## nlme doesn't throw an error
> lme(fixed = Reaction ~ 1, 
+          random= ~ Days|Subject , 
+          method="REML", 
+          data=sleepstudy[sleepstudy$Days<2,], 
+          control = lmeControl(opt = "optim"), 
+          na.action = na.omit)
Linear mixed-effects model fit by REML
  Data: sleepstudy[sleepstudy$Days < 2, ] 
  Log-restricted-likelihood: -166.738
  Fixed: Reaction ~ 1 
(Intercept) 
   259.9841 

Random effects:
 Formula: ~Days | Subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev   Corr  
(Intercept) 30.23539 (Intr)
Days        18.71391 -0.23 
Residual    11.11483       

Number of Observations: 36
Number of Groups: 18 
> 
> ## lme4 correctly throws an error as the model is not identifiable
> lmer(Reaction ~ 1 + Days|Subject,
+           data = sleepstudy[sleepstudy$Days<2,])
Error: number of observations (=36) <= number of random effects (=36) for term (1 + Days | Subject); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable
$\endgroup$
7
  • $\begingroup$ So do you think I shouldn't include a random slope because of my sample size? As for the random intercept I guess it is necessary to account for variability between-individuals, but you don't think I need a random slope? Is this because I only have 2 measurements, and the effect of time won't have a predictor weight? $\endgroup$ Commented Feb 2, 2023 at 9:10
  • 1
    $\begingroup$ Your simply don't have the number of time points for a random slope. Your model doesn't fit the data well (even if nlme isn't throwing any warnings). $\endgroup$
    – David B
    Commented Feb 2, 2023 at 11:25
  • $\begingroup$ I am not sure if I get this correctly, but is it not necessary just 2 points to construct slope? $\endgroup$ Commented Feb 2, 2023 at 11:57
  • $\begingroup$ I've edited my answer to show an example. nlme will run the model (and give a result!) even though it the model is not identifiable. $\endgroup$
    – David B
    Commented Feb 2, 2023 at 14:53
  • $\begingroup$ @JavierHernando If you allow for a random slope, as a line can be fitted perfectly by two points, separate slopes for all individuals can fit all data perfectly in principle (this may not be exactly true because of the distributional assumption for the slopes, but it goes in this direction), which doesn't then allow you to estimate variation or any overall time effect. $\endgroup$ Commented Feb 2, 2023 at 15:06

Not the answer you're looking for? Browse other questions tagged or ask your own question.