9
$\begingroup$

It is quite simple to simulate linear models:

set.seed(42)    
years <- rnorm(100, 12, 8)
work_hours <- rnorm(100, 8, 2)
income <- 2*years + 0.5*work_hours + 2000 + rnorm(100, 0, 10)
plot(work_hours, income2)
lmmodel <- lm(income ~ years + work_hours)
summary(lmmodel)

Or logistic models:

set.seed(42)
x1 <-  rnorm(100) 
x2 <-  rnorm(100)
z <- 1 + 2*x1 + 3*x2   
pr <- 1/(1+exp(-z)) 

y = rbinom(100,1,pr) 

df <- data.frame(y=y,x1=x1,x2=x2)
logitmodel <- glm( y~x1+x2,data=df,family="binomial")
summary(logitmodel)

So, how does one simulate random effects models? I mean, there are lots of "flavors" with this class of models. Looking at Faraway's [book][1] there are:

  • Blocks as Random Effects
  • Split Plots
  • Nested Effects
  • Crossed Effects
  • Multilevel Models
  • Repeated Measures
  • Longitudinal/Panel Data
  • Mixed effect models for nonnormal Responses

How would I simulate them so I can toy with them?

[1]: Extending the Linear Model with R - John Faraway

$\endgroup$

2 Answers 2

7
$\begingroup$

Here is how I simulate random effects. I'll demonstrate for linear regression, but extending it to a different GLM should be straight forward.

Let's start with a random intercept model. The model is usually written as

$$ y = XB + Z\gamma $$

Where $Z$ is an indicator for the group and $\gamma_i$ is normally distributed with mean 0 and some variance. Simulation of this model is as follows...

groups <- 1:5
N <- 250
g <- factor(sample(groups, replace = TRUE, size = N), levels = groups)
x <- rnorm(N)
X <- model.matrix(~ x)
Z <- model.matrix(~ g - 1)

beta <- c(10, 2)
gamma <- rnorm(length(groups), 0, 0.25)

y = X %*% beta + Z%*% gamma + rnorm(N, 0, 0.3)

Let's fit a mixed model and see if we recover some of these estimates


library(lme4)

model = lmer(y ~ x + (1|g), data = d)

summary(model)
linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | g)
   Data: d

REML criterion at convergence: 136.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.85114 -0.65429 -0.00888  0.65268  2.63459 

Random effects:
 Groups   Name        Variance Std.Dev.
 g        (Intercept) 0.05771  0.2402  
 Residual             0.09173  0.3029  
Number of obs: 250, groups:  g, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)  9.95696    0.10914   91.23
x            2.00198    0.01993  100.45

Correlation of Fixed Effects:
  (Intr)
x -0.008

Fixed effects look good, and the group standard deviation (0.25) is estimated pretty accurately, and so is the residual standard deviation.

Random slope models are similar. Under the assumption each slope comes from a normal distribution, then we can write the slope as

$$ y = Bx + \beta_i x$$

Here $B$ is the population mean and $\beta_i$ is the effect of group i. Here is a simulation

library(tidyverse)

groups <- 1:5
N <- 250
g <- sample(groups, replace = TRUE, size = N)
x <- rnorm(N)
X <- model.matrix(~ x)

B <- c(10, 2)
beta <- rnorm(length(groups), 0, 2)

y = X %*% B + x*beta[g] + rnorm(N, 0, 0.3)

and a model ...

library(lme4)

d = tibble(y, x, g)

model = lmer(y ~ x + (x|g), data = d)

summary(model)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (x | g)
   Data: d

REML criterion at convergence: 158.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.95141 -0.65904  0.02218  0.61932  2.66614 

Random effects:
 Groups   Name        Variance  Std.Dev. Corr
 g        (Intercept) 2.021e-05 0.004496     
          x           3.416e+00 1.848314 1.00
 Residual             9.416e-02 0.306856     
Number of obs: 250, groups:  g, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept) 10.00883    0.01984  504.47
x            2.05913    0.82682    2.49

Correlation of Fixed Effects:
  (Intr)
x 0.099

Here are the coefficients of the 5 groups

coef(model)
$g
  (Intercept)         x
1    10.00135 -1.015180
2    10.01335  3.919787
3    10.00934  2.270760
4    10.01081  2.873636
5    10.00928  2.246626

and compare them to the true values

B[2] + beta

-0.9406479  3.9195119  2.2976457  2.8536623  2.3539863

$\endgroup$
6
$\begingroup$

Just write down an (algebraic) formula for the model, and simulate from that description. I will give a very simple example, a model with multiple observations of the same subjects, with an exchangeable covariance structure. Such a structure can be represented with a random intercept for each subject. Also an subject-level covariate: $$ y_{ij}=\mu + \alpha x_i + \epsilon_i + \epsilon_{ij} $$ for $i=1,2,\dotsc,n$ and $j=1,\dotsc,k$ within each subject. So this is a balanced model. The same principle is used for unbalanced situations, but that gives more programming. Then we must specify values for fixed parameters and distributions for random effects $\epsilon_i, \epsilon_{ij}$. Some simple R code is:

N <- 20 # Number subjects
k <- 4  # Number obs within subject
set.seed(7*11*13) # My public seed

id <- as.factor(1:N)
x <-  runif(N, 1, 5)
idran <- rnorm(N, 0, 1)
obsran <- rnorm(N*k, 0, 2)
mu <- 10.
alpha <- 1.

X <- rep(x, each=k)
Y <- mu + alpha*X + rep(idran, each=k) + obsran

A plot of this simulated data is:

Line plot of simulated data

For more complex situations it would help with some preprogrammed package, there is a package simstudy on CRAN which can help. See also Model Matrices for Mixed Effects Models and https://stackoverflow.com/questions/30896540/extract-raw-model-matrix-of-random-effects-from-lmer-objects-lme4-r, https://stackoverflow.com/questions/55199251/how-to-create-a-simulation-of-a-small-data-set-in-r.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.