
I'm working on a project where I need to simulate data for a Structural Equation Model (SEM) that includes a moderation effect. Specifically, I have three latent variables: an independent variable (IV), a moderator (MOD), and a dependent variable (DV). I need to model the interaction between IV and MOD and assess its effect on DV.

I plan to use the lavaan package in R for fitting the model but need guidance on how to properly generate the data including the moderation effect. I have some familiarity with using the simsem package for data simulation but am unsure of the best approach to correctly specify the model and generate the interaction terms.

I added a sample model. `simulateData()` creates new variables such as `iva1:mod1` that are not the product of `iva1` and `mod1`:

```r
library(lavaan)

model_test <- "
  # Measurement model for independent variable A (IVA)
  iva =~ iva1 + iva2 + iva3
  
  # Measurement model for independent variable B (IVB)
  ivb =~ ivb1 + ivb2 + ivb3
  
  # Measurement model for the moderator (MOD)
  mod =~ mod1 + mod2 + mod3
  
  # Interaction terms between IVA and MOD
  mod_iva =~ iva1:mod1 + iva2:mod1 + iva3:mod1 + 
             iva1:mod2 + iva2:mod2 + iva3:mod2 + 
             iva1:mod3 + iva2:mod3 + iva3:mod3
  
  # Interaction terms between IVB and MOD
  mod_ivb =~ ivb1:mod1 + ivb2:mod1 + ivb3:mod1 + 
             ivb1:mod2 + ivb2:mod2 + ivb3:mod2 + 
             ivb1:mod3 + ivb2:mod3 + ivb3:mod3
  
  # Measurement model for dependent variable (DV)
  dv =~ dv1 + dv2 + dv3 + dv4
  
  # Structural model: regression equations
  dv ~ 0.5*iva + 0.4*ivb + -0.3*mod + -0.12*mod_iva + -0.17*mod_ivb
  
  # Covariance between IVA and IVB
  iva ~~ 0.54*ivb
"

result <- simulateData(model_test, sample.nobs = 300L)
```
  • How will you analyze the data? There are different approaches within lavaan. (Commented May 29 at 21:16)
  • @JeremyMiles I added a sample model. – Iman (Commented May 29 at 21:38)

2 Answers


It depends on what kind of interaction you are looking for. How are the variables distributed? Are they continuous or categorical? And so on.

That said, if you want to use structural equation modeling, it is safe to say that the data-generating mechanism you envision can be described by a directed acyclic graph (DAG) of some sort. You can simulate arbitrarily complex data from such graphs using the simDAG R package. Full disclosure: I am the developer of that package. An alternative is the simCausal R package.
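To make the DAG idea concrete: each node is drawn, in topological order, as a function of its already-simulated parents plus noise. Below is a minimal base-R sketch of that logic. It does not use the simDAG or simCausal API; the helper `simulate_dag()`, the node names, and the coefficients are illustrative assumptions.

```r
# Hypothetical helper (NOT part of simDAG or simCausal): simulate a data set
# from a DAG given as a named list of node functions, listed in topological
# order (parents before children). Each function receives the data simulated
# so far and the sample size n, and returns a vector of length n.
simulate_dag <- function(nodes, n) {
  dat <- data.frame(row.names = seq_len(n))  # n rows, no columns yet
  for (name in names(nodes)) {
    dat[[name]] <- nodes[[name]](dat, n)
  }
  dat
}

# Illustrative nodes (ASSUMED coefficients): two exogenous variables and an
# outcome that depends on their product, i.e. a moderation effect.
nodes <- list(
  x = function(d, n) rnorm(n),
  z = function(d, n) rnorm(n),
  y = function(d, n) 0.5 * d$x - 0.3 * d$z - 0.12 * d$x * d$z + rnorm(n)
)

set.seed(1)  # arbitrary seed for reproducibility
dat <- simulate_dag(nodes, n = 300)
coef(lm(y ~ x * z, data = dat))
```

Because `y` is computed from `x * z` directly, the `x:z` coefficient from `lm()` should land near -0.12, up to sampling error.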


The `simulateData()` function just calls `MASS::mvrnorm()` with the population model's implied covariance matrix. That treats each product term as a separate variable (which is nonsense) rather than as a function of other variables. Whenever you simulate any nonlinearity, you have to simulate from the data model one component at a time, starting with the set of exogenous variables. @Denzo mentioned the simDAG and simCausal packages. If you'd like to stick with the lavaan ecosystem, I posted an example over 10 years ago on the lavaan forum:

https://groups.google.com/g/lavaan/c/PxFUKcIwPd0/m/-T67JNWL4doJ
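As a rough sketch of that one-component-at-a-time approach for the question's model, in base R: draw the exogenous latent variables first, compute the latent DV from them and their products, then generate the indicators. The path values come from the question. The zero correlation of `mod` with the IVs, the residual SD of `dv`, the loadings of 0.8, the error SD of 0.6, and the helper `make_indicators()` are assumptions for illustration only; this is not the code from the linked forum post.

```r
set.seed(2024)  # arbitrary seed for reproducibility
n <- 300

# Step 1: exogenous latent variables, standardized. cor(iva, ivb) = 0.54 as in
# the question; mod being uncorrelated with both is an ASSUMPTION.
Phi <- matrix(c(1,    0.54, 0,
                0.54, 1,    0,
                0,    0,    1), nrow = 3)
eta <- matrix(rnorm(n * 3), nrow = n) %*% chol(Phi)  # rows have covariance Phi
iva <- eta[, 1]
ivb <- eta[, 2]
mod <- eta[, 3]

# Step 2: latent DV computed from the latents AND their products (path values
# from the question; the residual SD of 0.7 is an ASSUMPTION).
dv <- 0.5 * iva + 0.4 * ivb - 0.3 * mod -
  0.12 * iva * mod - 0.17 * ivb * mod + rnorm(n, sd = 0.7)

# Step 3: indicators = loading * latent + measurement error.
# Hypothetical helper; loading 0.8 and error SD 0.6 are ASSUMPTIONS.
make_indicators <- function(latent, prefix, k, loading = 0.8, err_sd = 0.6) {
  out <- sapply(seq_len(k), function(j) {
    loading * latent + rnorm(length(latent), sd = err_sd)
  })
  colnames(out) <- paste0(prefix, seq_len(k))
  out
}

sim_data <- as.data.frame(cbind(
  make_indicators(iva, "iva", 3),
  make_indicators(ivb, "ivb", 3),
  make_indicators(mod, "mod", 3),
  make_indicators(dv,  "dv",  4)
))
head(sim_data)
```

Here `iva * mod` is computed from the same draws, so the interaction is a genuine function of the other variables rather than a separately sampled normal column. The resulting `sim_data` contains only observed indicators and can be analyzed with whichever lavaan approach to latent interactions you plan to use.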

