
I'm designing a psychology experiment in which we're interested in how working memory load affects performance in a theory-of-mind task. The independent variable is working memory load; the dependent variable is performance on the theory-of-mind task.

In our within-subjects design, we have 2 working memory conditions (high load and low load). We have 2 stimulus sets (one for each working memory condition, counterbalanced across participants) to prevent carry-over effects between working memory conditions.

The problem is that performance in our task is driven not only by working memory condition but also, to a large extent, by stimulus set. My colleague believes this can be dealt with using mixed models. However, I have some doubts about taking this approach, and would rather take the time to adjust the stimulus sets so that performance within each subject is as comparable as possible between the two stimulus sets (under the same working memory condition).

If I were to take my colleague's advice, would there be any issues, for example with power? And are there resources you could point me to so that I can understand this issue better?

Edit: There were some questions about sample size and analysis setup. I'm aiming for around 40 participants, but have only just started collecting data.

To clarify, I'm not against using mixed models per se, but I'm wondering: 1) whether there might be a loss in power which is proportionate to the imbalance in effects between the two stimulus types, 2) whether I may need to take steps to compensate for this loss in power, and 3) if so, what steps to take (e.g., is there a way to calculate, roughly, how many more participants I would need given a certain effect size for stimulus type? Or is there perhaps a smarter way to set up the analyses?).

The analysis setup is a linear mixed model with proportion of correct responses as the dependent variable; fixed effects for working memory load (high vs. low), stimulus type (set 1 vs. set 2), and their interaction; and random intercepts for subjects and items. We would then compare this fit with the fit of a reduced model without these fixed effects.
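For concreteness, here is a sketch of how that comparison could be set up, written in Python with statsmodels (the data frame layout, column names, and file name are all hypothetical, and the crossed subject/item intercepts are expressed as variance components, which is one way statsmodels can handle them; the exact software doesn't matter much for my question):

```python
import pandas as pd
import scipy.stats as st
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per subject x item, with columns
# 'correct' (accuracy on that trial), 'load' (high/low), 'stim_set' (1/2),
# and 'subject' / 'item' identifiers.
df = pd.read_csv("tom_data.csv")  # hypothetical file name

# Crossed random intercepts for subjects and items, expressed as variance
# components within a single all-encompassing group.
df["all"] = 1
vc = {"subject": "0 + C(subject)", "item": "0 + C(item)"}

full = smf.mixedlm("correct ~ load * stim_set", data=df,
                   groups="all", vc_formula=vc, re_formula="0").fit(reml=False)
reduced = smf.mixedlm("correct ~ 1", data=df,
                      groups="all", vc_formula=vc, re_formula="0").fit(reml=False)

# Likelihood-ratio test of the fixed effects (fit with ML rather than REML,
# because the two models differ in their fixed effects). The full model has
# three extra fixed-effect parameters: load, stim_set, and their interaction.
lr = 2 * (full.llf - reduced.llf)
print(full.summary())
print("LR =", lr, "p =", st.chi2.sf(lr, 3))
```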

I hope this clarifies my question, and I would appreciate help answering the three questions stated in my edit.

  • This sounds similar to something I struggled with many years ago, and have never fully resolved: psychology.stackexchange.com/q/3722/21 I ended up making up my own stimulus set, prioritizing ecological validity, at the cost of power. sciencedirect.com/science/article/pii/… There is a tradeoff between power and ecological validity of the stimulus set, if you ask me.
    – Steven Jeuris
    Commented May 27, 2020 at 13:57
  • You would have to provide more information on how your analyses are set up, but I don't see how you can avoid a repeated measures statistical analysis here without committing some statistical faux pas; mixed models are one flexible way to do that. You don't gain real power by using a flawed statistical design, just fake power at the expense of inflated type I errors.
    – Bryan Krause
    Commented May 27, 2020 at 14:35
  • I agree with Bryan; more information is needed to answer this question: number of participants, other variables that can be thrown into the LMM, etc.
    – AliceD
    Commented May 27, 2020 at 15:14
  • I've now edited my post to answer questions about sample size and analysis setup.
    Commented May 28, 2020 at 18:16

1 Answer


Could you maybe check this in simulation? I'm a huge fan of the workflow described here: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

You are asking question #3 of 4, the "inferential adequacy" one, right? Can you run some simulations with varying 'true stimulus-set gap' and just see how big a gap you can tolerate with your design?

You don't need to know the true data-generating process to do this: generating responses from the model tells you something useful about the model-plus-design setup. In this workflow, reality has to wait until question #4. That's also an important question, but checking against reality comes after passing the tests you're talking about.
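As a very crude (and here non-Bayesian) version of that idea: collapse each subject to the within-subject difference in proportion correct between load conditions, then simulate how power for the load effect changes with the size of the stimulus-set gap, comparing an analysis that ignores stimulus set with one that adjusts for it through the counterbalancing. Every effect size and SD in the Python sketch below is a placeholder I made up, not a number from your task:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2020)

def power_for_gap(set_gap, n_subj=40, load_effect=0.05, sd_diff=0.10,
                  n_sims=2000, alpha=0.05):
    """Rough power for detecting the load effect as a function of the 'true'
    stimulus-set gap. Each subject contributes one within-subject difference
    in proportion correct (low load minus high load); counterbalancing means
    half the subjects have the set gap added to this difference, half have it
    subtracted."""
    cb = np.where(np.arange(n_subj) % 2 == 0, 1.0, -1.0)  # counterbalancing sign
    hits_naive = hits_adjusted = 0
    for _ in range(n_sims):
        d = load_effect + cb * set_gap + rng.normal(0, sd_diff, n_subj)
        # naive analysis: test the mean difference, ignoring stimulus set
        hits_naive += stats.ttest_1samp(d, 0.0).pvalue < alpha
        # analysis that adjusts for stimulus set via the counterbalancing contrast
        fit = sm.OLS(d, sm.add_constant(cb)).fit()
        hits_adjusted += fit.pvalues[0] < alpha
    return hits_naive / n_sims, hits_adjusted / n_sims

for gap in (0.0, 0.05, 0.10, 0.20):
    naive, adjusted = power_for_gap(gap)
    print(f"gap={gap:.2f}  power (ignore set)={naive:.2f}  power (model set)={adjusted:.2f}")
```

Under this toy model the analysis that includes stimulus set holds up much better as the gap grows than the one that ignores it, but the point of the exercise is to plug in gaps and effect sizes you find plausible (or draw them from priors) and see where your design breaks.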

Although you don't need to know how people actually come up with their responses, you do need to be Bayesian enough to be comfortable putting priors on all the parameters. If it's important to convince a colleague with the results, it might be worth checking that this isn't a dealbreaker for them!

