I'm designing a psychology experiment to test how working memory load affects performance in a theory-of-mind task. The independent variable is working memory load; the dependent variable is theory-of-mind task performance.
In our within-subjects design, we have 2 working memory conditions (high load and low load). We have 2 stimulus sets (one for each working memory condition, counterbalanced across participants) to prevent carry-over effects between working memory conditions.
The problem is that performance in our task depends not only on working memory condition but also, to a large extent, on stimulus set. My colleague believes this can be dealt with using mixed models. However, I have some doubts about this approach, and would rather take the time to adjust the stimulus sets so that, under the same working memory condition, each subject's performance is as comparable as possible across the two sets.
If I were to take my colleague's advice, would there be any issues, for example with power? And are there resources you could point me to so that I can understand this issue better?
Edit: There were some questions about sample size and analysis setup. I'm aiming for around 40 participants, but have only just started collecting data.
To clarify, I'm not against using mixed models per se, but I'm wondering: 1) whether there would be a loss in power proportional to the size of the performance difference between the two stimulus sets, 2) whether I would need to take steps to compensate for this loss in power, and 3) if so, which steps (e.g., is there a way to calculate, roughly, how many more participants I would need given a certain effect size for stimulus set? Or is there perhaps a smarter way to set up the analyses?).
The planned analysis is to fit a linear mixed model with proportion of correct responses as the dependent variable; working memory load (high vs. low), stimulus set (set 1 or 2), and their interaction as fixed effects; and random intercepts for subjects and items. We would then compare the fit of this model with that of a reduced model without these fixed effects.
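To make question 3 concrete, here is a rough sketch of how I imagine power could be estimated by simulation. It is not the mixed model described above: for simplicity it swaps in the standard crossover analysis, a two-sample comparison of per-subject score differences between the two counterbalancing groups. The function name `simulate_power`, the baseline of 0.7, and every effect size and standard deviation are made-up placeholders rather than estimates from my data. Scores are drawn as unbounded Gaussians and the set effect is assumed to be purely additive.

```python
import random
import statistics


def simulate_power(n_subjects=40, load_effect=-0.05, set_effect=0.10,
                   subject_sd=0.10, noise_sd=0.08, n_sims=2000, seed=1):
    """Estimate power for the load effect in a counterbalanced two-set design.

    Every numeric default is a hypothetical placeholder, not an estimate from data.
    """
    rng = random.Random(seed)
    half = n_subjects // 2  # participants per counterbalancing group
    # Normal approximation to the t cutoff (slightly liberal for small samples).
    z_crit = statistics.NormalDist().inv_cdf(0.975)
    hits = 0
    for sim in range(n_sims):
        groups = ([], [])  # per-subject (set 1 - set 2) score differences
        for g in (0, 1):   # g == 0: high load on set 1; g == 1: high load on set 2
            for _ in range(half):
                subj = rng.gauss(0.0, subject_sd)  # subject random intercept
                set1 = 0.7 + subj + set_effect + rng.gauss(0.0, noise_sd)
                set2 = 0.7 + subj + rng.gauss(0.0, noise_sd)
                if g == 0:
                    set1 += load_effect
                else:
                    set2 += load_effect
                groups[g].append(set1 - set2)
        mean_a = statistics.fmean(groups[0])
        mean_b = statistics.fmean(groups[1])
        # Equal group sizes, so the pooled variance is the plain average.
        pooled_var = (statistics.variance(groups[0]) + statistics.variance(groups[1])) / 2
        se = (pooled_var * 2 / half) ** 0.5
        # mean_a - mean_b estimates 2 * load_effect; the set effect cancels out.
        if abs((mean_a - mean_b) / se) > z_crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(simulate_power(set_effect=0.0))
    print(simulate_power(set_effect=0.3))
```

If I understand correctly, under these additive assumptions the set effect adds the same constant to both groups' differences, so the estimated power should not depend on `set_effect`. The sketch would not capture ceiling or floor effects, or a load-by-set interaction, which is part of what I am unsure about.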
I hope this clarifies my question, and I would appreciate help answering the 3 questions stated in my edit.