How should I analyze data that is paired and also repeatedly measured?

Question

I have data collected from the same subjects (n=10) over 11 days, with measurements taken under two conditions each day: immediately after training (D0) and 30 minutes after training (D30). The data are ordinal, on a scale from 1 to 10. I want to learn whether there is a difference between the measurements taken immediately after training and those taken 30 minutes after training. How should I analyze this data, and which test should I use?

There's not going to be a statistical test for this. You're going to need a linear mixed model. You can simplify the data by subtracting before versus after (or vice versa) but you still need to account for repeated measurements on individuals — N Brouwer, Commented Apr 18 at 1:31
I would probably use a multilevel ordinal regression with participant-specific random slopes for both delay and day, i.e. (delay|ID) + (day|ID) in R syntax. I think this can be done in Stata using the meologit command. In R there was a mixor package for this, but it seems to have been removed from CRAN atm. — Sointu, Commented Apr 18 at 7:14
I need to compare each day's 2 conditions with each other, not with other days' conditions. So, if I conduct multiple Wilcoxon signed-rank tests and then adjust the p-values, would that be wrong? In this case, the number of Wilcoxon signed-rank tests would be 11. — benrilzm, Commented Apr 25 at 12:38

Cryo · Accepted Answer · 2024-04-26 06:12:03Z

I don't have a named test for you, but here is how I would try modelling this data.

Let $X^{(i,n,t)}\in\{1,\dots,10\}$ be the result of the $i$-th individual on the $n$-th day, observed either immediately after training ($t=0$) or 30 minutes after training ($t=1$). These can be modeled as categorical (single-trial multinomial) variables, so the question is how to model the probability of each result for each subject.

Since you do not have that much data you can try going with:

$$ \mbox{logit} P\left[X^{(i,n,t)}\ge k\right]=\left(1-t\right)\cdot a^{(i,n)}+t\cdot b^{(i,n)}-d_{k},\quad d_{k+1}\ge d_k $$

So the log-odds of $X^{(i,n,t)}\ge k$ share a common threshold $d_k$ for each level $k$, but where each subject starts relative to the thresholds, $a^{(i,n)}$, is specific to the subject and to the day, and where the subject ends up, $b^{(i,n)}$, is likewise specific to the subject and to the day. Your measurable effect can be something like $b^{(i,n)}-a^{(i,n)}$ for each subject. You can pool the effect across all individuals by letting $a^{\left(i,n\right)}=a^{\left(n\right)}$, and the same for $b^{(i,n)}$. You can further try to isolate the effect averaged over all days by letting $b^{(n)}=B+a^{(n)}$, i.e. the population may start at a different point each day, but the change between the two measurements ($B$) is the same for all days.

You will probably need some special treatment around $k=1$, since $P\left[X^{(i,n,t)}\ge 1\right]=1$ irrespective of the statistical model.
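To make the formula concrete, here is a minimal sketch of how category probabilities follow from it, using $P[X=k]=P[X\ge k]-P[X\ge k+1]$. The function names and all numeric values are my own placeholders, not estimates from any data. The $k=1$ case is handled by fixing $P[X\ge 1]=1$ and keeping only the thresholds $d_2,\dots,d_{10}$.

```python
import math


def sigmoid(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(eta, thresholds):
    """Return [P[X = 1], ..., P[X = K]] under logit P[X >= k] = eta - d_k.

    eta        : linear predictor, a^(i,n) when t = 0 or b^(i,n) when t = 1
    thresholds : non-decreasing list [d_2, ..., d_K]; d_1 is not needed
                 because P[X >= 1] = 1 by construction
    """
    # Exceedance probabilities P[X >= k] for k = 1..K, then P[X >= K+1] = 0.
    exceed = [1.0] + [sigmoid(eta - d) for d in thresholds] + [0.0]
    # P[X = k] = P[X >= k] - P[X >= k+1]
    return [exceed[k] - exceed[k + 1] for k in range(len(exceed) - 1)]


# Hypothetical values for a 10-point scale (9 free thresholds d_2..d_10).
thresholds = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
probs_d0 = category_probs(0.0, thresholds)   # eta = a (assumed value)
probs_d30 = category_probs(1.0, thresholds)  # eta = b = a + 1 (assumed value)
```

A larger linear predictor shifts probability mass towards higher scores, which is how a positive $b-a$ shows up in the data.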

You will need to fit these parameters to the data. In terms of significance, your null hypothesis could be that $a^{(i,n)}=b^{(i,n)}$.
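As a rough illustration of what "fitting" means here, the sketch below evaluates the negative log-likelihood of the pooled model $b^{(n)}=B+a^{(n)}$ at fixed parameter values. Everything in it is an assumption for illustration: the function names, the toy records, and the parameter values are made up, and no optimisation is performed. A real fit would minimise this quantity over the day effects, $B$, and the thresholds.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prob_of_score(score, eta, thresholds):
    """P[X = score] with logit P[X >= k] = eta - d_k, thresholds = [d_2..d_K]."""
    n_levels = len(thresholds) + 1
    # P[X >= score]; equals 1 for the lowest level.
    p_ge = 1.0 if score == 1 else sigmoid(eta - thresholds[score - 2])
    # P[X >= score + 1]; equals 0 for the highest level.
    p_ge_next = 0.0 if score == n_levels else sigmoid(eta - thresholds[score - 1])
    return p_ge - p_ge_next


def neg_log_lik(records, day_effects, B, thresholds):
    """Negative log-likelihood of the pooled model eta = a^(n) + t * B.

    records     : iterable of (day, t, score), t = 0 for D0 and t = 1 for D30
    day_effects : dict mapping day n to a^(n)
    """
    total = 0.0
    for day, t, score in records:
        eta = day_effects[day] + t * B
        total -= math.log(prob_of_score(score, eta, thresholds))
    return total


# Made-up toy records, NOT real data: (day, t, score).
records = [(1, 0, 5), (1, 1, 7), (2, 0, 5), (2, 1, 7)]
day_effects = {1: 0.0, 2: 0.0}                       # assumed a^(n)
thresholds = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]  # assumed d_2..d_10

nll_null = neg_log_lik(records, day_effects, 0.0, thresholds)  # B = 0, i.e. a = b
nll_alt = neg_log_lik(records, day_effects, 1.0, thresholds)   # B = 1
```

For these toy records the D30 scores are higher, so the likelihood is better with a positive $B$ than under the null $B=0$.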

I would probably try modelling it as a Bayesian model with the following priors: $a^{(i,n)}$ and $b^{(i,n)}$ are normally distributed, and $d_k=f+\sum_{r=1}^k g_r$, where $f$ can be normally distributed, whilst the $g_r$ have to be non-negative (e.g. Gamma distributed) so that the thresholds are ordered.
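The threshold prior can be sketched as a simple sampler. This is only an illustration of the construction $d_k=f+\sum_{r=1}^k g_r$. The function name and the hyperparameter values (`f_sd`, `g_shape`, `g_scale`) are arbitrary assumptions of mine, not recommendations.

```python
import random


def sample_thresholds(n_thresholds, rng, f_mean=0.0, f_sd=2.0,
                      g_shape=2.0, g_scale=0.5):
    """Draw d_1..d_n from the prior d_k = f + sum_{r=1..k} g_r.

    f   ~ Normal(f_mean, f_sd)
    g_r ~ Gamma(g_shape, g_scale), so every increment is non-negative
    All hyperparameter defaults are placeholder values.
    """
    running = rng.gauss(f_mean, f_sd)  # start from the offset f
    thresholds = []
    for _ in range(n_thresholds):
        running += rng.gammavariate(g_shape, g_scale)  # add g_r >= 0
        thresholds.append(running)
    return thresholds


rng = random.Random(0)  # fixed seed so the draw is reproducible
draw = sample_thresholds(10, rng)
```

Because every increment is non-negative, each draw satisfies $d_{k+1}\ge d_k$ without any extra constraint on the sampler.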

edited Apr 26 at 6:12

answered Apr 17 at 23:10


I need to compare each day's 2 conditions with each other, not with other days' conditions. So, if I conduct multiple Wilcoxon signed-rank tests and then adjust the p-values, would that be wrong? In this case, the number of Wilcoxon signed-rank tests would be 11.
– benrilzm
Commented Apr 25 at 12:30
@benrilzm, I have added provisions for pooling effects across the population and for making them day-specific. I am not sure how you want to do the Wilcoxon test: compute score differences between the two measurements and infer from the differences across the whole population on a given day? Is the difference between score=1 before training and score=2 after training as important as the difference between score=9 before training and score=10 after? If not, I am not sure you meet the iid conditions. Also, your multiple tests are not independent, since the people remain the same, so it is not clear what to do with your standard errors and significance.
– Cryo
Commented Apr 26 at 6:09
Why not try to model the data generation process instead of picking a test? The apparent simplicity of running a test will fall apart as soon as you come to interpret its results, and it will get even more dire if you have to deep-dive and debug.
– Cryo
Commented Apr 26 at 6:11
I thought to use Wilcoxon to compare each day's 2 conditions: 1st day's D0 vs D30, 2nd day's D0 vs D30, etc. I am not interested in how the D0s or D30s change across days, or whether there is a difference between days. Basically I just need to compare D0 and D30 for each day and see if they are different. That's why I thought multiple Wilcoxon tests with adjusted p-values might be reasonable. Am I wrong?
– benrilzm
Commented Apr 26 at 12:13
See my question above about the score. Even if this is not a concern, how will you adjust the p-values? Some generic adjustment will require additional assumptions that you may not meet. It is not that you are wrong, it is that you don't provide enough details to say either way. What is your model for data generation? How will you adjust the p-values? What will those adjusted p-values mean (normally a p-value is the probability, under the null hypothesis, of a statistic at least as extreme as the one observed), etc.?
– Cryo
Commented Apr 26 at 12:22

Andy · Accepted Answer · 2024-04-18 05:51:13Z

The answer will differ depending on whether or not the subjects receive training on the same topic each day. For instance, if the question is about training retention and the topics differ each day, then you can safely ignore the effect of time (day) in the model. However, if subjects receive training on the same topic each day, then time becomes very relevant (e.g., imagine a subject who struggles early on but studies the topic daily vs one who never studies).

If the topic is new every day, a mixed-effects model that uses the ordinal score as a response variable, time since training (immediate vs delayed) as a categorical treatment, and subject as a random effect seems reasonable.

If the topic never changes, you would want to include Day as a covariate in the model.
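Either variant of the model is usually fitted to data in long format, with one row per subject, day, and delay. Here is a minimal sketch of that layout. The column names are my own choice and the scores are left empty as placeholders, since no data were posted.

```python
from itertools import product

subjects = range(1, 11)   # n = 10 subjects
days = range(1, 12)       # 11 days
delays = ("D0", "D30")    # immediately after vs 30 minutes after training

# One row per (subject, day, delay); "score" is a placeholder to be filled
# with the observed 1-10 rating.
rows = [
    {"subject": s, "day": d, "delay": t, "score": None}
    for s, d, t in product(subjects, days, delays)
]
```

In this layout, `score` is the ordinal response, `delay` is the categorical treatment, `subject` is the grouping variable for the random effect, and `day` is available as a covariate if the topic never changes.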
