I am new to mixed models so please go easy on me!
I am attempting to longitudinally model a normally distributed continuous outcome with values in my dataset from -30-10. I have data collected at 1, 4 and 12 months post event on 256 individuals. This is a spaghetti plot of the data by age category.
spaghetti plot.png
The mean of the outcome is 1.56 at baseline (pre event), -4.57, -0.24 and -0.42 at 1, 4 and 12 months post event respectively. Both based on the plot and logic (the outcome can't just become less negative linearly throughout time after the event forever onward) a linear relationship between time and this outcome doesn't seem to fit, however I am limited by only 3 post event time points AND an outcome with negative values.
I built a linear mixed model with a fixed time slope, fixed quadratic time slope and random intercept and random linear time including a significant covariance between intercept and slope(random quadratic time was not significant). I tested my time invariant predictors one at a time in the model and included those variables that resulted in a pseudo R^2 > 5% in the final model (the goal of model building was a) test the hypothesis that age category was related to the outcome over time and b) determine if any other variables collected help predict the outcome). The above selection criteria left only age category as a significant predictor of the outcome in the final model. Below there is a table of my model building process.
The parameter estimates make sense given the distribution of the data. I think a piecewise discontinuous model or exponential model for time would fit the data better, however with only 3 time points I can't fit these (model doesn't converge).
So all is well, but then I run the model diagnostics and my residuals demonstrate a non normal (light tailed?) distribution and the residual vs fitted looks like it has a linear drift.
From what I understand this basically indicates that the model isn't well fit and usually means that the parameter estimates are accurate, but the p values and CI would be too narrow. With this in mind I thought about a log transformation of the outcome - but with the negative values I would have to add an arbitrary amount to the outcome (recenter around 100?) which I know is frowned upon. In addition, I think it would be very hard to interpret a model of log transformed outcome with a quadratic time function.
Any advice on next steps (or corrections on the steps taken so far) would be greatly appreciated.
Here is a second table comparing the final model with continuous time with a quadratic and time as a factor. And the qq plot of time as a factor. Using time as a factor doesn't appear to improve model fit (right?).
Here are the residual vs time and age category plots
and this is the categorical time model with baseline differences. Baseline differences are not significant (nor do they change the residuals) when included in the model with continuous time.