I have a dataset in which multiple doctors assessed a score under three experimental conditions; they also saw each patient from three different angles. I am now trying to predict the doctors' assessments.
Since I have multiple non-independent data points per patient, I am trying to fit a mixed model to my data.
Each doctor has a bias that affects the assessment, each patient is different, and each condition differs slightly as well. Should I therefore take all three as random effects in my model?
library(lme4)

# random intercepts for doctor (participant), patient, and condition
model1 <- lmer(assessment ~ LIST_PREDICTORS +
                 (1 | participant) + (1 | patient) + (1 | condition),
               data = mixedModel_df)
I am mainly interested in predicting the assessment for new cases, so I want to control for the random effects above.
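For population-level predictions on new cases, lme4's predict() can set the random effects to zero via its re.form argument (a minimal sketch; newdata_df is a hypothetical data frame of new observations):

# re.form = NA ignores all random effects, giving population-level predictions;
# allow.new.levels = TRUE prevents errors when the grouping factors contain
# doctors/patients not seen during fitting
pred <- predict(model1, newdata = newdata_df, re.form = NA,
                allow.new.levels = TRUE)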
I now have around 20 predictor variables. Many of them are heavily correlated with each other, often with Pearson r > 0.9, and looking at variance inflation factors confirms heavy multicollinearity in my models.
library(car)
# VIFs for the fixed effects; a common rule of thumb flags values above 5-10
vif(model1)
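As a first pruning pass, a greedy filter such as caret::findCorrelation can drop one variable from each highly correlated pair (a sketch; predictors_df is a hypothetical data frame containing only the 20 candidate predictors):

library(caret)

# pairwise correlations among the candidate predictors
corMat <- cor(predictors_df, use = "pairwise.complete.obs")

# greedily flag variables so that no remaining pair has |r| above the cutoff
drop_vars <- findCorrelation(corMat, cutoff = 0.9, names = TRUE)
predictors_reduced <- predictors_df[, setdiff(names(predictors_df), drop_vars)]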
How should I best reduce the number of predictors, and how can I find the best model for my data?
I looked into lasso regression methods, but found lasso adaptations only for generalized linear mixed models, not for linear mixed models with a ratio-scaled dependent variable like mine.
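One caveat: a Gaussian GLMM with an identity link is exactly a linear mixed model, so a GLMM lasso such as the glmmLasso package should still apply. A rough sketch, assuming x1 and x2 stand in for the real predictors and that lambda is tuned over a grid (e.g. by BIC):

library(glmmLasso)

# glmmLasso requires the grouping variables to be factors
mixedModel_df$participant <- as.factor(mixedModel_df$participant)
mixedModel_df$patient <- as.factor(mixedModel_df$patient)
mixedModel_df$condition <- as.factor(mixedModel_df$condition)

lasso_fit <- glmmLasso(
  fix = assessment ~ x1 + x2,            # penalized fixed effects
  rnd = list(participant = ~1, patient = ~1, condition = ~1),
  data = mixedModel_df,
  lambda = 10,                           # penalty strength; tune over a grid
  family = gaussian(link = "identity")   # identity-link Gaussian = LMM
)
summary(lasso_fit)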
# step() for mixed models comes from lmerTest; load it before fitting model1
# so that lmerTest::lmer (which masks lme4::lmer) returns a compatible object
library(lmerTest)
step(model1)
does not produce convincing results. From manually eliminating predictors it also seems that there are multiple, roughly equally plausible predictor combinations.
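When several subsets look equally plausible, exhaustive comparison with model averaging is another option, e.g. via MuMIn (a sketch; with 20 predictors dredge() would fit 2^20 models, so this is only feasible after the predictor set has been pruned):

library(MuMIn)

# dredge() requires that the global model fails loudly on missing data
options(na.action = "na.fail")

# fit every fixed-effect subset of the (pruned) model and rank by AICc
all_fits <- dredge(model1)

# average the models within 2 AICc units of the best one
avg_fit <- model.avg(all_fits, subset = delta < 2)
summary(avg_fit)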
I also tried hierarchical clustering of my predictors and then used, from each cluster, the variable with the strongest correlation with the dependent variable (assessment) in my mixed model. I am also thinking about doing feature selection with a plain lasso regression and then plugging the selected variables into a mixed model to properly control for the random effects. Would either of these be a sensible approach? (Both ideas are sketched below.)
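For concreteness, a rough sketch of both ideas (predictors_df and the cluster count are hypothetical choices; the lasso here is a plain glmnet fit that ignores the grouping structure, so it is only a heuristic screen before refitting with lmer):

library(glmnet)

## 1) Cluster the predictors, keep one representative per cluster
corMat <- cor(predictors_df)
# correlation-based distance: highly correlated variables end up close together
hc <- hclust(as.dist(1 - abs(corMat)))
clusters <- cutree(hc, k = 5)            # number of clusters is a tuning choice

# within each cluster, keep the variable most correlated with the outcome
keep <- sapply(split(names(clusters), clusters), function(vars) {
  vars[which.max(abs(cor(predictors_df[vars], mixedModel_df$assessment)))]
})

## 2) Lasso screen, then refit a mixed model with the selected variables
x <- as.matrix(predictors_df)
y <- mixedModel_df$assessment
cvfit <- cv.glmnet(x, y, alpha = 1)      # alpha = 1 is the lasso penalty
coefs <- coef(cvfit, s = "lambda.1se")
selected <- setdiff(rownames(coefs)[as.vector(coefs != 0)], "(Intercept)")

# plug the surviving predictors back into lmer to restore the random effects
form <- reformulate(c(selected,
                      "(1 | participant)", "(1 | patient)", "(1 | condition)"),
                    response = "assessment")
model2 <- lmer(form, data = mixedModel_df)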