$\begingroup$

My goal is to find the single best model out of 55 classification models.

I first ran nested CV on all 55 models to see which one generalizes best, using AUC as the evaluation metric.

However, the test scores of several models differ only slightly, by about 0.001 or 0.002 in AUC, and for all of these models the gap between the train score and the test score under nested CV is only 1 to 3%.

So I feel it is difficult to select a single best model using nested CV alone. Instead of CV, I plan to use the bootstrap to compare the AUC confidence intervals of the models with similar performance, and to select the model with the larger confidence interval. Since this approach does not use CV, I split the data into train:valid:test at a 6:2:2 ratio. I then plan to compare the AUC confidence intervals using a t-test or an ANOVA test.
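To make the idea concrete, here is a minimal sketch of one way a bootstrap comparison of two models on the held-out test split could look. It is an illustration under assumptions, not a settled method: it resamples the test rows jointly for both models (a paired bootstrap of the AUC difference) instead of running a t-test on two separate intervals, and the names `auc_score`, `paired_bootstrap_auc_diff`, `score_a`, and `score_b` are hypothetical. The data at the end is synthetic and only shows the call.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC computed from the Mann-Whitney U statistic (ties get average ranks)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int((y_true == 1).sum())
    n_neg = len(y_true) - n_pos
    # Average 1-based rank for each distinct score value.
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)
    first = last - counts + 1
    ranks = ((first + last) / 2.0)[inverse]
    u_stat = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))

def paired_bootstrap_auc_diff(y_true, score_a, score_b,
                              n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for AUC(a) - AUC(b), resampling the same rows for both models."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)
        y = y_true[idx]
        if y.min() == y.max():
            continue  # a resample needs both classes for AUC to be defined
        diffs.append(auc_score(y, score_a[idx]) - auc_score(y, score_b[idx]))
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(diffs)), float(lower), float(upper)

# Synthetic example (assumed data, only to show usage).
rng = np.random.default_rng(42)
y_test = rng.integers(0, 2, size=500)
score_a = y_test + rng.normal(0.0, 1.0, size=500)  # hypothetical model A scores
score_b = y_test + rng.normal(0.0, 1.1, size=500)  # hypothetical model B scores
mean_diff, ci_low, ci_high = paired_bootstrap_auc_diff(y_test, score_a, score_b)
print(f"AUC difference: {mean_diff:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}]")
```

If the resulting interval for the difference contains 0, this sketch would not give grounds to prefer one model over the other on this test split.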

However, I don't know whether this is an appropriate way to select a model.

$\endgroup$
