To choose between a machine learning model with 5 features and one with 6 features, I want to bootstrap the AUC of each model, obtain a confidence interval, and check whether the two models differ.
If I then run a t-test, should I first test the bootstrapped results for normality and equal variance? My understanding is that bootstrapping assumes normality. If so, is no additional testing needed?
I would like to compare them using the following Python code.
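import pingouin as pg
import numpy as np
import statsmodels.stats.api as sms
# t_5, t_6: bootstrapped AUC values for the 5-feature and 6-feature models
cm = sms.CompareMeans(sms.DescrStatsW(t_5), sms.DescrStatsW(t_6))
# Confidence interval for the difference in mean AUC, not assuming equal variances (Welch)
print(cm.tconfint_diff(usevar='unequal'))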
't_5' and 't_6' each hold 2000 bootstrapped AUC values, for the model with 'feature = 5' and the model with 'feature = 6' respectively.
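
For reference, a minimal sketch of how arrays like 't_5' and 't_6' could be produced, and of one way to summarize their difference with a percentile interval instead of a t-test. Everything here is an assumption for illustration: the data is synthetic, the names auc_score, bootstrap_auc, y_test, p_5 and p_6 are hypothetical, both models are assumed to be scored on the same test rows, and the AUC is computed by hand from pairwise comparisons rather than with a library call.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied pairs count as 0.5 (Mann-Whitney formulation)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def bootstrap_auc(y_true, scores_a, scores_b, n_boot=2000, seed=0):
    """Resample test rows with replacement and return one AUC per model
    per replicate. Both models share the same resampled rows (paired)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    n = y_true.size
    auc_a, auc_b = [], []
    while len(auc_a) < n_boot:
        idx = rng.integers(0, n, size=n)
        y = y_true[idx]
        if y.min() == y.max():
            # A replicate with only one class has no defined AUC; redraw.
            continue
        auc_a.append(auc_score(y, scores_a[idx]))
        auc_b.append(auc_score(y, scores_b[idx]))
    return np.array(auc_a), np.array(auc_b)

# Synthetic stand-in for real labels and model scores (hypothetical data).
rng = np.random.default_rng(42)
y_test = rng.integers(0, 2, size=200)
p_5 = 0.8 * y_test + rng.normal(0.0, 1.0, size=200)  # "5-feature" scores
p_6 = 1.0 * y_test + rng.normal(0.0, 1.0, size=200)  # "6-feature" scores

t_5, t_6 = bootstrap_auc(y_test, p_5, p_6)

# Percentile interval for the paired difference in AUC.
diff = t_6 - t_5
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])
print(ci_low, ci_high)
```

Because each replicate reuses the same resampled rows for both models, the differences t_6 - t_5 are paired, and the interval is read directly from the 2.5th and 97.5th percentiles of those differences.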