$\begingroup$

To choose between a machine learning model with 5 features and one with 6 features, I want to bootstrap each model's AUC to obtain a confidence interval and check whether the two models differ.

When performing a t-test, should I test the bootstrapped results for normality and equal variance? As I understand it, bootstrapping assumes normality, so is there no need for any such testing?

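I would like to compare them using the following Python code.

import statsmodels.stats.api as sms

# t_5, t_6: bootstrapped AUC values for the 5- and 6-feature models
cm = sms.CompareMeans(sms.DescrStatsW(t_5), sms.DescrStatsW(t_6))
# Confidence interval for the difference in means, without assuming equal variances
print(cm.tconfint_diff(usevar='unequal'))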

't_5' and 't_6' hold the AUC values from 2000 bootstrap replicates of the 5-feature model and the 6-feature model, respectively.

$\endgroup$
5
  • 1
    $\begingroup$ I don't know about bootstrapping AUC or model selection using AUC, but in general bootstrapping from raw data is considered a non-parametric method, so normality isn't an issue. Here's a good discussion: "Why would I want to bootstrap when computing an independent sample t-test? (how to justify, interpret, and report a bootstrapped t-test)". Also, t-tests and other linear models are in general fairly robust to violations of the normality assumption. Finally, I usually think of bootstrapping in terms of confidence intervals: you could calculate a 95% CI around each AUC and see whether they overlap. $\endgroup$
    – N Brouwer
    Commented Apr 7 at 15:29
  • $\begingroup$ @N Brouwer You're right. If the confidence intervals overlap, I would then use a t-test and check whether its confidence interval for the difference includes 0. $\endgroup$
    – JAE
    Commented Apr 7 at 15:36
  • $\begingroup$ Usually you just calculate the bootstrap and go with that. I've never heard of following up a bootstrap with a t-test. My first thought would be to 1) create bootstrap samples, 2) fit the two models to each sample and determine the AUC of each, 3) calculate the difference in AUC between the models for each sample (delataAUC), and 4) calculate a 95% confidence interval around the deltaAUC. If the CI includes 0, then the models are "not different", as per how CIs are interpreted. This procedure would be similar to a paired t-test. $\endgroup$
    – N Brouwer
    Commented Apr 7 at 17:23
  • $\begingroup$ @N Brouwer What is the "delata auc" that you mentioned? $\endgroup$
    – JAE
    Commented Apr 7 at 19:01
  • $\begingroup$ Sorry, typo. DeltaAUC, which I'm defining as (AUC from model 1) - (AUC from model 2) $\endgroup$
    – N Brouwer
    Commented Apr 7 at 20:24
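
The paired procedure outlined in the comments (resample, score both models on the same resample, take the AUC difference, form a percentile interval) could be sketched roughly as follows. This is only an illustration under assumptions: it resamples a held-out set of labels and fixed predicted scores instead of refitting each model on every bootstrap sample, which simplifies step 2. The function names `auc` and `paired_bootstrap_delta_auc` and the synthetic labels and scores are hypothetical, not taken from the question.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_delta_auc(labels, scores_a, scores_b,
                               n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for AUC(a) - AUC(b), using the same resampled rows for both models."""
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUC is undefined without both classes, so redraw
            continue
        auc_a = auc(y, [scores_a[i] for i in idx])
        auc_b = auc(y, [scores_b[i] for i in idx])
        deltas.append(auc_a - auc_b)  # deltaAUC for this resample
    deltas.sort()
    lower = deltas[int((alpha / 2) * n_boot)]
    upper = deltas[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical data: 80 held-out cases and predicted scores from two models.
data_rng = random.Random(42)
labels = [0] * 40 + [1] * 40
scores_5 = [y + data_rng.gauss(0, 1) for y in labels]       # stand-in for the 5-feature model
scores_6 = [s + data_rng.gauss(0, 0.3) for s in scores_5]   # stand-in for the 6-feature model

lower, upper = paired_bootstrap_delta_auc(labels, scores_5, scores_6)
print(f"95% CI for deltaAUC: ({lower:.3f}, {upper:.3f})")
```

Because both AUCs are computed on the same resampled rows, the interval reflects the paired difference directly, so no separate normality or equal-variance check on the bootstrap values is involved. If the interval contains 0, the data do not show a difference between the two models.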
