Bootstrap vs Wilson score confidence interval

Question

For estimation of the confidence interval of sensitivity and specificity, when I should use the Wilson score and when I should use bootstrapping?

Sensitivity and specificity are typically estimated as binomial proportions (of positive test results among people confirmed to have disease or of neg test results among people known to be disease free). Wilson CI should be fine, except for proportions very near 100% where true coverage probability may be quirky. Also, there may be philosophical issues near 100% that may make Bayesian interval estimation with an informative prior an attractive alternative. — BruceET, Commented Jun 23, 2020 at 17:41
@BruceET, (1) I completely agree with you but I don't have a reference to what you say. Can you provide me with one, please? (2) why bootstrapping is considered as a worse solution? — Gideon Kogan, Commented Jun 23, 2020 at 19:52
Bootstrapping is a bit unnecessary when you can get e.g. an exact Clopper-Pearson confidence interval (even trivially hand-calculated for any software that supports the Beta-distribution). Bootstrapping also does not work "so well", if you have e.g. an observed sensitivity of specificity of 100 or 0%. — Björn, Commented Jun 23, 2020 at 20:06
There are many bootstrap procedures, so it is not clear which one you'd use. Bootstrapping is approximate, but can be be very useful when distributions are unknown. When the dist'n theory is standard and well-understood as for CIs of binomial proportions it seems traditional methods are preferable. // Finally, bootstrapping binomial procedures has been deprecated as essentially redundant. — BruceET, Commented Jun 23, 2020 at 20:13

kjetil b halvorsen · Accepted Answer · 2021-03-21 13:48:17Z

I think that you should use whichever method gives you the best coverage in the region of interest. For the coverage of bootstrapping techniques there is

Mantalos, P. and Zografos, K., 2008. Interval estimation for a  
binomial proportion: a bootstrap approach. Journal of   
Statistical Computation and Simulation, 78(12), pp.1251-1265.

Here's a Python script that identifies the coverage for techniques including the Wilson interval given x, n and CL. It doesn't include any bootstrapping based non parametric techniques but I think it would be better it it did.

Figure 1, coverage error (CE) for different techniques for Interval estimation for a binomial proportion

import statsmodels.api
from rpy2.robjects.packages import importr
binom = importr('binom')
from rpy2 import robjects
import matplotlib.pyplot as plt

n = 10 # samples
x = 10 # positive results
CL = 0.95 # confidence level
print('confidence level: ',CL)

methods = ["'bayes', type='central'",
           "'wilson'",
           "'agresti-coull'",
           "'exact'",
           "'asymptotic'"]
LW = 10 # line width

# CI
low, high = statsmodels.stats.proportion.proportion_confint(x, n, alpha=1-CL, method='jeffreys')
if x == 0: low  = 0
if x == n: high = 1

# CP
step = (high - low) / 31

robjects.globalenv["LV"] = robjects.r(low)
robjects.globalenv["HV"] = robjects.r(high)
robjects.globalenv["SV"] = robjects.r(step)
robjects.globalenv["CV"] = robjects.r(CL)

CP = {}

for method in methods:
    r_string = """library(binom)
    p = seq(LV,HV,SV)
    coverage = binom.coverage(p, 10, conf.level = CV, method=TECHNIQUE)$coverage
    """.replace('TECHNIQUE',method)
    robjects.r(r_string)
    R_C = list(robjects.r['coverage'])
    CP[str(method)] = R_C

R_P = list(robjects.r['p'])

# Coverage Error (CE) = CP - CL
CE = {}
for method in methods:
    CE[str(method)] = [x - CL for x in CP[str(method)]]

# Dict to Lists
labels, data = [*zip(*CE.items())]

# Plots    
font = {'weight' : 'normal',
        'size'   : 22}
plt.rc('font', **font)

# Violin
fig, ax = plt.subplots()
parts = plt.violinplot(data, showextrema=False, vert=False)

for pc in parts['bodies']:
    pc.set_edgecolor('black')
    pc.set_alpha(1)
    pc.set_linewidth(LW)

plt.xlabel('CE')
fig.set_size_inches(8,4)
#plt.xlim([-.05,.05])
plt.xlim([-CL,1-CL])

# Override the method names
labels=(['Jeffreys equal tailed','Wilson','Agresti-Coull','Clopper-Pearson','Wald'])

plt.yticks(range(1, len(labels) + 1), labels)

plt.grid(b=True, which='major', color='b')
plt.show()

Good question @Gideon. A Confidence Interval (CI) is a range of values for an unknown parameter with a specified, nominal probability that it contains a feature of interest. This probability is otherwise known as the Confidence Level (CL). For example, a 95% CI predicts with 95% CL that p lies within the CI. The probability that the CI contains this feature of interest is called the Coverage Probability (CP). CP would ideally equal CL and the CP is used as a measure of the quality of methods used for the construction of CIs. So, in the case of a 95% CI, by 'best coverage' I mean 95%. — R. Cox, Commented Feb 14, 2021 at 12:02
In the case of a 95% CI, by 'best coverage' I mean that the coverage is clustered the closest to 95%. — R. Cox, Commented Feb 14, 2021 at 12:12
I am familiar with the results that you have demonstrated. Nevertheless, the comments above suggest avoiding bootstrapping and answer my question directly, where your answer demonstrates the comparison between the estimation methods rather than comparing the bootstrap with the closed-form estimation methods. — Gideon Kogan, Commented Feb 14, 2021 at 13:32

Stack Exchange Network

Bootstrap vs Wilson score confidence interval

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
classification
confidence-interval
bootstrap
sensitivity-specificity
or ask your own question.

Linked

Hot Network Questions

Bootstrap vs Wilson score confidence interval

1 Answer 1

Not the answer you're looking for? Browse other questions tagged classificationconfidence-intervalbootstrapsensitivity-specificity or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
classification
confidence-interval
bootstrap
sensitivity-specificity
or ask your own question.