Why didn't my bootstrap-based confidence interval include the point estimate?

Question

I constructed a nonparametric bootstrap confidence interval using 1000 iterations. However, the result was 0.72 [0.63, 0.68]: the point estimate lies above the upper limit of the 95% confidence interval. I have two questions.

What are the possible underlying reasons for this?
How should I interpret and report such a result?

Any help is highly appreciated. Thank you!

As my data are hierarchical, I used resample_data() from the fabricatr package. — iGada, Commented Jan 23, 2022 at 21:34

Eoin · Accepted Answer · 2022-01-24 15:22:57Z

Two likely possibilities:

1. Your code is wrong. Double-check everything!
2. You have a lot of data and one ridiculously large outlier that was absent from 95% of the 1000 bootstrap resamples, so it didn't affect the 95% CI.
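A quick back-of-the-envelope check of the second possibility, as a sketch only: it assumes plain resampling of n individual observations with replacement (not the hierarchical resampling mentioned in the question), and the sample sizes are illustrative choices rather than values from the question.

```r
# Chance that one particular observation is missing from a bootstrap
# resample of size n drawn with replacement: (1 - 1/n)^n.
n <- c(25, 100, 1000, 10000)   # illustrative sample sizes (assumed)
p_absent <- (1 - 1/n)^n        # approaches exp(-1) as n grows
print(round(p_absent, 3))

# Chance that the observation is missing from at least 950 of 1000
# independent resamples, using the n = 1000 value above.
p_95pct <- pbinom(949, size = 1000, prob = p_absent[3], lower.tail = FALSE)
print(p_95pct)
```

Under these assumptions a given observation is missing from only about a third of the resamples, so an outlier being absent from 95% of them is practically impossible.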

edited Jan 24, 2022 at 15:22

answered Jan 24, 2022 at 15:17


Actually, my intuition was wrong here: it's incredibly unlikely to be case #2!
– Eoin
Commented Jan 24, 2022 at 18:06


BruceET · Accepted Answer · 2022-01-24 16:26:05Z

There are many styles of nonparametric bootstrap confidence interval. I have used several of them, and I haven't seen a reasonable method for a 95% bootstrap CI for a population mean that failed to contain the sample mean. [However, @whuber suggests that a bootstrap CI may fail to cover the sample mean if it is based on a small sample from a highly skewed distribution, such as a lognormal. Also, @iGada has given a reference about bootstrap CIs that don't contain the population mean.]

You have not said which method you are using or how large your sample is. So my only direct comment on your specific interval is to ask whether you should have run at least 2000 iterations. I agree with @Aksakal that you should check your implementation of the intended style of CI.

Here are two methods applied to a sample of size $n = 25,$ which is contaminated with three observations from a population with a much larger mean.

set.seed(1234)
x = c(rexp(22, 1/5), rexp(3, 1/100))
a = mean(x); a
[1] 10.9786

The true population mean (which would be unknown in a real-life situation) is $\mu = (22 \cdot 5 + 3 \cdot 100)/25 = 16.4,$ so I have an 'unlucky' low sample mean.

boxplot(x, horizontal = TRUE)

My first bootstrap CI uses the simple quantile (percentile) method, which is discouraged because it is known to give poor results for highly skewed samples. With $2000$ iterations it gives the 95% CI $(5.07, 19.26),$ which includes the sample mean (and the population mean).

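set.seed(2022)
# 2000 bootstrap resamples of size 25; keep the mean of each
q = replicate(2000, mean(sample(x, 25, replace = TRUE)))
# percentile interval: middle 95% of the resampled means
quantile(q, c(.025,.975))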
     2.5%     97.5% 
 5.072897 19.260683
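One way to check an implementation, sketched here under the assumption that the percentile method is being used, is to see where the point estimate falls within the bootstrap distribution. The names `boot_means`, `frac_below` and `bias_hat` are illustrative and not part of the answer's code.

```r
# Same contaminated sample as in the answer above.
set.seed(1234)
x <- c(rexp(22, 1/5), rexp(3, 1/100))
a <- mean(x)                       # point estimate

# Bootstrap distribution of the mean (2000 resamples, as above).
set.seed(2022)
boot_means <- replicate(2000, mean(sample(x, length(x), replace = TRUE)))

# Fraction of bootstrap means that fall below the point estimate.
frac_below <- mean(boot_means < a)
# Bootstrap estimate of bias: centre of the bootstrap distribution
# minus the point estimate.
bias_hat <- mean(boot_means) - a
print(c(frac_below = frac_below, bias_hat = bias_hat))
```

If `frac_below` is below 0.025 or above 0.975, the point estimate lies outside the percentile interval. That usually suggests either that the point estimate and the bootstrap statistic are not computed the same way, or that the estimator is strongly biased under resampling.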

A second simple method (the basic, or reverse-percentile, bootstrap), which offers some bias protection, gives the interval $(2.19, 16.88),$ which contains the sample mean (and, in spite of bad luck, also the population mean).

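set.seed(124)
# deviations of the resampled means from the sample mean a
d = replicate(2000, mean(sample(x, 25, replace = TRUE)) - a)
# upper quantile first, so that a - LU is ordered (lower, upper)
LU = quantile(d, c(.975,.025))
a - LU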
     97.5%      2.5% 
  2.194154 16.876843
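It may help to write the two intervals side by side. This is a sketch that assumes both are computed from the same set of bootstrap means (the answer's two runs use different seeds, so it holds only approximately there). The notation $q^*_{p}$ for the $p$-quantile of the bootstrap means is introduced here and is not in the original answer; $a$ is the sample mean.

$$\text{percentile: } \left[\, q^*_{0.025},\; q^*_{0.975} \,\right], \qquad \text{basic: } \left[\, 2a - q^*_{0.975},\; 2a - q^*_{0.025} \,\right]$$

$$2a - q^*_{0.975} \le a \le 2a - q^*_{0.025} \iff q^*_{0.025} \le a \le q^*_{0.975}$$

So, under that assumption, either interval contains $a$ exactly when $a$ lies in the central 95% of the bootstrap distribution. An interval of one of these two types that excludes the point estimate therefore means the bootstrap distribution is centred well away from the estimate.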

Re "I can't see how a reasonable method for a 95% bootstrap CI for a population mean could fail to contain the sample mean:" One common instance where that can occur is bootstrapping a highly skewed distribution, such as a Lognormal. Bear in mind that the very first information the bootstrap gives us is an indication of the bias in an estimator. A good bootstrap CI incorporates this bias correction--and that's why it might fail to include the sample mean. — whuber, Commented Jan 24, 2022 at 15:11
@whuber Tnx. As you said bootstrap CI can fail to include the population mean for skewed data and/or for small to medium sample sizes. For example, one can check the paper written by Hesterberg (2015). — iGada, Commented Jan 24, 2022 at 15:43
Re the edit: my argument is unrelated to sample size. There are Lognormal distributions for which the mean of even a very large sample (by any standard) is likely to be far below the population mean. — whuber, Commented Jan 24, 2022 at 16:57
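The last comment can be illustrated with a small simulation. This is a sketch only: the choice of `sdlog = 4`, the sample size, and the number of repetitions are assumptions made here for illustration and do not come from the thread.

```r
# Assumed illustration: Lognormal(meanlog = 0, sdlog = 4), chosen only
# because it is extremely right-skewed.
set.seed(1)
sdlog <- 4
pop_mean <- exp(sdlog^2 / 2)   # exact lognormal mean for meanlog = 0
n <- 10000                     # a large sample by most standards

# Means of 200 independent samples of size n.
sample_means <- replicate(200, mean(rlnorm(n, meanlog = 0, sdlog = sdlog)))

# Share of samples whose mean falls below the population mean, and the
# median sample mean relative to the population mean.
frac_low <- mean(sample_means < pop_mean)
median_ratio <- median(sample_means) / pop_mean
print(c(frac_low = frac_low, median_ratio = median_ratio))
```

With a distribution this skewed, the population mean is driven by rare, enormous values that most samples never contain, so the typical sample mean sits below it even at this sample size.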
