$\begingroup$

I constructed a nonparametric bootstrap confidence interval using 1000 iterations. However, the result was 0.72 [0.63, 0.68]: the point estimate lies above the upper limit of the 95% confidence interval. I have two questions.

  1. What are the possible underlying reasons for this?
  2. How should I interpret and report such results?

Any help is highly appreciated. Thank you!

$\endgroup$
  • $\begingroup$ What library and methods did you use to compute them? $\endgroup$ Commented Jan 23, 2022 at 21:30
  • $\begingroup$ As my data is hierarchical, I used resample_data() from the fabricatr package. $\endgroup$
    – iGada
    Commented Jan 23, 2022 at 21:34
  • $\begingroup$ debug your code $\endgroup$
    – Aksakal
    Commented Jan 23, 2022 at 21:55

2 Answers

$\begingroup$

Two likely possibilities:

  1. Your code is wrong. Double-check everything!
  2. You have a lot of data and one ridiculously large outlier that was not sampled in 95% of the 1000 bootstrap repetitions, so it didn't affect the 95% CI.
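
A rough plausibility check of possibility 2, offered as a sketch rather than a proof: for the outlier to leave the interval essentially untouched, it would have to be missing from about 95% of the resamples. With ordinary resampling with replacement, one specific observation is missing from a resample of size $n$ with probability $(1 - 1/n)^n \approx e^{-1} \approx 0.37$. The sample sizes and the seed below are illustrative choices, not values from the question.

```r
# Chance that one specific observation is missing from a bootstrap resample of size n
n <- c(25, 100, 1000, 10000)
p_absent <- (1 - 1/n)^n
round(p_absent, 3)

# Simulation check for n = 1000: treat index 1 as the "outlier"
set.seed(1)
n_sim <- 1000
absent <- replicate(2000, !(1 %in% sample.int(n_sim, n_sim, replace = TRUE)))
mean(absent)   # share of resamples that miss the outlier
```

So a single outlier should show up in roughly 63% of the resamples, which makes possibility 2 very unlikely under plain i.i.d. resampling. Hierarchical (cluster-level) resampling, as used in the question, may behave differently.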
$\endgroup$
  • $\begingroup$ Actually, my intuition was wrong here - it's actually incredibly unlikely to be case #2! $\endgroup$
    – Eoin
    Commented Jan 24, 2022 at 18:06
$\begingroup$

There are very many styles of nonparametric bootstrap confidence intervals. I have used several of them, and I haven't seen a reasonable method for a 95% bootstrap CI for a population mean that failed to contain the sample mean. [However, @whuber suggests that a bootstrap CI may not cover the sample mean, if it is based on a small sample from a highly skewed distribution, such as lognormal. Also, @Gada has given a reference about bootstrap CIs that don't contain the population mean.]

You have not said what method you are using or said how large a sample you have. So, my only direct comment on your specific interval is to question whether you should have done at least 2000 iterations. I agree with @Aksakal that you should check your implementation of the intended style of CI.

Here are two methods applied to a sample of size $n = 25,$ which is contaminated with three observations from a population with a much larger mean.

set.seed(1234)
x = c(rexp(22, 1/5), rexp(3, 1/100))  # 22 draws with mean 5, 3 draws with mean 100
a = mean(x); a
[1] 10.9786

The true population mean (which would be unknown in a real-life situation) is $\mu = (22 \cdot 5 + 3 \cdot 100)/25 = 16.4,$ so I have an 'unlucky' low sample mean.

boxplot(x, horizontal = TRUE)

[Figure: horizontal boxplot of the sample x]

My first bootstrap CI uses a deprecated simple quantile method (the percentile method), known to give bad results for highly skewed samples. With $2000$ iterations it gives the 95% CI $(5.07, 19.26),$ which includes the sample mean (and the population mean).

set.seed(2022)
q = replicate(2000, mean(sample(x, 25, replace = TRUE)))  # resampled means
quantile(q, c(.025,.975))
     2.5%     97.5% 
 5.072897 19.260683 

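A simple method offering some bias protection, often called the basic (or reverse percentile) bootstrap, works with the deviations of the resampled means from the sample mean $a.$ It gives the interval $(2.19, 16.88),$ which contains the sample mean (and, in spite of bad luck, also the population mean).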

set.seed(124)
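d = replicate(2000, mean(sample(x, 25, replace = TRUE)) - a)  # deviations from the sample mean
LU = quantile(d, c(.975,.025))  # upper quantile first, so the subtraction gives (lower, upper)
a - LU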
     97.5%      2.5% 
  2.194154 16.876843 
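
As a cross-check, the same kinds of interval can be requested from the `boot` package, which ships with standard R installations. This is a minimal sketch, not the code used for the numbers above: the helper name `mean_stat` is made up for illustration, and because `boot()` draws its resamples differently, the endpoints will not match the hand-rolled results exactly.

```r
library(boot)  # recommended package bundled with standard R installations

set.seed(1234)
x <- c(rexp(22, 1/5), rexp(3, 1/100))   # same sample as above

# boot() expects a statistic of the form f(data, indices)
mean_stat <- function(data, idx) mean(data[idx])

set.seed(2022)
b <- boot(x, statistic = mean_stat, R = 2000)

# Percentile, basic and BCa intervals computed from the same resamples
ci <- boot.ci(b, conf = 0.95, type = c("perc", "basic", "bca"))
ci
```

Comparing the three intervals side by side shows how much the choice of method matters for a small, skewed sample like this one.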
$\endgroup$
  • $\begingroup$ Re "I can't see how a reasonable method for a 95% bootstrap CI for a population mean could fail to contain the sample mean:" One common instance where that can occur is bootstrapping a highly skewed distribution, such as a Lognormal. Bear in mind that the very first information the bootstrap gives us is an indication of the bias in an estimator. A good bootstrap CI incorporates this bias correction--and that's why it might fail to include the sample mean. $\endgroup$
    – whuber
    Commented Jan 24, 2022 at 15:11
  • $\begingroup$ @whuber Thanks. As you said, a bootstrap CI can fail to include the population mean for skewed data and/or for small to medium sample sizes. For example, see the paper by Hesterberg (2015). $\endgroup$
    – iGada
    Commented Jan 24, 2022 at 15:43
  • $\begingroup$ Re the edit: my argument is unrelated to sample size. There are Lognormal distributions for which the mean of even a very large sample (by any standard) is likely to be far below the population mean. $\endgroup$
    – whuber
    Commented Jan 24, 2022 at 16:57
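
The last comment can be explored by simulation. The sketch below assumes a Lognormal with `meanlog = 0` and `sdlog = 4` and samples of size 10000; these values, the seed and the number of replications are illustrative choices and do not come from the thread.

```r
set.seed(42)
sdlog <- 4                      # illustrative choice, not from the thread
n     <- 10000                  # a "large" sample size, also illustrative
pop_mean <- exp(sdlog^2 / 2)    # population mean of Lognormal(meanlog = 0, sdlog)

# Means of 200 independent samples of size n
sample_means <- replicate(200, mean(rlnorm(n, meanlog = 0, sdlog = sdlog)))

mean(sample_means < pop_mean)     # share of samples whose mean falls below the population mean
median(sample_means) / pop_mean   # typical sample mean relative to the population mean
```

With such heavy skew, most of the population mean comes from values so rare that a sample of this size usually contains few or none of them, so the sample mean tends to undershoot.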
