42
$\begingroup$

Very skewed distributions such as the log-normal do not result in accurate bootstrap confidence intervals. Here is an example showing that the left and right tail areas are far from the ideal 0.025 no matter which bootstrap method you try in R:

require(boot)
n    <- 25
B    <- 1000
nsim <- 1000
set.seed(1)
which <- c('basic', 'perc', 'norm', 'bca', 'stud')
mul <- 0; sdl <- 1.65   # on log scale
dist <- c('normal', 'lognormal')[2]
switch(dist, normal    = {g <- function(x) x; mu <- mul},
             lognormal = {g <- exp; mu <- exp(mul + sdl * sdl / 2)})
count <- matrix(0, nrow=length(which), ncol=2,
                dimnames=list(which, c('lower', 'upper')))
stat <- function(x, j) {
## See http://www.psychology.mcmaster.ca/bennett/boot09/percentileT.pdf
  x <- x[j]
  m <- mean(x)
  s <- sd(x)
  n <- length(x)
  sem <- s / sqrt(n)
  m.var <- sem ^ 2
  c(m, m.var)
}
for(i in 1 : nsim) {
  if(i %% 100 == 0) cat(i, '')
  x <- g(rnorm(n, mul, sdl))
  b  <- boot(x, stat, R=B)
  ci <- boot.ci(b, type=which)
  for(w in which) {
    nam <- switch(w, perc='percent', norm='normal', basic='basic',
                  stud='student', bca='bca')
    z <- rev(rev(ci[[nam]])[1:2])
    count[w, 'lower'] <- count[w, 'lower'] + (z[1] > mu)
    count[w, 'upper'] <- count[w, 'upper'] + (z[2] < mu)
  }
}
cat('\n')
count / nsim

The result is below:

      lower upper
basic 0.000 0.329
perc  0.003 0.257
norm  0.000 0.287
bca   0.015 0.185
stud  0.005 0.129

For $n=400$ single bootstraps still do not provide adequately accurate coverage:

      lower upper
basic 0.001 0.114
perc  0.005 0.093
norm  0.002 0.102
bca   0.017 0.067
stud  0.011 0.058

Empirical likelihood also fails to provide accurate confidence intervals when sampling from the lognormal distribution.

Is there a general-purpose approach out there that does not depend on knowing the distribution in advance? Has anyone tried to get confidence intervals for the mean by fitting the data to the Tukey generalized $\lambda$ distribution (this distribution is highly flexible)? What about using Kolmogorov-Smirnov confidence bands for the CDF? Would computing the mean on the upper and lower bounds on the CDF be horribly conservative? I would settle for some conservatism if a method has wide applicability.

To restate the goals, I'm seeking a generally applicable approach for getting a confidence interval for a population mean such that

  1. the interval is asymmetric if the raw data distribution is asymmetric
  2. the interval has correct coverage in both tails (e.g., 0.025 error probability in both)
  3. the procedure does not require the analyst to specify anything about the underlying distribution or the transformation needed to make the distribution symmetric

Note that the central limit theorem is irrelevant here; I have a fixed small sample size and the confidence interval must be asymmetric to be accurate in both tails. The parametric $t$-based confidence interval under a lognormal model with $\mu=0, \sigma=1.65$ and $n=20000$ still has bad coverage (left tail error 0.012, right 0.047 when both should be 0.025).

In continuing to think about this there are two broad ways of conceptualizing the problem that I'd like to discuss.

  1. The mean is not a quantity that lends itself to nonparametric inference, at least when exactness of inference is required. The sample median is meaningful for any continuous distribution and we have a simple exact confidence interval for the median. In a sample of size $n=20$ from a normal distribution the confidence interval for the median is $1.28 \times$ longer than the exact $t$-based confidence interval for the mean (see code below). Perhaps this factor of 1.28 is a reasonable price to pay for robustness and complete distributional freedom.
  2. Even though no single bootstrap will give adequately accurate confidence limits for samples from extremely skewed distributions, the double bootstrap can significantly improve the confidence coverage in both tails. Nankervis has some nice results and provides an excellent computational algorithm. But no software I could find implements this.

R code illustrating 1. above:

## Exact CI for median from DescTools package SignTest.default
## See also ttp://www.stat.umn.edu/geyer/old03/5102/notes/rank.pdf,
## http://de.scribd.com/doc/75941305/Confidence-Interval-for-Median-Based-on-Sign-Test
cimed <- function(x, alpha=0.05, na.rm=FALSE) {
  if(na.rm) x <- x[! is.na(x)]
  n <- length(x)
  k <- qbinom(p=alpha / 2, size=n, prob=0.5, lower.tail=TRUE)
  ## Actual CL: 1 - 2 * pbinom(k - 1, size=n, prob=0.5) >= 1 - alpha
  sort(x)[c(k, n - k + 1)]
}

n <- 20
m <- 20000
cil <- cilt <- 0
z <- qt(0.975, n - 1)

for(i in 1 : m) {
  x <- rnorm(n)
  cil  <- cil + diff(cimed(x))
  cilt <- cilt + 2 * z * sqrt(var(x) / n)
}
cil  <- cil / m
cilt <- cilt / m

c(cil, cilt, cilt / cil, cil / cilt)
$\endgroup$
4
  • 1
    $\begingroup$ This is computationally intensive, but what if you took the empirical cdf, started randomly generating Brownian bridges; each Brownian bridge represents the delta between the ecdf and some hypothetical cdf. Compute the mean using the hypothetical cdf and weight it by the factor prescribed by KS test. Repeating this for a while, you'll have a weighted data set of means and can compute the confidence interval. $\endgroup$ Commented Dec 24, 2015 at 15:21
  • $\begingroup$ I don't have a hypothetical cdf. And what would happen if you just use the upper and lower 0.95 confidence region from KS and computed the mean from them, i.e., would this be horribly conservative. $\endgroup$ Commented Dec 24, 2015 at 15:25
  • $\begingroup$ The hypothetical cdf is introduced by adding a randomly generated Brownian bridge to the empirical cdf. Also, I'm not suggesting taking the mean from the confidence region. I'm suggesting getting many means by generating many hypothetical distributions, appropriately weighted, and then getting the confidence interval. It's basically just a different approach to bootstrapping, I think the outcome could be different though. $\endgroup$ Commented Dec 24, 2015 at 15:33
  • $\begingroup$ It would be interesting to see how efficiently it could be programmed and how accurate the confidence interval coverage is. Thanks for the suggestion. I wonder if the Bayesian bootstrap would mimic that. I've tried the Bayesian bootstrap in another context and it did not improve confidence interval coverage. $\endgroup$ Commented Dec 24, 2015 at 15:53

2 Answers 2

16
$\begingroup$

I am somewhat pessimistic about a such non-parametric method, at least without the introduction of some sort of constraints on the underlying distribution.

My reasoning for this is that there will always be a distribution that breaks the true coverage probability for any finite $n$ (although as $n \rightarrow \infty$, this distribution will become more and more pathological), or the confidence interval will have to be arbitrarily large.

To illustrate, you could imagine a distribution that looks like a normal up to some value $\alpha$, but after $\alpha$ becomes extremely right skewed. This can have unbounded influence on the distribution's mean and as you push $\alpha$ out as far as possible, this can have arbitrarily small probability of making it into your sample. So you can imagine that for any $n$, you could pick an $\alpha$ to be so large that all points in your sample have extremely high probability of looking like it comes from a normal distribution with mean = 0, sd = 1, but you can also have any true mean.

So if you're looking for proper asymptotic coverage, of course this can be achieved by the CLT. However, your question implies that you are (quite reasonably) interested in the finite coverage. As my example shows, there will always be a pathological case that ruins any finite length CI.

Now, you still could have a non-parametric CI that achieves good finite coverage by adding constraints to your distribution. For example, the log-concave constraint is a non-parametric constraint. However, it seems inadequate for your problem, as log-normal is not log-concave.

Perhaps to help illustrate how difficult your problem could be, I've done unpublished work on a different constraint: inverse convex (if you click on my profile, I have a link to a personal page that has a preprint). This constraint includes most, but not all log-normals. You can also see that for this constraint, the tails can be "arbitrarily heavy", i.e. for any inverse convex distribution up to some $\alpha$, you can have heavy enough tails that the mean will be as large as you like.

$\endgroup$
2
  • 2
    $\begingroup$ Excellent thoughts. I hesitate to require those kinds of constraints because I see bimodal distributions and other complexities often enough. $\endgroup$ Commented Dec 24, 2015 at 17:14
  • 1
    $\begingroup$ @FrankHarrell: there has been work done with mixture models with non-parametric log-concave components. However, I cannot imagine at this time that there's good methods for creating a confidence for the overall mean, especially if the number of components are not known in advance. $\endgroup$
    – Cliff AB
    Commented Dec 24, 2015 at 23:39
3
$\begingroup$

One of the underlying assumptions of any sample is representativeness. The longer the tails of a distribution the less likely any small sample is going to be representative enough for any method to reliably solve for the CI because the sample won't be able to represent the distribution.

For example, running a simple perc CI on an exponential distribution with a sample size of 250 yields pretty ok results. They are much better than a with a sample of 25, although still not ideal.

I agree with Cliff AB that there won't be a general solution but you don't have to hypothesize extreme distributions. There won't be anything that works broadly with small samples. And in some cases the samples might have to be very large (but it would be nice to be wrong).

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.