Zhou & Dinh (2005) propose the following formula for a $(1 - \alpha)100\%$ confidence interval for the mean of one sample. The confidence interval is based on a transformation of the $t$-statistic:
where $n$ is the sample size, $\bar{X}$ is the sample mean, $S$ is the standard deviation and $\hat{\gamma}$ is an estimate of the population skewness.
Here is a numerical example: I drew 20 samples from an $\mathrm{N}(100, 225)$ distribution with $\bar{x} = 101.06, s = 17.08, \bar{\gamma} = 0.955$. Further, setting $\alpha = 0.05$ we have: $\zeta_{0.975}=\Phi(0.975) = 0.8352$ and $\zeta_{0.025} = \Phi(0.025) = 0.51$. The $95\%$ confidence interval evaluates to $(0.984;\;99.416)$.
I tried to implement these formulas but stumbled upon a problem: The coverage of these confidence intervals is nowhere near the nominal value.
Where is the mistake?
I find the usage of the standard normal CDF instead of the respective quantile function a bit suspect. When I changed $\zeta_{\alpha} = \Phi^{-1}(\alpha)$, the coverage is near the nominal value. The disadvantage is that the formulas potentially break down with small sample sizes because the argument in $T_{3}^{-1}$ below the cube root gets negative. Using the numbers from the example above, we have: $\zeta_{0.975}=\Phi^{-1}(0.975) = 1.96$ and $\zeta_{0.025} = \Phi^{-1}(0.025) = -1.96$. Thus, $T_{3}^{-1}(20^{-1/2}\times -1.96 = -0.438)=\left\{1+3\left(-0.438 - 20^{-1}\frac{1}{6}0.955\right)\right\}^{1/3} - 1=(-0.3387)^{1/3} - 1$
I conducted a small simulation study to check the coverage. At the normal, it's only around 9% instead of the nominal 95%. Here is the R
code:
# The functions
skew <- function(x) {
n <- length(x)
n/((n - 1)*(n - 2)) * sum(((x - mean(x))/(sd(x)))^3)
}
T3 <- function(x, n, gamma) {
(1 + 3*(x - n^(-1)*(1/6)*gamma))^(1/3) - 1
}
ci_np <- function(x, alpha) {
m <- mean(x)
s <- sd(x)
n <- length(x)
z <- pnorm(c(1 - (alpha/2), alpha/2))
gamma <- skew(x)
Tvals <- c(T3(n^(-1/2)*z[1], n, gamma), T3(n^(-1/2)*z[2], n, gamma))
m - Tvals*s
}
# The simulations
set.seed(142857)
n_sim <- 10000
res <- replicate(n_sim, {
x <- rnorm(100, 100, 15)
tmp <- ci_np(x, 0.05)
ifelse((tmp[1] <= 100) && (tmp[2] >= 100), 1, 0)
})
mean(res)
[1] 0.0869
pnorm
instead ofqnorm
. The more fundamental error is not performing unit testing of your functions: before you see whether a simulation gives expected results, you want to be sure that the helper functionsskew
andT3
are calculating correctly and then you want to checkci_np
against any published results. $\endgroup$pnorm
if I'm not mistaken: $\zeta_{\alpha}=\Phi(\alpha)$, right? As I wrote: Coverage is indeed near nomial when usingqnorm
. $\endgroup$