1
$\begingroup$

Zhou & Dinh (2005) propose the following formula for a $(1 - \alpha)100\%$ confidence interval for the mean of one sample. The confidence interval is based on a transformation of the $t$-statistic:

Zhou1

where $n$ is the sample size, $\bar{X}$ is the sample mean, $S$ is the standard deviation and $\hat{\gamma}$ is an estimate of the population skewness.

Here is a numerical example: I drew 20 samples from an $\mathrm{N}(100, 225)$ distribution with $\bar{x} = 101.06, s = 17.08, \bar{\gamma} = 0.955$. Further, setting $\alpha = 0.05$ we have: $\zeta_{0.975}=\Phi(0.975) = 0.8352$ and $\zeta_{0.025} = \Phi(0.025) = 0.51$. The $95\%$ confidence interval evaluates to $(0.984;\;99.416)$.

I tried to implement these formulas but stumbled upon a problem: The coverage of these confidence intervals is nowhere near the nominal value.

Where is the mistake?

I find the usage of the standard normal CDF instead of the respective quantile function a bit suspect. When I changed $\zeta_{\alpha} = \Phi^{-1}(\alpha)$, the coverage is near the nominal value. The disadvantage is that the formulas potentially break down with small sample sizes because the argument in $T_{3}^{-1}$ below the cube root gets negative. Using the numbers from the example above, we have: $\zeta_{0.975}=\Phi^{-1}(0.975) = 1.96$ and $\zeta_{0.025} = \Phi^{-1}(0.025) = -1.96$. Thus, $T_{3}^{-1}(20^{-1/2}\times -1.96 = -0.438)=\left\{1+3\left(-0.438 - 20^{-1}\frac{1}{6}0.955\right)\right\}^{1/3} - 1=(-0.3387)^{1/3} - 1$


I conducted a small simulation study to check the coverage. At the normal, it's only around 9% instead of the nominal 95%. Here is the R code:

# The functions

skew <- function(x) {
  n <- length(x)
  n/((n - 1)*(n - 2)) * sum(((x - mean(x))/(sd(x)))^3)
}

T3 <- function(x, n, gamma) {
  (1 + 3*(x - n^(-1)*(1/6)*gamma))^(1/3) - 1
}

ci_np <- function(x, alpha) {      
  m <- mean(x)
  s <- sd(x)
  n <- length(x)
  z <- pnorm(c(1 - (alpha/2), alpha/2))
  gamma <- skew(x)      
  Tvals <- c(T3(n^(-1/2)*z[1], n, gamma), T3(n^(-1/2)*z[2], n, gamma))
  
  m - Tvals*s      
}

# The simulations

set.seed(142857)

n_sim <- 10000

res <- replicate(n_sim, {      
  x <- rnorm(100, 100, 15)  
  tmp <- ci_np(x, 0.05)  
  ifelse((tmp[1] <= 100) && (tmp[2] >= 100), 1, 0)      
})

mean(res)
[1] 0.0869
$\endgroup$
4
  • 1
    $\begingroup$ Visual inspection uncovers one obvious error: you have written pnorm instead of qnorm. The more fundamental error is not performing unit testing of your functions: before you see whether a simulation gives expected results, you want to be sure that the helper functions skew and T3 are calculating correctly and then you want to check ci_np against any published results. $\endgroup$
    – whuber
    Commented Mar 29, 2021 at 15:56
  • $\begingroup$ @whuber The paper itself says to use pnorm if I'm not mistaken: $\zeta_{\alpha}=\Phi(\alpha)$, right? As I wrote: Coverage is indeed near nomial when using qnorm. $\endgroup$ Commented Mar 29, 2021 at 15:58
  • 1
    $\begingroup$ Yes, the paper does seem to say that (right after its formula $(2.5)$), but that's not believable. (Consider what the CI formula reduces to when no transformation is applied.) It might have intended to write $\xi_\alpha=\Phi^{-1}(\alpha).$ However, the change in notation from "$z_\alpha$" to "$\xi_\alpha$" does worry me. $\endgroup$
    – whuber
    Commented Mar 29, 2021 at 16:27
  • 2
    $\begingroup$ @whuber Thanks for the confirmation, that's what I suspected. I think this pretty much clears up my confusion. $\endgroup$ Commented Mar 29, 2021 at 16:30

0