$\begingroup$

I am looking for a way to calculate confidence intervals for the interquartile range (IQR) of a numerical variable. Of course, they can be found by the bootstrap, but I am explicitly looking for a different, still distribution-free way. Since the IQR is a difference of two quantiles, this reference could point in the right direction.

What would a pseudo-algorithm (or R/Python code) look like?

Here is the "cheap" bootstrap way in R:

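library(boot)
set.seed(1)
x <- rnorm(100)  # simulated sample; the true IQR of N(0, 1) is about 1.349
# Statistic: IQR of the resampled data (ix holds the bootstrap indices)
S <- boot(x, function(x, ix) IQR(x[ix]), R = 10000)
boot.ci(S, type = "bca")  # bias-corrected and accelerated (BCa) interval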

# BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
# Based on 10000 bootstrap replicates
# 
# CALL : 
#   boot.ci(boot.out = S, type = "bca")
# 
# Intervals : 
#   Level       BCa          
# 95%   ( 0.958,  1.448 )  
# Calculations and Intervals on Original Scale

Is there something similar to the binomial approach used by @whuber in his answer to "How to obtain a confidence interval for a percentile?"

$\endgroup$
  • 1
    $\begingroup$ I think the reference you gave pretty much contains the answer. I tried to implement their formula $\alpha_{7}$ in R but it somehow gives nonsensical outputs (I might not understand it fully). $\endgroup$ Commented May 22, 2020 at 7:04
  • 1
    $\begingroup$ Thx for trying @COOLSerdash - record statistics are a new field to me! $\endgroup$
    – Michael M
    Commented May 22, 2020 at 17:00

1 Answer

$\begingroup$

Two observations, which may produce an acceptable result.

[EDIT] To answer the question about a theoretical formula, I start with the individual sample quantiles; see the presentation here, which assumes knowledge of the probability density function (pdf). See also this work, which gives precise theoretical results on the expected value and variance of the interquartile range for several distributions.
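
For reference, the density-dependent result for a single sample quantile is usually stated as follows. The notation is mine, not taken from the linked presentation: $\hat\xi_p$ is the sample $p$-quantile from $n$ i.i.d. observations, $\xi_p$ the population $p$-quantile, and $f$ the pdf, assumed continuous and positive at $\xi_p$.

$$
\sqrt{n}\,\bigl(\hat\xi_p - \xi_p\bigr) \;\xrightarrow{d}\; N\!\left(0,\ \frac{p(1-p)}{f(\xi_p)^2}\right),
\qquad\text{so}\qquad
\operatorname{Var}(\hat\xi_p) \approx \frac{p(1-p)}{n\, f(\xi_p)^2}.
$$

The $f(\xi_p)^2$ in the denominator is what makes a plug-in version awkward in practice: the density at the quantile has to be estimated first.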

Next, as the pdf is generally not known, there are several possible ways to estimate the variance of a sample quantile in practice (see, for example, the discussion here per this 2005 work: 'VARIANCE ESTIMATION FOR SAMPLE QUANTILES USING THE m OUT OF n BOOTSTRAP'). Choose one.

Second, the interquartile range is the difference of two quantiles computed from the same sample, so their sampling errors are not independent. Sample quantiles from one sample are positively correlated: writing $\hat\xi_p$ for the sample $p$-quantile, $\xi_p$ for the population $p$-quantile and $f$ for the pdf, the asymptotic covariance for $p < q$ is $\operatorname{Cov}(\hat\xi_p, \hat\xi_q) \approx \frac{p(1-q)}{n\, f(\xi_p) f(\xi_q)} > 0$.

So, being conservative, the variance of the difference between the two quantiles that make up the IQR, $\operatorname{Var}(\hat\xi_{0.75}) + \operatorname{Var}(\hat\xi_{0.25}) - 2\operatorname{Cov}(\hat\xi_{0.25}, \hat\xi_{0.75})$, is at most the sum of their individual variances (the covariance term is subtracted and is expected to be positive).

Now, proceed to construct (and test) an interval for the IQR based on the square-root of the composite variance.
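
A minimal Python sketch of how these steps could be carried out without estimating the pdf. It rests on assumptions that are not part of the references above: the standard error of each quartile is backed out from the width of a binomial order-statistic interval (width divided by $2z$), the two variances are simply added as the conservative bound, and the final interval uses a normal approximation. The function names (`binom_cdf_table`, `quantile_ranks`, `quantile_se`, `iqr_interval`) are hypothetical helpers, and only the standard library is used.

```python
import math
import random
from statistics import NormalDist, quantiles


def binom_cdf_table(n, p):
    """Return [P(B <= k) for k = 0..n] with B ~ Binomial(n, p)."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(total)
    return cdf


def quantile_ranks(n, p, alpha):
    """1-based ranks (l, u) such that [x_(l), x_(u)] covers the p-quantile
    with probability >= 1 - alpha for a continuous distribution."""
    cdf = binom_cdf_table(n, p)
    lower = [r for r in range(1, n + 1) if cdf[r - 1] <= alpha / 2]
    upper = [r for r in range(1, n + 1) if cdf[r - 1] >= 1 - alpha / 2]
    if not lower or not upper:
        raise ValueError("sample too small for this quantile and level")
    return max(lower), min(upper)


def quantile_se(sorted_x, p, alpha=0.05):
    """Assumed estimator: interval width divided by 2z as a standard error."""
    l, u = quantile_ranks(len(sorted_x), p, alpha)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (sorted_x[u - 1] - sorted_x[l - 1]) / (2 * z)


def iqr_interval(x, alpha=0.05):
    """Return (IQR, lower, upper) using the summed-variance bound."""
    xs = sorted(x)
    # method="inclusive" corresponds to R's default quantile type 7
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    # Conservative composite variance: sum of the two quartile variances
    se = math.sqrt(quantile_se(xs, 0.25, alpha) ** 2
                   + quantile_se(xs, 0.75, alpha) ** 2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return iqr, max(0.0, iqr - z * se), iqr + z * se


random.seed(1)
x = [random.gauss(0, 1) for _ in range(100)]
print(iqr_interval(x))
```

Because the covariance term is dropped, the interval should come out wider than necessary. The "test" part could be a small coverage simulation that repeats the last three lines over many samples from a distribution with known IQR. The binomial table is computed naively, so this is only meant for moderate sample sizes.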

$\endgroup$
  • 1
    $\begingroup$ How does one accomplish this in a distribution-free way, as requested by the OP? $\endgroup$
    – whuber
    Commented May 21, 2020 at 14:54
  • $\begingroup$ Added the theoretical reference part, which leads to the current paths for estimation, as estimating the probability density function is not simple (or particularly accurate) in practice. $\endgroup$
    – AJKOER
    Commented May 21, 2020 at 15:12
