
My question is: if you take a random variable $X$ with an arbitrary distribution that has a known, well-defined mean and variance, why does the following interval contain $\approx 95\%$ of the distribution?

$$ \left[\mu - 1.959 \times \sqrt{\sigma^2}, \quad \mu + 1.959 \times \sqrt{\sigma^2}\right] $$

I understand why this is true for normally distributed variables, since $\pm 1.959$ is just the magic number that maps to the 2.5th and 97.5th percentiles of the standard normal distribution, but why does this approximately hold for non-normally distributed variables as well?
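For reference, the constant can be reproduced directly in R:

# 97.5th percentile of the standard normal distribution
qnorm(0.975)
#> 1.959964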

Some R-code examples:

##### Beta Distribution
a <- 3
b <- 40

# Mean and variance formulas from https://en.wikipedia.org/wiki/Beta_distribution
mu     <- a / (a + b)
sigma2 <- a * b / ((a + b)^2 * (a + b + 1))

bounds <- c(
    mu - 1.959 * sqrt(sigma2),
    mu + 1.959 * sqrt(sigma2)
)

# Probability mass inside the interval
pbeta(bounds[[2]], a, b) - pbeta(bounds[[1]], a, b)
#> 0.9543413


##### Weibull Distribution
k <- 0.9
lambda <- 10

# Mean and variance formulas from https://en.wikipedia.org/wiki/Weibull_distribution
mu     <- lambda * gamma(1 + 1/k)
sigma2 <- lambda^2 * (gamma(1 + 2/k) - gamma(1 + 1/k)^2)

bounds <- c(
    mu - 1.959 * sqrt(sigma2),
    mu + 1.959 * sqrt(sigma2)
)

# Probability mass inside the interval
pweibull(bounds[[2]], k, lambda) - pweibull(bounds[[1]], k, lambda)
#> 0.9484728
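
For contrast, a quick sketch with a heavily skewed lognormal (the parameters here are an arbitrary choice) shows the coverage drifting well away from 95%:

##### Lognormal Distribution (heavily skewed)
mulog <- 0
sdlog <- 2

# Mean and variance formulas from https://en.wikipedia.org/wiki/Log-normal_distribution
mu     <- exp(mulog + sdlog^2 / 2)
sigma2 <- (exp(sdlog^2) - 1) * exp(2 * mulog + sdlog^2)

bounds <- c(
    mu - 1.959 * sqrt(sigma2),
    mu + 1.959 * sqrt(sigma2)
)

# The lower bound is negative, so all of the excess mass sits in the upper tail
plnorm(bounds[[2]], mulog, sdlog) - plnorm(bounds[[1]], mulog, sdlog)
#> 0.990993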

My guess is that this is just a property of the definition of variance itself; however, nothing obvious stands out to me as to why this holds, particularly for heavily skewed distributions.

  • Let's not confuse the issue of finding probability bounds for a distribution with the result from the central limit theorem, for which we can say that for most well-behaved distributions the sample mean has an approximately normal distribution, so that we can present $[\bar{X} - 1.96 \hat{S} / \sqrt{n}, \bar{X} + 1.96 \hat{S} / \sqrt{n}]$ as an interval estimate which contains 95% of repeated samples of the mean under similar design. Here I use …
    – AdamO
    Commented Aug 14, 2023 at 16:16
  • I'm voting to close this question because the question is based on a false premise, and too many replies have been necessary to inform the OP of this.
    Commented Aug 15, 2023 at 11:35
  • @Sextus It holds for many distributions -- you can construct them -- and it holds approximately for a huge number of distributions in practical applications. Some people are surprised by this: the explanation is that there can be huge discrepancies in one tail that are compensated in the other tail. That generality is the true strength of the 68-95-99.7 rule.
    – whuber
    Commented Aug 15, 2023 at 15:19
  • @Sextus You could start with, say, Gamma distributions. Provided the shape parameter is $1$ or greater, the error does not exceed 0.2%. Or Beta distributions: provided both shape parameters are $2$ or greater, the error is usually less than 1% and never more than 2.81%. I think this demonstrates the premise of the question is not false, as claimed elsewhere in these comments.
    – whuber
    Commented Aug 15, 2023 at 16:12
  • I appreciate the spirit in which the section labeled "EDIT" is offered, but this really changes the meaning of the question. I think the better alternative is to remove the EDIT and ask a new question that focuses on the more specific meaning that you have in mind. You can link to that question if you like.
    – Sycorax
    Commented Aug 15, 2023 at 17:40

2 Answers


Unless the $\approx$ symbol is taken so broadly as to be almost meaningless, the premise of your question is false.

A few examples do not establish a general rule; they only establish that you didn't try cases that would get you far from what you found.

Clearly the proportion within 1.96 standard deviations cannot exceed 1 (since that's all the area there is), so presumably you don't intend "$\approx$" to stretch that far above 0.95.

But by Chebyshev's inequality, it can be as low as $1-\frac{1}{k^2}$ where $k=1.95996...$ ($k=\Phi^{-1}(0.975)$), which is less than 75% (73.968%).

The bound is tight: the limiting case is a symmetric three-point distribution, with spikes at $\mu$ and $\mu\pm k\sigma$. We can construct a sequence of distributions that approach the limiting distribution (and the Chebyshev bound for the proportion) as closely as desired.

However, if you restrict consideration to continuous unimodal distributions, the lower bound is much closer to 95%. The proportion of the distribution within $\mu\pm 1.96\sigma$ would then be limited to between about 88.4% and 100%, and in that case there's considerably more scope for saying it's "about" 95%, but that's still fairly broad. If you want to bound it more closely still, so that the $\approx$ in your title isn't doing quite such heavy lifting, you'd need further conditions.
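As a rough check, both lower bounds can be computed directly in R (the unimodal figure is the Gauss / Vysochanskij–Petunin bound $1 - 4/(9k^2)$ mentioned in the comments below):

k <- qnorm(0.975)  # 1.959964

# Chebyshev: minimum coverage for any finite-variance distribution
1 - 1 / k^2
#> 0.7396822

# Vysochanskij-Petunin: minimum coverage for any unimodal distribution
1 - 4 / (9 * k^2)
#> 0.8843031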

  • Thanks for your response, this broadly makes sense. Though, sorry, with regards to "would then be limited to be between about 88.4% and 100%", where does that lower limit of 88.4% come from?
    – gowerc
    Commented Aug 14, 2023 at 16:56
  • I believe it's meant to be $1-1/9\approx 88.9\%$ as given by Gauss's Inequality.
    – whuber
    Commented Aug 14, 2023 at 17:36
  • Must admit, whilst I understand the argument for why the identity doesn't hold true generally, I still find it curious that for many continuous unimodal distributions it does. Testing many distributions and simple transformations, it is remarkable to me how well it holds up in practice (again, yes, it is easy to construct examples where it doesn't). My stab-in-the-dark guess would be that many distributions can be reasonably approximated by a normal distribution, which is why the identity then holds.
    – gowerc
    Commented Aug 15, 2023 at 6:39
  • It approximately holds for a number of fairly non-normal distributions, so I don't think "approximate normality" is necessarily a full explanation of what we see here. A full accounting would, however, require making some of the notions precise enough to assess the claim (like just how we're defining "approximate" each time the word is used).
    – Glen_b
    Commented Aug 15, 2023 at 9:58
  • This is it: en.wikipedia.org/wiki/Vysochanskij%E2%80%93Petunin_inequality There's also DasGupta's (2000) inequality for normal mixtures (which is less general but still fairly broad), which I think would give a lower bound of 91.3% in this case.
    – Glen_b
    Commented Aug 15, 2023 at 10:24

The claim is false for arbitrary distributions (unless $\approx$ is vacuously defined). We can show this using Chebyshev's inequality.

By Chebyshev's inequality, we know that if $0 < \sigma < \infty$, then $ \Pr(|X - \mu|\ge k\sigma)\le\frac{1}{k^2}$ for any $k>0.$

For the case of $k=1.96$, we have $ \Pr(|X - \mu|\ge 1.96\sigma)\le\frac{1}{1.96^2} = 0.260308 \dots$

So for an arbitrary distribution, all that Chebyshev guarantees is $$ \begin{align} \Pr(\mu - 1.96\sigma < X < \mu + 1.96\sigma) &\ge 1 - 0.260308\dots \\ &= 0.739692 \dots \end{align}$$ and a guaranteed floor of $0.739692\dots \not\approx 0.95$.

We can construct an example where equality is attained. $$ \begin{array}{r|c} \text{value } x & \text{probability} \\ \hline 1.96 & \frac{1}{2 \cdot 1.96^2} \\ 0 & 1 - \frac{1}{1.96^2} \\ -1.96 & \frac{1}{2 \cdot 1.96^2} \end{array} $$ By inspection, all of the probabilities are non-negative and sum to 1, so this is a valid probability distribution. Likewise, by symmetry the mean must be 0 (if you don't believe me, apply the definition of expectation).

Applying the definition of variance, we have $$ \begin{align} \sigma^2 &= \frac{1}{2 \cdot 1.96^2}(0 - 1.96)^2 + \frac{1}{2 \cdot 1.96^2}(0 + 1.96)^2 + 0\\ &= 1 \end{align}$$

Evaluating the coverage probability for this distribution directly shows that the Chebyshev bound is attained: $$ \begin{align} \Pr( - 1.96 < X < 1.96) &= 1 - \frac{1}{1.96^2} \\ &= 0.739692\dots \\ &\not\approx 0.95 \end{align}$$
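A quick numerical check of this construction, as a base-R sketch:

x <- c(-1.96, 0, 1.96)
p <- c(1 / (2 * 1.96^2), 1 - 1 / 1.96^2, 1 / (2 * 1.96^2))

sum(p)                 # probabilities sum to 1
#> 1
sum(x * p)             # mean
#> 0
sum(x^2 * p)           # variance (the mean is 0)
#> 1
sum(p[abs(x) < 1.96])  # coverage of the open interval (-1.96, 1.96)
#> 0.7396918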

We can generalize to an absolutely continuous distribution which comes arbitrarily close to the lower bound: for example, a mixture of three normal distributions centered at zero and $\pm 1.96$ with very small component variances, and with slightly less weight on the outer components than in the table. (With exactly the table's weights, the outer components sit right at $\mu \pm 1.96\sigma$, so about half of each one's mass falls inside the interval; shaving a little weight off the tails pushes them strictly outside.)
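Here is a sketch of that calculation in R (the 0.001 weight reduction and 0.001 component standard deviation are arbitrary small choices):

q <- 1 / (2 * 1.96^2) - 0.001  # slightly less tail weight than the table
s <- 0.001                     # tiny component standard deviation
sigma <- sqrt(2 * q * 1.96^2 + s^2)

# Exact coverage of (mu - 1.96*sigma, mu + 1.96*sigma) for the mixture (mu = 0)
t <- 1.96 * sigma
(1 - 2 * q) * (pnorm(t, 0, s) - pnorm(-t, 0, s)) +
    q * (pnorm(t,  1.96, s) - pnorm(-t,  1.96, s)) +
    q * (pnorm(t, -1.96, s) - pnorm(-t, -1.96, s))
#> 0.7416918

Shrinking both constants toward zero drives the coverage down to the Chebyshev floor of $0.739692\dots$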

  • For some counterpoint, and an explanation concerning why I find this analysis less than useful, note that the constraint on a distribution $F$ of the form $F[\mu+1.96\sigma]-F[\mu-1.96\sigma]=95/100$ defines a codimension-1 set in the space of all continuous finite-variance distributions, revealing such distributions as extremely numerous. Moreover, in the natural metric on that space, many commonly used distributions are close to one of these. The constructive approach to addressing the question then (IMHO) is to provide practical characterizations of the neighborhood of that subspace.
    – whuber
    Commented Aug 25, 2023 at 19:19
  • @whuber These are interesting observations, but they address a question that wasn't asked. I had hoped that OP would ask a new question in this vein that reflected your previous comments on their question and was a little more careful in its phrasing, because that would indeed be more interesting.
    – Sycorax
    Commented Aug 25, 2023 at 19:28
  • Although the question wasn't literally asked, it was, in my reading, specifically what was intended by the use of "approximately equal" ("$\approx$") and "approximately still hold for non-normally distributed variables" in the statement.
    – whuber
    Commented Aug 25, 2023 at 19:32
  • What is the intended meaning of "arbitrary distribution"?
    – Sycorax
    Commented Aug 25, 2023 at 20:41
  • I stated my interpretation of "arbitrary distribution" in the first comment. It can be enlarged to all distributions with finite variance provided you are a little careful in interpreting "95% of the" to account for the possibility of finite probabilities at the endpoints of the interval.
    – whuber
    Commented Aug 25, 2023 at 21:01
