1
$\begingroup$

I know that when a sample is non-normally distributed, I need to use non-parametric alternatives, as when the assumptions of parametric methods are violated, its coverage will decrease.

Let's say that I want to compute the confidence interval variance of a non-normal data. I chose variance because variance is sensitive to non-normality than mean.

To my knowledge, the equation for the CI of variance for normal distribution is:

$$\frac{(n-1)s^{2}}{\chi^{2}_{\frac{\alpha}{2}, df}} \leq \sigma^2 \leq \frac{(n-1)s^{2}}{\chi^{2}_{1-\frac{\alpha}{2}, df}}$$

However, if I generate a skewed & exponential distribution multiple times, I get 100% coverage of population variance within the analytical CI of variance. Here's the Python code snippet:

from scipy import stats
import matplotlib.pyplot as plt
import numpy as np

count_norm = []
count_expon = []
count_skew = []

np.random.seed(42)

iteration = 1000

for i in range(iteration):

    # generate samples
    norm = stats.norm.rvs(size=1000, scale=100)
    expon = stats.expon.rvs(size=1000, scale=100)
    skew = stats.skewnorm.rvs(a=10, size=1000, scale=100)

    # sample variance
    var_norm = np.var(norm, ddof=1)
    var_expon = np.var(expon, ddof=1)
    var_skew = np.var(skew, ddof=1)

    # analytical CI
    lo_norm, hi_norm = (
        (len(norm) - 1) * np.var(norm, ddof=1) / stats.chi2.ppf(1 - 0.05 / 2, len(norm) - 1),
        (len(norm) - 1) * np.var(norm, ddof=1) / stats.chi2.ppf(0.05 / 2, len(norm) - 1)
    )
    lo_expon, hi_expon = (
        (len(expon) - 1) * np.var(expon, ddof=1) / stats.chi2.ppf(1 - 0.05 / 2, len(expon) - 1),
        (len(expon) - 1) * np.var(expon, ddof=1) / stats.chi2.ppf(0.05 / 2, len(expon) - 1)
    )
    lo_skew, hi_skew = (
        (len(skew) - 1) * np.var(skew, ddof=1) / stats.chi2.ppf(1 - 0.05 / 2, len(skew) - 1),
        (len(skew) - 1) * np.var(skew, ddof=1) / stats.chi2.ppf(0.05 / 2, len(skew) - 1)
    )

    # coverage test
    if lo_norm <= var_norm <= hi_norm:
        count_norm.append(1)
    else:
        count_norm.append(0)

    if lo_expon <= var_expon <= hi_expon:
        count_expon.append(1)
    else:
        count_expon.append(0)

    if lo_skew <= var_skew <= hi_skew:
        count_skew.append(1)
    else:
        count_skew.append(0)

print('coverage of normal dist =', sum(count_norm) / iteration)
print('coverage of exponential dist =', sum(count_expon) / iteration)
print('coverage of skewed dist =', sum(count_skew) / iteration)

>>> coverage of normal dist = 1.0
>>> coverage of exponential dist = 1.0
>>> coverage of skewed dist = 1.0

I am not sure how many people can understand this code, because most of the question answerers in this community are familiar with R...

Can I get an explanation of why I'm getting 100% coverage of population variance from a non-normal distribution using analytical confidence interval of variance, even when the assumption of normality is violated?

Or it would be great if anyone can disapprove my simulation result with his custom R simulation result... because what I'm getting is contradicting what I know.

Thanks

$\endgroup$
2
  • 3
    $\begingroup$ You are testing whether the estimated (sample) variance (rather than the true variance) lies within the confidence interval around the estimated variance, which it trivially does. $\endgroup$
    – Björn
    Commented Oct 8, 2019 at 3:43
  • $\begingroup$ @Björn ah, I see! I wasn't simulating random draw from the population! $\endgroup$
    – Eric Kim
    Commented Oct 8, 2019 at 23:38

0