
Given n samples from a normally-distributed variable X, we estimate variance as $s^2=\frac{1}{n-1}\sum{(x_i - \bar{x})^2}$. We can also get a confidence interval for such a variance estimate as: $$CI_\alpha = [\frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2},n-1}},\frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2},n-1}}]$$
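For concreteness, here is a minimal sketch of that interval in Python (the helper name `variance_ci` is my own invention; `scipy.stats.chi2.ppf` supplies the quantiles, reading $\chi^2_{q,n-1}$ as the upper-tail value):

    import numpy as np
    from scipy import stats

    def variance_ci(x, alpha=0.10):
        """(1 - alpha) confidence interval for the variance of a normal sample."""
        x = np.asarray(x, dtype=float)
        n = x.size
        s2 = x.var(ddof=1)  # unbiased estimate, divisor n - 1
        lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 1)
        upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1)
        return s2, (lower, upper)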

Now suppose we have two variables $X \sim N(0,\sigma_x), Y \sim N(0,\sigma_y)$, and we want to test whether $\sigma_x = \sigma_y$ for some level of significance $\alpha$. In this case it is common to use the F-distribution to perform an F-test on $\frac{s_x^2}{s_y^2}$.
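SciPy does not ship a two-sample F-test for variances, so a hand-rolled sketch (the helper name `f_test_p_value` is my own) might look like:

    import numpy as np
    from scipy import stats

    def f_test_p_value(x, y):
        """Two-sided p-value for H0: sigma_x^2 == sigma_y^2."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        F = x.var(ddof=1) / y.var(ddof=1)
        p_lower = stats.f.cdf(F, x.size - 1, y.size - 1)
        return 2 * min(p_lower, 1 - p_lower)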

However, we can also look at the confidence intervals for the two variance estimates.

I have found that it is often the case that if the confidence intervals overlap then the F-test (at the same $\alpha$) does not reject the null hypothesis (that $\sigma_x = \sigma_y$), and if the confidence intervals do not overlap then the F-test does reject it. But this is not always the case. Here is a sample set where, for $\alpha = 0.1$, the confidence intervals overlap but the F-test rejects the null hypothesis:

 X   Y
 13 -24
-10 -24
-12   7
 10 -51
  2  28
-10  -4
 -6   2
  0   5
 -1 -32
-21 -10

Variance X = 108, 90% CI = [57, 292]
Variance Y = 522, 90% CI = [277, 1412]
F-test p-value = 0.028
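
These numbers can be reproduced with a short self-contained script (a sketch using SciPy; output rounded to match the figures above):

    import numpy as np
    from scipy import stats

    x = np.array([13, -10, -12, 10, 2, -10, -6, 0, -1, -21], float)
    y = np.array([-24, -24, 7, -51, 28, -4, 2, 5, -32, -10], float)

    for name, v in (("X", x), ("Y", y)):
        n, s2 = v.size, v.var(ddof=1)
        lo = (n - 1) * s2 / stats.chi2.ppf(0.95, n - 1)  # chi2 upper 5% point
        hi = (n - 1) * s2 / stats.chi2.ppf(0.05, n - 1)  # chi2 lower 5% point
        print(f"Variance {name} = {s2:.0f}, 90% CI = [{lo:.0f}, {hi:.0f}]")

    F = x.var(ddof=1) / y.var(ddof=1)
    p = stats.f.cdf(F, x.size - 1, y.size - 1)
    print(f"F-test p-value = {2 * min(p, 1 - p):.3f}")  # -> 0.028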

Squinting at the math, it looks like the F-test considers a ratio of the variances, whereas the confidence-interval comparison considers their difference. It's unclear to me why one should be better than the other for the purposes of this test.

Question: For the purposes of testing the hypothesis $\sigma_x = \sigma_y$, is it better to use the F-test than to check for overlap of the confidence intervals as described above? Why?

  • What leads you to think that checking for "confidence interval overlap" is a reasonable way to proceed?
    – Glen_b
    Commented Nov 16, 2023 at 7:20
  • @Glen_b If the confidence intervals overlap, then we can't reject the hypothesis that the values of the parameters are identical.
    – feetwet
    Commented Nov 16, 2023 at 15:23
  • The idea of a "non-overlap of two intervals" rejection rule doesn't work for a test of means either; it won't replicate a t-test. Numerous questions on site address that. Indeed it doesn't even have the desired significance level. Clearly it's not a ratios vs differences issue (and there's more that could be done to show this, but I don't want to lose the main point). Given that it doesn't work for the more basic case of CIs for means, why would it work for this one?
    – Glen_b
    Commented Nov 18, 2023 at 1:07
  • On that topic (overlap of two CIs for means compared to a test), there's a good discussion of some of the issues here; in particular, I recommend giving whuber's answer there some thought. But briefly, there is a distinction between an interval for a difference in means vs overlap of individual intervals. This underlying issue applies to other situations as well.
    – Glen_b
    Commented Nov 18, 2023 at 1:23
  • Once you move away from means, of course, things may change even further. However, on the ratio vs difference issue, consider that if you take logs of the endpoints of your intervals they will be valid intervals for $\log(\sigma_i^2)$ (in that the coverage statements will continue to hold, due to monotonicity of $\log$). Further, the overlap or non-overlap of the intervals will be unchanged by such transformation (again, it's strictly monotonic). Now you are looking at "difference" (log of a ratio is a difference of logs), but the overlap criterion is unchanged - it's not ratio vs difference.
    – Glen_b
    Commented Nov 18, 2023 at 1:35

1 Answer


If the data from the first sample give a very tight estimate of the variance $\sigma^2_x$, while the data from the second sample give a very wide estimate of $\sigma^2_y$, then even though the confidence intervals may overlap now, there is a good chance that, if you increased the size of the second sample, your estimate of $\sigma_y^2$ would converge to a value outside the confidence interval for $\sigma_x^2$. So I would suggest that the F-test, since it is literally about the ratio of the sample variances, is more appropriate.

Another way to approach this is with Bayesian methods. The posterior distribution of a variance given the data is scaled inverse $\chi^2$. You can then form posterior distributions for $\sigma^2_x$ and $\sigma^2_y$ and directly compute the probability $P\left[-\epsilon < \sigma^2_x - \sigma^2_y < \epsilon\right]$ for some $\epsilon$ which you deem sufficiently small to ignore differences between the variables. It might be a good idea to tie $\epsilon$ to the uncertainty in the variances themselves.
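
A minimal Monte Carlo sketch of that computation, assuming the noninformative prior $p(\sigma^2) \propto 1/\sigma^2$ (under which the posterior is Scale-inv-$\chi^2(n-1, s^2)$) and a purely illustrative choice of $\epsilon$, applied to the data in the question:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = np.array([13, -10, -12, 10, 2, -10, -6, 0, -1, -21], float)
    y = np.array([-24, -24, 7, -51, 28, -4, 2, 5, -32, -10], float)

    def posterior_draws(v, size=100_000):
        """Draw sigma^2 from Scale-inv-chi2(n - 1, s^2): (n - 1) s^2 / chi2_{n-1}."""
        n, s2 = v.size, v.var(ddof=1)
        return (n - 1) * s2 / stats.chi2.rvs(n - 1, size=size, random_state=rng)

    sx2, sy2 = posterior_draws(x), posterior_draws(y)
    eps = 100.0  # hypothetical tolerance; in practice tie it to the posterior spread
    print(np.mean(np.abs(sx2 - sy2) < eps))  # estimate of P(|sigma_x^2 - sigma_y^2| < eps)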
