
I was looking for a proof like the one for the sample variance, where it is shown that the expected value of the sample variance with $n-1$ in the denominator equals the parameter. I'm not even sure what the pooled sample variance (residual variance) tries to estimate: $$ E\left[\frac{1}{n-k}\sum_{i=1}^{n}(y_i-\bar{y}_{g(i)})^2\right] = E\left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)s_j^2\right] = \,?$$ Here $n$ is the number of observations, $k$ the number of groups, $n_j$ the number of observations in group $j$, $s_j^2$ the sample variance of group $j$, and $g(i)$ the group to which observation $i$ is assigned.

Is it the population variance? I think not, because the estimator is built group by group.

My attempt was: $$ E\left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)s_j^2\right] = \frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)E[s_j^2], $$ yet I'm not sure what the expectation of each group's sample variance is. Thanks

  • In order to make it an unbiased estimator for the variance of the linear model.
    – Amir, Feb 29 at 22:33
  • Thanks, could you elaborate?
    – Mar 2 at 13:24
  • I just provided more details in an answer.
    – Amir, Mar 3 at 10:51

1 Answer


First note that $n=\sum_{j=1}^{k}n_j$. Second, for each group $j$ we consider the following linear model:

$$X_{ji}=\mu_j+\epsilon_{ji}, i=1,\dots,n_j$$

where the $\epsilon_{ji}$ are independent and follow $\mathcal N (0,\sigma^2)$.

Hence, the sample variance $S^2_j$ of the observations $X_{ji}$, $i=1,\dots,n_j$, from group $j$ is an unbiased estimator of $\sigma^2$, i.e., $\mathbb E[S_j^2]=\sigma^2$. Finally, we have

$$\mathbb E[\text{MSE}]=\mathbb E\left[\frac{SSE}{n-k}\right]= \mathbb E \left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)S_j^2\right] = \frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1) \mathbb E[S_j^2]\\=\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)\sigma^2 =\sigma^2\, \frac{1}{n-k} \left ( \sum_{j=1}^{k}n_j -k \right )=\sigma^2,$$

which means that $\text{MSE}$ is also an unbiased estimator of $\sigma^2$ (it is better in that it has lower variance than each individual $S^2_j$). Now you can see why $n-k$ is used here.
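The unbiasedness above is easy to check by simulation. Below is a minimal sketch in Python; the group means, group sizes, and $\sigma^2$ are hypothetical values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                      # true common variance (assumed value)
means = [0.0, 5.0, -3.0]          # hypothetical group means mu_j
sizes = [5, 8, 12]                # hypothetical group sizes n_j
n, k = sum(sizes), len(sizes)

mses = []
for _ in range(20000):
    # SSE = sum over groups of (n_j - 1) * S_j^2, with S_j^2 the
    # usual unbiased sample variance (ddof=1)
    sse = sum((nj - 1) * rng.normal(mu, np.sqrt(sigma2), nj).var(ddof=1)
              for mu, nj in zip(means, sizes))
    mses.append(sse / (n - k))

print(np.mean(mses))  # close to sigma2 = 4.0
```

Averaging the MSE over many replications recovers $\sigma^2$, as the derivation predicts.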

  • Thanks. I see it now – MSE is an estimator of the variance of the underlying normal distributions, which by the ANOVA assumptions is shared by all group populations.
    – Mar 3 at 18:52
  • My further question is: you write "which means that MSE is also an unbiased estimator of $\sigma^2$ (it is better as it has lower variance than each $S^2_j$)." Is the reason the "simple" weighted mean of group variances ($\frac{1}{n}\sum_{j=1}^k n_j S_j^2$) is not used that it would not equal the residual variance? I guess such a "simple" weighted mean as an estimator of $\sigma^2$ also has smaller variance.
    – Mar 3 at 18:58
  • @MaciejJałocha I like that you tried to extend the result. Yes, the new one is also an unbiased estimator with less variance than each sample variance. However, I am not sure which of the two combined statistics is better. You may compare them for $k=2$ and tell me the result. Hint: $$\text{var}(S_j^2)=\frac{2\sigma^4}{n_j-1}.$$
    – Amir, Mar 3 at 19:18
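The comparison suggested in the last comment can be sketched directly from the normal-theory identity $\text{var}(S_j^2)=2\sigma^4/(n_j-1)$, since both estimators are linear combinations of the independent $S_j^2$. The unbalanced group sizes below are hypothetical, and $\sigma^4$ is set to 1:

```python
# Compare var(MSE) with var of the "simple" weighted mean
# (1/n) * sum_j n_j * S_j^2 for k = 2 groups, using
# var(S_j^2) = 2*sigma^4 / (n_j - 1) for normal samples.
sigma4 = 1.0
n1, n2 = 2, 10                    # hypothetical unbalanced group sizes
n, k = n1 + n2, 2

# MSE weights are (n_j - 1)/(n - k), so the variance collapses to
# 2*sigma^4 / (n - k)
var_mse = 2 * sigma4 / (n - k)

# Simple weighted mean has weights n_j / n
var_simple = (n1**2 * 2 * sigma4 / (n1 - 1)
              + n2**2 * 2 * sigma4 / (n2 - 1)) / n**2

print(var_mse, var_simple)  # 0.2 vs approximately 0.2099
```

For these unbalanced sizes the MSE has the (slightly) lower variance; working the same formulas with balanced sizes $n_1=n_2$ makes the two variances coincide.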
