
$\newcommand{\Cov}{\operatorname{Cov}}$Problem Statement: Under the assumptions of Exercise 11.16, find $\Cov\big(\hat\beta_0,\hat\beta_1\big).$ Use this answer to show that $\hat\beta_0$ and $\hat\beta_1$ are independent if $\displaystyle\sum_{i=1}^n x_i=0.$ [Hint: $\Cov\big(\hat\beta_0,\hat\beta_1\big)= \Cov\big(\overline{Y}-\hat\beta_1\overline{x},\hat\beta_1\big).$ Use Theorem 5.12 and the results of this section.]

Note: This is Problem 11.17 in Mathematical Statistics with Applications, 5th Ed., by Wackerly, Mendenhall, and Scheaffer.

My Work So Far: The assumptions of Exercise 11.16 are that $Y_1, Y_2,\dots,Y_n$ are independent normal random variables with $E(Y_i)=\beta_0+\beta_1 x_i$ and $V(Y_i)=\sigma^2.$ The first part of this question is largely done for us in the book: it is derived there that $$\Cov\big(\hat\beta_0,\hat\beta_1\big) =-\frac{\overline{x}\,\sigma^2}{\displaystyle\sum_{i=1}^n(x_i-\overline{x})^2}.$$ Now $\overline{x}=0$ if and only if $\sum_{i=1}^n x_i=0,$ so if the sum is zero, the covariance is zero. However, the fact that $\hat\beta_0$ and $\hat\beta_1$ are each normally distributed with zero covariance does not by itself make them independent; that would only follow if they were jointly (bivariate) normally distributed.
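To convince myself numerically, here is a small Monte Carlo sketch of my own (Python/NumPy; the particular $x$ values, true coefficients, and $\sigma$ are arbitrary choices, not from the book) comparing the simulated covariance of $\hat\beta_0$ and $\hat\beta_1$ with the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # fixed design points (arbitrary)
beta0, beta1, sigma = 2.0, 0.5, 1.0          # arbitrary true parameters
n, reps = len(x), 200_000
Sxx = np.sum((x - x.mean()) ** 2)

# reps independent samples of Y_i = beta0 + beta1*x_i + normal error
Y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, n))

# least-squares slope and intercept for each simulated sample
b1_hat = (Y - Y.mean(axis=1, keepdims=True)) @ (x - x.mean()) / Sxx
b0_hat = Y.mean(axis=1) - b1_hat * x.mean()

print(np.cov(b0_hat, b1_hat)[0, 1])          # simulated covariance
print(-x.mean() * sigma**2 / Sxx)            # the formula derived in the book
```

The two printed numbers agree closely, which supports the covariance formula, though of course a simulation says nothing about the independence question.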

My Questions: Is what I'm being asked to show even true? That is, is there something about $\hat\beta_0$ and $\hat\beta_1$ being OLS estimators that makes this result hold? Or can I show that they are bivariate normally distributed? Zero covariance does not imply independence in general; why should it in this situation?

Note 1: In silverfish's answer to this question, the paragraph beginning with "These two uncertainties apply independently..." states that these two uncertainties "...should be technically independent." But that is not proven there, though it is explained intuitively and I could believe it.

Note 2: In this thread, Alecos simply makes the argument that I think the book wants here, but doesn't say anything about why zero covariance implies independence.

Note 3: I have reviewed a few other threads related to this, but none of them answers the main question of why zero covariance should imply independence in this situation, when it doesn't in general.

  • It seems to me that you have all the ingredients to answer your question: i) if $X$ and $Y$ have a bivariate normal distribution, zero covariance means independence; ii) if we assume normal errors, the least-squares coefficient estimates are jointly normally distributed (@jld mentions this in their answer). So the book is correct if the errors are assumed to be normal. Commented Sep 9, 2021 at 20:31
  • @COOLSerdash Yes, but I don't understand jld's post, I'm afraid. This result is, apparently, supposed to be provable without the linear algebra approach at all, since that comes later in this book. Question: does "jointly normally distributed" mean the same thing as "bivariate normally distributed"? Commented Sep 9, 2021 at 20:40
  • Ah! According to the solutions manual, the book just wants you to calculate the covariance (which you did) and show that it's zero under some circumstances. Yes, "jointly normal" extends to multiple coefficients, not just two. It's probably better to say that they have a multivariate normal distribution. Commented Sep 9, 2021 at 20:45
  • Yeah, I agree: the question probably shouldn't mention independence without making some further comments about the assumptions on the errors. Commented Sep 9, 2021 at 20:52
  • Joint distribution of OLS estimators is discussed at several posts, including stats.stackexchange.com/q/347628/119261. Commented Sep 9, 2021 at 21:04

1 Answer


$\newcommand{\one}{\mathbf 1}\newcommand{\e}{\varepsilon}$I would just go for a linear algebra approach since then we get joint normality easily. You have $y = X\beta + \e$ with $X = (\one \mid x)$ and $\e\sim\mathcal N(\mathbf 0, \sigma^2 I)$.

We know $$ \hat\beta = (X^TX)^{-1}X^Ty \sim \mathcal N(\beta, \sigma^2 (X^TX)^{-1}) $$ where $$ (X^TX)^{-1} = \begin{bmatrix} n & n \bar x \\ n \bar x & x^Tx\end{bmatrix}^{-1} = \frac{1}{x^Tx - n\bar x^2}\begin{bmatrix} x^Tx/n & - \bar x \\ - \bar x & 1\end{bmatrix}. $$ By assumption $X$ is full rank, which in this case means $x$ is not constant (the only way for $X$ to be rank deficient is for $x$ to lie in the span of $\one$). This means $\det(X^TX) \neq 0$; note also that the denominator satisfies $x^Tx - n\bar x^2 = \sum_i (x_i-\bar x)^2$, so the off-diagonal entry of $\sigma^2(X^TX)^{-1}$ is exactly the covariance you computed. Hence $\text{Cov}(\hat\beta_0, \hat\beta_1) = 0$ if and only if $\bar x = 0$, and since $\hat\beta$ is bivariate normal, zero covariance is equivalent to independence here.
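As a quick numerical sanity check (a sketch of my own, not part of the derivation; the design below is an arbitrary random draw), the off-diagonal entry of $\sigma^2(X^TX)^{-1}$ matches $-\bar x\,\sigma^2/\sum_i(x_i-\bar x)^2$ and vanishes exactly when $x$ is centered:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.3                                 # arbitrary error standard deviation
x = rng.normal(size=10)                     # arbitrary non-constant covariate
for xs in (x, x - x.mean()):                # raw x, then centered x (x-bar = 0)
    X = np.column_stack([np.ones_like(xs), xs])       # design matrix (1 | x)
    cov_beta = sigma**2 * np.linalg.inv(X.T @ X)      # Cov(beta-hat)
    closed_form = -xs.mean() * sigma**2 / np.sum((xs - xs.mean()) ** 2)
    print(cov_beta[0, 1], closed_form)      # equal; both zero once x is centered
```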


Here's a different approach that avoids using the normal equations. We know $$ \hat\beta_0 = \bar y - \hat\beta_1 \bar x \\ \hat\beta_1 = \frac{\text{Cov}(x,y)}{\text{Var}(x)} $$ (with $\text{Cov}(x,y)$ and $\text{Var}(x)$ denoting the sample covariance and sample variance of the data) and we want to show $\bar x = 0 \implies \hat\beta_0 \perp \hat\beta_1$, where I'm using "$\perp$" to denote independence.

Without losing any generality I'll assume $x^Tx = 1$ (this preserves $\bar x = 0$). Then under the assumption of $\bar x = 0$ we have $$ \hat\beta_0 = \bar y = n^{-1}\one^Ty \\ \hat\beta_1 = x^Ty - \bar y x^T\one = x^Ty. $$

This means $$ {\hat\beta_0 \choose \hat\beta_1} = (n^{-1}\one \mid x)^Ty $$ so this is a linear transformation of a Gaussian and is in turn Gaussian, and the covariance matrix is proportional to $$ (n^{-1}\one \mid x)^T(n^{-1}\one \mid x) = \begin{bmatrix} n^{-1} & 0 \\ 0 & 1\end{bmatrix} $$ which gives us independence.
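Here is a short sketch illustrating this last display (it assumes, as above, that $x$ has been centered and rescaled so that $\bar x = 0$ and $x^Tx = 1$): the matrix $(n^{-1}\one \mid x)^T(n^{-1}\one \mid x)$ is diagonal, and the linear map $(n^{-1}\one \mid x)^Ty$ reproduces the usual least-squares estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.normal(size=n)
x = (x - x.mean()) / np.linalg.norm(x - x.mean())   # now x-bar = 0 and x^T x = 1

A = np.column_stack([np.ones(n) / n, x])            # the matrix (n^{-1} 1 | x)
print(A.T @ A)                                      # diag(1/n, 1): off-diagonal is 0

y = 1.0 + 2.0 * x + rng.normal(size=n)              # arbitrary response
b0, b1 = A.T @ y                                    # estimates via the linear map
b1_ls = x @ (y - y.mean()) / (x @ x)                # usual least-squares slope
b0_ls = y.mean() - b1_ls * x.mean()                 # usual least-squares intercept
print((b0, b1), (b0_ls, b1_ls))                     # the two computations agree
```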


This result can be generalized by noting that, in this case, $\bar x = 0$ is equivalent to the two columns of the design matrix ($\one$ and $x$) being orthogonal.

Suppose now we have an $n\times p$ full column rank covariate matrix $X$ which is partitioned as $X = (Z\mid W)$ where $Z$ has orthonormal columns and $W$ is unconstrained.

If every column of $X$ is orthonormal, i.e. $X=Z$, the result is easy since $X^TX = I$, so $$ \hat\beta \sim \mathcal N(\beta, \sigma^2I). $$

I'll prove the following more interesting result: letting $\hat\beta_A$ denote the vector of coefficients for block $A$ of $X$, the elements of $\hat\beta_Z$ are conditionally independent given $\hat\beta_W$.

This can be shown by directly computing the covariance matrix of $\hat\beta_Z \mid \hat\beta_W$ and since $\hat\beta_Z\mid\hat\beta_W$ is still multivariate Gaussian, this gives us independence. I'll take $\sigma^2 = 1$ without losing any generality.

I'll start with the full covariance matrix of $\hat\beta$, which is proportional to $(X^TX)^{-1}$. $X^TX$ is a $2\times 2$ block matrix, so we can invert it as $$ (X^TX)^{-1} = \begin{bmatrix}I & Z^TW \\ W^TZ & W^TW\end{bmatrix}^{-1} = \begin{bmatrix} I + Z^TWA^{-1}W^TZ & -Z^TWA^{-1} \\ -A^{-1}W^TZ & A^{-1} \end{bmatrix} $$ where $A = W^TW - W^TZZ^TW = W^T(I-ZZ^T)W$ is the Gram matrix of $W$ after projecting its columns onto the space orthogonal to the column space of $Z$.

It is not true in general that $I + Z^TWA^{-1}W^TZ = I$, so marginally we are not guaranteed independence in the $\hat\beta_Z$. But now if we condition $\hat\beta_Z$ on $\hat\beta_W$ we obtain $$ \text{Var}(\hat\beta_Z \mid \hat\beta_W) = I + Z^TWA^{-1}W^TZ - Z^TWA^{-1} \cdot A \cdot A^{-1}W^TZ = I $$ so we do indeed have conditional independence.

$\square$
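A small numerical illustration of the block computation (my own sketch; it takes $\sigma^2 = 1$ and builds an orthonormal $Z$ block from the QR factorization of a random matrix, with an arbitrary $W$ block):

```python
import numpy as np

rng = np.random.default_rng(3)
n, pz, pw = 30, 3, 2
Z, _ = np.linalg.qr(rng.normal(size=(n, pz)))    # Z block with orthonormal columns
W = rng.normal(size=(n, pw))                     # unconstrained W block
X = np.column_stack([Z, W])

Sigma = np.linalg.inv(X.T @ X)                   # Cov(beta-hat) with sigma^2 = 1
S_ZZ, S_ZW, S_WW = Sigma[:pz, :pz], Sigma[:pz, pz:], Sigma[pz:, pz:]

cond_cov = S_ZZ - S_ZW @ np.linalg.inv(S_WW) @ S_ZW.T   # Var(beta_Z | beta_W)
print(np.round(S_ZZ, 4))        # marginal covariance of beta_Z: generally not I
print(np.round(cond_cov, 4))    # conditional covariance: the identity, as derived
```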

  • Thank you for your answer, jld. That is helpful. However, I have not yet studied the linear algebra approach, so I cannot follow your line of reasoning at all, I'm afraid. Commented Sep 9, 2021 at 19:38
  • @AdrianKeister ok, I just updated with a slightly different approach. I did still use a tiny bit of linear algebra though. Does this help at all? – jld, Commented Sep 9, 2021 at 19:59
  • jld: That's only slightly better, in that I can follow a few more steps before I get lost. What I'm really looking for is an approach using the notation I've introduced (and not a whole lot of new notation), and concepts I'm already familiar with. I have updated my question to include the book from which I'm learning; my apologies for not doing that earlier. While I am exceedingly familiar with linear algebra in general, its application to statistics is pretty much unknown to me at this point (I hope to rectify that in the future). Commented Sep 9, 2021 at 20:02
  • @AdrianKeister I just caught up on all of the comments. Do you feel like your question has been resolved at all or not really? If you really don't want to use any properties of multivariate Gaussians I think this might be an uphill battle... – jld, Commented Sep 10, 2021 at 15:32
  • Yeah, I think so. You're absolutely right about the multivariate Gaussians: that's the crux of the matter. As it is, I think the book's problem statement was unwise for its position in the book. It should have simply said, "Show that the correlation is zero." Happy to mark your answer as the solution. Commented Sep 10, 2021 at 15:44
