
On page 4 of https://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf, it is stated that the regressors have zero correlation with the residuals in OLS, but I don't think this is true.

The assertion is based on the fact that $$ X^Te = 0 $$ where $e$ are the residuals $y - \hat{y}$.

But why does this mean the regressor is uncorrelated with the residual?

I tried to derive this using the definition of covariance for two random variables, where $X_p$ is the random variable corresponding to the $p$-th regressor: \begin{align} cov(X_p, e) &= E[(X_p - \mu_{X_p})(e - \mu_e)] \\ &= E[X_p e - \mu_{X_p} e - \mu_e X_p + \mu_{X_p} \mu_e] \\ &= E[X_p e] - \mu_{X_p} \mu_e \end{align}

We know that $E[X_p e] = 0$, but $X_p$ is uncorrelated with $e$ only if at least one of their means is zero.

Edit. I think there may be a mistake in my derivation. I do not believe $E[X_p e] = 0$.
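For concreteness, here is a small numerical check in R (simulated data; the names and numbers are just illustrative). It only looks at the fitted residuals from one sample, so it says nothing about the population expectation I am unsure about above.

set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 1 + 2*x1 - 3*x2 + rnorm(n)   # any data-generating process works here
fit <- lm(y ~ x1 + x2)             # model with an intercept
e   <- resid(fit)
X   <- model.matrix(fit)           # design matrix, including the column of 1s
crossprod(X, e)                    # X^T e: zero up to floating-point error
c(mean(e), cov(x1, e), cov(x2, e)) # residual mean and sample covariances: also ~0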

  • Since you don't think this is true, what counterexample have you come up with? This will help us understand how you interpret the meaning of "correlation" in this context. The ambiguity of meaning lies in the fact that $X$ is explicitly not a random variable, but $e$ is.
    – whuber
    Commented Jun 25, 2020 at 22:19
  • @whuber I just edited the OP with my derivation, which I think is a counterexample? I interpret correlation as the definition of correlation (covariance divided by the product of the standard deviations of the two random variables). Also, I believe $X$ is a random variable, or I should say the matrix $X$ consists of $M$ random variables where $M$ is the number of regressors. Commented Jun 25, 2020 at 22:27
  • See also stats.stackexchange.com/questions/207841/… Commented Jun 26, 2020 at 13:09

2 Answers


In any model with an intercept, the residuals are uncorrelated with the predictors $X$ by construction; this is true whether or not the linear model is a good fit and it has nothing to do with assumptions.

It's important here to distinguish between the residuals and the unobserved things often called the errors.

The (sample) covariance between the residuals $R$ and $X$ is $$\frac{1}{n}\sum RX-\frac{1}{n}\left(\sum R\right)\frac{1}{n}\left(\sum X\right).$$ If the model includes an intercept, $\sum R=0$, so the covariance is just $\frac{1}{n}\sum RX$. But the normal equations defining $\hat\beta$ are $X^T(Y-\hat Y)=0$, i.e., $\frac{1}{n}\sum XR=0$.

So the residuals and $X$ are exactly uncorrelated.

When there is actually a model $$Y = X\beta+e$$ the assumption that the errors $e$ are uncorrelated with $X$ is necessary to make $\hat\beta$ unbiased for $\beta$ (and we assume the errors have mean zero to make the intercept identifiable). So $E[X^Te]=0$ is an assumption, not a theorem.

The residuals typically are not uncorrelated with $Y$. Neither are the errors.
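Here is a minimal R sketch of that difference (simulated data, purely illustrative). With an intercept, the sample correlation between the residuals and $x$ is zero up to floating point; without an intercept, the residuals are still orthogonal to $x$ (the normal equation $\sum x_iR_i=0$ still holds), but they generally neither sum to zero nor are uncorrelated with $x$.

set.seed(2)
x <- runif(30, 1, 10)
y <- 5 + 2*x + rnorm(30)     # data with a nonzero intercept

r1 <- resid(lm(y ~ x))       # fit with an intercept
r0 <- resid(lm(y ~ x - 1))   # fit without an intercept

cor(r1, x)                   # zero, up to floating-point error
sum(x * r0)                  # still ~0: the no-intercept normal equation
sum(r0)                      # generally nonzero: residuals need not sum to zero
cor(r0, x)                   # hence generally nonzero as well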

  • Couple of questions. (1) Regarding your last sentence: $\hat{Y}$ is uncorrelated with the residuals, but $Y$ is not, right? (2) In our $X$ matrix we have a column of 1s, so we end up with $\langle \boldsymbol{1}_n, r \rangle = 0$, and for this to hold, $\sum_i r_i = 0$. But if our offset evaluates to zero, don't we still have a column of 1s in the $X$ matrix? It's just that when you compute $X\hat{\beta}$, the first column of $X$ ends up being multiplied by $\hat{\beta}_0 = 0$. Commented Jun 26, 2020 at 2:46
  • That's right. $Y$ is correlated with the residuals, $\hat Y$ isn't. You would almost always have an intercept in the model, but you certainly can specify models without one, and they even have uses, such as for ratio estimation in surveys. The residuals in models like that don't add to zero. Commented Jun 26, 2020 at 10:03
  • +1: In ordinary least squares, the mean of the residuals $e_i=y_i-\hat{y}_i$ will be $0$. If it is not, and instead $\frac{1}{n}\sum (y_i-\hat{y}_i)= k \not = 0$, then $\hat{y}_i+k$ will be a better least squares estimate than $\hat{y}_i$ in the sense that $\sum(y_i-(\hat{y}_i+k))^2 = \sum(y_i-\hat{y}_i)^2 -nk^2\lt \sum(y_i-\hat{y}_i)^2$. Indeed, the constant term in the OLS fit deals with this automatically.
    – Henry
    Commented Jun 26, 2020 at 12:49
  • I think I'm still a little confused about the concept of "having an intercept." If you formulate the model so that it "has" an offset, but the $\hat{\beta}$ element corresponding to the offset, $\hat{\beta}_0$, happens to evaluate to zero, your residuals still sum to zero, right? But if instead you formulated the model without the $\hat{\beta}_0$ term, then your residuals are no longer guaranteed to sum to zero? Not sure if these questions make sense to others? Commented Jun 26, 2020 at 15:13
  • I see that in your proof you assume that the model contains an intercept, so $\sum R = 0$. Would the claim that the regressors are uncorrelated with the residuals still hold if the model did not have an intercept?
    – timeinbaku
    Commented Mar 14 at 19:53

Consider the model $$Y_i = 3 + 4x_i + e_i,$$ where $e_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma=1).$

A version of this is simulated in R as follows:

set.seed(625)
x = runif(20, 1, 23)           # 20 predictor values, uniform on (1, 23)
y = 3 + 4*x + rnorm(20, 0, 1)  # responses from the true line plus N(0, 1) errors

Of course, one anticipates a linear association between $x_i$ and $Y_i,$ otherwise there is not much point trying to fit a regression line to the data.

cor(x,y)
[1] 0.9991042

Let's do the regression procedure.

reg.out = lm(y ~ x)
reg.out

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
      3.649        3.985  

So the true intercept $\beta_0= 3$ from the simulation has been estimated as $\hat \beta_0 = 3.649$, and the true slope $\beta_1 =4$ has been estimated as $\hat \beta_1 = 3.985.$ A summary of results shows rejection of the null hypotheses $\beta_0 = 0$ and $\beta_1 = 0.$

summary(reg.out)

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.42617 -0.61995 -0.04733  0.41389  2.63963 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.64936    0.52268   6.982 1.61e-06 ***
x            3.98474    0.03978 100.167  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9747 on 18 degrees of freedom
Multiple R-squared:  0.9982,    Adjusted R-squared:  0.9981 
F-statistic: 1.003e+04 on 1 and 18 DF,  p-value: < 2.2e-16

Here is a scatterplot of the data along with a plot of the regression line through the data.

plot(x,y, pch=20)
abline(reg.out, col="blue")

[Scatterplot of the simulated data with the fitted regression line]

With $\hat Y_i = \hat\beta_0 + \hat\beta_1 x_i,$ the residuals are $r_i = Y_i - \hat Y_i.$ They are the vertical distances between the $Y_i$ and the regression line at each $x_i.$

We can retrieve their values as follows:

r = reg.out$residuals
summary(r)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.42617 -0.61995 -0.04733  0.00000  0.41389  2.63963 

The regression procedure ensures that $\bar r = 0,$ which is why their Mean was not shown in the previous summary.

Also, the residuals are exactly uncorrelated with the $x_i$; the least squares fit (with an intercept) guarantees that. Their correlation with the $Y_i$ is not exactly zero, but because the regression line captures nearly all of the variation here, it is small.

cor(r,x);  cor(r,y)
[1] -2.554525e-16
[1] 0.04231753
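Incidentally, when the model has an intercept the sample correlation between the residuals and the $Y_i$ is exactly $\sqrt{1 - R^2}$ (because $r$ is orthogonal to $\hat Y$ and has mean $0$), so it is small here only because the fit is so close. A quick check, using the reg.out object above:

sqrt(1 - summary(reg.out)$r.squared)   # agrees with cor(r, y) above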

Because the errors are normally distributed, it is fair to do a formal test to see if the null hypothesis $\rho_{rY} = 0$ is rejected. It is not.

cor.test(r,y)

        Pearson's product-moment correlation

data:  r and y
t = 0.1797, df = 18, p-value = 0.8594
alternative hypothesis: 
  true correlation is not equal to 0
95 percent confidence interval:
 -0.4078406  0.4759259
sample estimates:
       cor 
0.04231753 

Maybe this demonstration helps you to see why you should not expect to see the correlations you mention in your question. If you are still puzzled, maybe you can clarify your doubts by making reference to the regression procedure above.

  • Thanks for this visualization. I was mostly looking for a theoretical perspective on why this is the case. Specifically, in that link they state that $X^T e = 0$ means the residuals are not correlated with the regressors, but I don't understand how that implication comes about. Commented Jun 25, 2020 at 23:22
  • Hi: I'm not sure what you're confused about, but if it's an issue with your formula, the mean of the residuals is assumed to be zero. If it were not assumed to be zero, then the model itself wouldn't really make sense, because the residuals would then be "explaining" the dependent variable to some extent. The residuals are not supposed to explain anything; they are what's left over after the other variables are used to explain the dependent variable $Y$.
    – mlofton
    Commented Jun 26, 2020 at 0:21
  • The mean of the residuals is zero if you have an intercept and isn't otherwise. It's the mean of the errors that is assumed to be zero. Commented Jun 26, 2020 at 1:21
  • @ThomasLumley: (1) Mean of errors assumed to be 0: as in my "$e_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma=1).$" (2) I have an intercept. Mean of residuals is zero: as in my "The regression procedure ensures that $\bar r=0.$" Exact purpose of your comment is unclear.
    – BruceET
    Commented Jun 26, 2020 at 2:18
