Consider the model
$$Y_i = 3 + 4x_i + e_i,$$
where $e_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma=1).$
A version of this is simulated in R as follows:
set.seed(625)
x = runif(20, 1, 23)           # 20 predictor values, uniform on (1, 23)
y = 3 + 4*x + rnorm(20, 0, 1)  # responses from the true line plus N(0,1) errors
Of course, one anticipates a linear association between $x_i$ and $Y_i,$
otherwise there is not much point trying to fit a regression line to the
data.
cor(x,y)
[1] 0.9991042
Let's run the regression.
reg.out = lm(y ~ x)
reg.out
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
3.649 3.985
So the true intercept $\beta_0 = 3$ from the simulation has been
estimated as $\hat \beta_0 = 3.649,$ and the true slope
$\beta_1 = 4$ has been estimated as $\hat \beta_1 = 3.985.$
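These estimates agree with the usual least-squares formulas $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar x,$ which you can compute directly (the $n-1$ divisors in cov and var cancel):
b1 = cov(x, y)/var(x)       # slope: S_xy / S_xx
b0 = mean(y) - b1*mean(x)   # intercept: ybar - b1 * xbar
b0; b1                      # matches the lm() coefficients above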
A summary
of the results shows rejection of the null hypotheses
$\beta_0 = 0$ and $\beta_1 = 0.$
summary(reg.out)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-1.42617 -0.61995 -0.04733 0.41389 2.63963
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.64936 0.52268 6.982 1.61e-06 ***
x 3.98474 0.03978 100.167 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9747 on 18 degrees of freedom
Multiple R-squared: 0.9982, Adjusted R-squared: 0.9981
F-statistic: 1.003e+04 on 1 and 18 DF, p-value: < 2.2e-16
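The standard errors also give 95% confidence intervals for the coefficients, which here cover the true values $\beta_0 = 3$ and $\beta_1 = 4$: from the output above, roughly $(2.55, 4.75)$ for the intercept and $(3.90, 4.07)$ for the slope.
confint(reg.out)   # Estimate +/- qt(0.975, 18) * Std. Error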
Here is a scatterplot of the data along with a plot of the regression
line through the data.
plot(x,y, pch=20)
abline(reg.out, col="blue")
![Scatterplot of y versus x with the fitted regression line](https://cdn.statically.io/img/i.sstatic.net/kQsi6.png)
With $\hat Y_i = \hat\beta_0 + \hat\beta_1 x_i,$
the residuals are $r_i = Y_i - \hat Y_i.$
They are the vertical distances between the $Y_i$ and
the regression line at each $x_i.$
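You can see those vertical distances by adding segments from each point to the fitted line; fitted(reg.out) gives the $\hat Y_i$:
plot(x, y, pch=20)
abline(reg.out, col="blue")
segments(x, fitted(reg.out), x, y, col="red")  # residuals drawn as vertical segments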
We can retrieve their values as follows:
r = reg.out$residuals
summary(r)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.42617 -0.61995 -0.04733 0.00000 0.41389 2.63963
The regression procedure (fitting an intercept) ensures that $\bar r = 0,$ which
is why the Mean was omitted from the Residuals block
of the summary(reg.out) output above.
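As a quick check, the mean and sum of the residuals are zero up to floating-point rounding:
mean(r)   # essentially 0 (on the order of machine precision)
sum(r)    # likewise essentially 0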
Also, generally speaking, one expects that the residuals will
show essentially no correlation with either $x_i$ or $Y_i.$ If the linear model
is correct, then the regression
line captures the linear trend, so the $r_i$ should show
no remaining association with either $Y_i$ or $x_i.$
cor(r,x); cor(r,y)
[1] -2.554525e-16
[1] 0.04231753
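The first value is zero except for floating-point rounding: least-squares residuals are always exactly uncorrelated with the predictor. The second is small but not exactly zero; in fact, for a least-squares fit with an intercept one can show that $\operatorname{cor}(r, Y) = \sqrt{1 - R^2},$ which you can verify here (r.squared is a component of the model summary):
sqrt(1 - summary(reg.out)$r.squared)   # agrees with cor(r, y) above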
Because the errors are normally distributed, it is fair to
do a formal test to see if the null hypothesis $\rho_{rY} = 0$
is rejected. It is not.
cor.test(r,y)
Pearson's product-moment correlation
data: r and y
t = 0.1797, df = 18, p-value = 0.8594
alternative hypothesis:
true correlation is not equal to 0
95 percent confidence interval:
-0.4078406 0.4759259
sample estimates:
cor
0.04231753
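If one run of the simulation is not convincing, here is a short sketch (base R only) that repeats it many times; the correlation between the residuals and $x$ never exceeds rounding error:
# repeat the simulation 1000 times, recording cor(residuals, x) each time
cors = replicate(1000, {
  x = runif(20, 1, 23)
  y = 3 + 4*x + rnorm(20, 0, 1)
  cor(resid(lm(y ~ x)), x)
})
max(abs(cors))   # tiny (around 1e-15): zero except for rounding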
Perhaps this demonstration helps you see why you should not
expect to see the correlations you mention in your question.
If you are still puzzled, maybe you can clarify your doubts
by referring to the regression procedure above.