
On page 53 of the famous book The Elements of Statistical Learning by Hastie et al., given a univariate regression model of the form $$Y=X\beta + \epsilon$$ the estimator of the $\beta$ parameter is obtained as $$\hat{\beta}=\frac{\sum_1^N x_i y_i}{\sum_1^N x_i^2}=\frac{\left<\mathbb{x},\mathbb{y}\right>}{\left<\mathbb{x},\mathbb{x}\right>}$$ The authors then consider a model with intercept of the form $$Y=\beta_0 + X\beta_1 + \epsilon$$ and write the estimator for $\beta_1$, the parameter of $X$, as $$\hat{\beta}_1=\frac{\left<\mathbb{x}-\bar{x}\mathbb{1},\mathbb{y}\right>}{\left<\mathbb{x}-\bar{x}\mathbb{1},\mathbb{x}-\bar{x}\mathbb{1}\right>}$$ with $$\bar{x}=\frac{\sum_1^N x_i}{N}\qquad\mathbb{1}=\mathbb{x_0}\text{ the vector of }N\text{ ones}$$ While I separately understand (as in "I'm able to calculate") these two results, I was wondering whether there is an elegant, straightforward way to see how $\hat{\beta}_1$ can be obtained from $\hat{\beta}$, that is, why $$\mathbb{x}\longrightarrow\mathbb{x}-\bar{x}\mathbb{1}$$ when an intercept term is added.
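For concreteness, here is a quick numerical check (a minimal sketch assuming NumPy and synthetic data; the variable names are my own) that the centered-$\mathbb{x}$ formula indeed reproduces the slope of an ordinary least-squares fit with an intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.normal(loc=5.0, size=N)          # non-centered x, so the intercept matters
y = 2.0 + 3.0 * x + rng.normal(size=N)   # true beta_0 = 2, beta_1 = 3

# Slope from the centered-x inner-product formula
xc = x - x.mean()
beta1_centered = xc @ y / (xc @ xc)

# Slope from a standard least-squares fit with an intercept column
X = np.column_stack([np.ones(N), x])
beta0_ls, beta1_ls = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta1_centered, beta1_ls)          # the two slopes coincide
```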

Thanks in advance for any help!


1 Answer

To obtain $\hat\beta_1$ from $\hat\beta$, we need a framework for linear regression that supports adding a new predictor to the existing ones.

Below is one such framework, which gives a relatively straightforward way to see how adding an intercept term affects the existing $\hat\beta$.

The goal of linear regression is to minimize $|y-\beta x|^2=\left<y-\beta x, y-\beta x\right>$. When we have only one predictor (and no intercept term), we get the solution

$$ \hat{\beta}=\frac{\left<\mathbb{x},\mathbb{y}\right>}{\left<\mathbb{x},\mathbb{x}\right>} \tag{1}$$
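As a quick sanity check (a minimal sketch assuming NumPy; the data are synthetic and the names are my own), this inner-product formula reproduces the least-squares solution for a single predictor with no intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 3.0 * x + rng.normal(size=50)        # no intercept in the data-generating model

beta_hat = x @ y / (x @ x)               # <x, y> / <x, x>
beta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(beta_hat, beta_lstsq)              # agree up to floating point
```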

Next we consider the special, simple case of linear regression with two predictors that are orthogonal to each other. In this case, what we minimize is

$$|y-\beta_1 x_1 - \beta_2 x_2|^2$$

Denoting $\beta_1 x_1$ by $A$ and $\beta_2 x_2$ by $B$, we have $\left<A, B\right>=0$, and the squared error is

$$ \begin{align*} |y-A-B|^2&=\left<y-A-B, y-A-B\right>\\ &=\left<y, y\right>+\left<A,A\right>+\left<B,B\right> -2\left<y,A\right>-2\left<y,B\right>+2\left<A,B\right>\\ &=\left<y, y\right> - 2\left<y, A\right> + \left<A, A\right> + \left<y, y\right> - 2\left<y, B\right> + \left<B, B\right> -\left<y, y\right>\\ &=\left<y-A, y-A\right> + \left<y-B, y-B\right> + c \end{align*} $$

where $c=-\left<y, y\right>$ is a constant as we change $\beta_1$ and $\beta_2$. From the above equation, we see that minimizing $|y-A-B|^2$ is equivalent to minimizing $\left<y-A, y-A\right>$ and $\left<y-B, y-B\right>$ independently. Thus the solutions for $\beta_1$ and $\beta_2$ are given by separate univariate linear regressions on $x_1$ and $x_2$.
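To illustrate this (a minimal sketch assuming NumPy; $x_2$ is orthogonalized against $x_1$ by hand so that $\left<x_1, x_2\right>=0$, and the data are synthetic), the joint least-squares coefficients for two orthogonal predictors match the two separate univariate fits:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
x2 = x2 - (x1 @ x2) / (x1 @ x1) * x1     # enforce <x1, x2> = 0

y = 1.5 * x1 - 0.5 * x2 + rng.normal(size=N)

# Joint least squares on the two orthogonal predictors
b_joint = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0]

# Separate univariate regressions
b1 = x1 @ y / (x1 @ x1)
b2 = x2 @ y / (x2 @ x2)

print(b_joint, (b1, b2))                 # joint and separate coefficients agree
```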

Now we consider the case we actually want to deal with: adding an intercept term to a univariate linear regression. Adding the intercept can be viewed as adding a new predictor equal to the constant vector $\mathbb{1}$, with the intercept as its coefficient. The problem is that $\mathbb{1}$ is in general not orthogonal to the existing predictor $x_1$, so to apply the framework above we replace the pair $(x_1, \mathbb{1})$ by two orthogonal vectors that span the same space. To do this, we project $x_1$ onto $\mathbb{1}$ and subtract that component from $x_1$, which gives $$ A = x_1 - \frac{\left<x_1, \mathbb{1}\right>}{\left<\mathbb{1}, \mathbb{1}\right>}\mathbb{1}=x_1-\bar{x}_1\mathbb{1} $$ while the second vector is simply the constant direction itself, $$ B = \mathbb{1}. $$ Indeed $\left<A, B\right>=\sum_i (x_i - \bar{x}_1)=0$, and since $x_1 = A + \bar{x}_1\mathbb{1}$, the pair $(A, B)$ spans the same space as $(x_1, \mathbb{1})$.

$A$ and $B$ are orthogonal, so the coefficient of $A$ is given by the univariate linear regression formula with $x$ replaced by $A$: $$ \hat{\beta}_A=\frac{\left<x_1-\bar{x}_1\mathbb{1},\mathbb{y}\right>}{\left<x_1-\bar{x}_1\mathbb{1},x_1-\bar{x}_1\mathbb{1}\right>} $$

Since $x_1$ enters the reparametrized model only through $A$ (the projected part $\bar{x}_1\mathbb{1}$ is absorbed into the intercept), this is also the coefficient of the original $x_1$ vector, i.e. $\hat\beta_1$. The coefficient of $B$ is irrelevant to $x_1$, as it only multiplies the vector in the $\mathbb{1}$ direction and therefore only affects the intercept.
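Here is a numerical illustration of the whole argument (a minimal sketch assuming NumPy; the data are synthetic and the variable names are my own): orthogonalize $x_1$ against $\mathbb{1}$, fit the two orthogonal directions separately, and recover the slope and intercept of the usual with-intercept fit.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
x1 = rng.normal(loc=5.0, size=N)
one = np.ones(N)
y = 2.0 + 3.0 * x1 + rng.normal(size=N)   # true intercept 2, slope 3

A = x1 - x1.mean() * one                  # x1 orthogonalized against 1
print(one @ A)                            # ~0: A is orthogonal to 1

beta_A = A @ y / (A @ A)                  # coefficient of A (the slope)
beta_one = one @ y / (one @ one)          # coefficient of 1, i.e. the mean of y

# Map back to the original parametrization:
# y ≈ beta_A * x1 + (beta_one - beta_A * mean(x1)) * 1
print(beta_A, beta_one - beta_A * x1.mean())

# Compare with a standard least-squares fit with an intercept column
print(np.linalg.lstsq(np.column_stack([one, x1]), y, rcond=None)[0])  # [intercept, slope]
```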

