Properties of ridge regression hat matrix and ridge residuals

Question

I'm referencing https://arxiv.org/pdf/1509.09169.pdf on ridge regression. On page 34, Question 1.5 asks us to prove:
The ridge fit $\widehat{Y}(\lambda)=X(X^{\top}X+\lambda I_p)^{-1}X^{\top}Y$ is not orthogonal to the ridge residual $Y - \widehat{Y}(\lambda)$.

To show this, I think we can use the fact that the hat matrix for ridge regression is not a projection matrix, but that alone does not give me anything useful. In the OLS case we show that the residual is orthogonal to $X$, and hence to the fit, since $\widehat{Y}$ is a linear combination of the columns of $X$. I do not think we can use this argument here, as the linear combination property might not hold due to the term $\lambda I_p$. Please tell me how to show this.

For some special values of $\lambda$ orthogonality may hold. A useful intuition is the characterization of Ridge Regression in terms of augmenting the data: see stats.stackexchange.com/a/164546/919. If the original model matrix had a column of constants (plus at least one other non-constant column), obviously the augmented matrix cannot have a column of constants. Most of the time, the column space will not include any nonzero constant vectors (but, for a finite set of $\lambda,$ it could). — whuber, Commented Dec 7, 2020 at 17:40

Alecos Papadopoulos · Accepted Answer · 2020-12-07 12:44:39Z

For clarity, set

$$B \equiv (X^{\top}X+\lambda I_p)^{-1}$$

and you are asked to examine

$$(Y - \widehat{Y})^{\top}XBX^{\top}Y = (Y - XBX^{\top}Y)^{\top}XBX^{\top}Y.$$

Doing the algebra,

$$...=Y^{\top}XBX^{\top}Y - Y^{\top}XBX^{\top}XBX^{\top}Y.$$

If $B$ were equal to $(X^{\top}X)^{-1}$, as in OLS, then $BX^{\top}X = I_p$ and the second component would simplify and become equal to the first, hence the zero result. But in ridge regression with $\lambda > 0$ this is not the case, so the expression does not, in general, equal zero.
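As a quick numerical sanity check of this step, here is a minimal sketch in Python with NumPy. The simulated data, the seed, and the helper name `fit_residual_terms` are illustrative assumptions and not part of the exercise. It computes the two components separately for a few values of $\lambda$; they should coincide (up to rounding) only at $\lambda = 0$.

```python
import numpy as np

# Simulated data: sizes, coefficients, and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)


def fit_residual_terms(X, Y, lam):
    """Return (first, second, first - second) for Yhat^T (Y - Yhat).

    first  = Y^T X B X^T Y
    second = Y^T X B X^T X B X^T Y
    with B = (X^T X + lam * I_p)^{-1}; lam = 0 gives OLS.
    """
    B = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1]))
    H = X @ B @ X.T  # ridge hat matrix (symmetric)
    first = Y @ H @ Y
    second = Y @ H @ H @ Y
    return first, second, first - second


for lam in (0.0, 1.0, 10.0):
    first, second, diff = fit_residual_terms(X, Y, lam)
    print(f"lambda={lam:5.1f}  first={first:.4f}  "
          f"second={second:.4f}  difference={diff:.6f}")
```

For this simulated data the difference should be numerically zero at $\lambda = 0$ and clearly positive for the two positive values of $\lambda$.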

Continuing with the manipulations,

$$...=Y^{\top}XBX^{\top}\big[I-XBX^{\top}\big]Y.$$

If $B$ were equal to $(X^{\top}X)^{-1}$, then the term in brackets would become the ("complementary") projection matrix of $X$, which annihilates $X$ and would make the expression zero. It is not, so there is no zero result.
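One way to make this last step explicit is sketched below. It assumes $\lambda > 0$ and $n \ge p$, and introduces the thin singular value decomposition $X = UDV^{\top}$ with singular values $d_i$ and left singular vectors $u_i$; this notation is not used elsewhere in the answer. Since $X^{\top}X + \lambda I_p = V(D^2 + \lambda I_p)V^{\top}$,

$$XBX^{\top} = U D (D^{2}+\lambda I_p)^{-1} D U^{\top} = U\,\mathrm{diag}\!\left(\frac{d_i^{2}}{d_i^{2}+\lambda}\right)U^{\top},$$

and, using $U^{\top}U = I_p$,

$$Y^{\top}XBX^{\top}\big[I-XBX^{\top}\big]Y = \sum_{i=1}^{p} \frac{\lambda\, d_i^{2}}{(d_i^{2}+\lambda)^{2}}\,(u_i^{\top}Y)^{2} \;\ge\; 0.$$

Every summand is non-negative, so the sum vanishes only if $u_i^{\top}Y = 0$ for every $i$ with $d_i > 0$, that is, only if $X^{\top}Y = 0$ (in which case the ridge fit itself is zero). Under these assumptions the inner product is therefore strictly positive whenever $X^{\top}Y \neq 0$.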

edited Dec 7, 2020 at 12:44

answered Dec 6, 2020 at 20:09

I get from here $Y^{\top}(XBX^{\top} - (XBX^{\top})^{\top}(XBX^{\top}))Y$. I have shown that $XBX^{\top}$ is not a projection matrix. Thus, if I equate this to zero and then remove $Y$ and $Y^{\top}$ by pre- and post-multiplying by their transposes, I can say that since the bracketed term is not zero, these cannot be orthogonal. Is my reasoning correct? – Vks, Commented Dec 7, 2020 at 6:19

Thank you for your answer. It is clear to me. Though I wanted to ask whether the way I did it (in the comment) is also right? – Vks, Commented Dec 10, 2020 at 7:56
