
I am having some trouble finishing the derivation of the $\hat{\beta}$ that minimizes $(Y-X\beta)^T(Y-X\beta) + \lambda \beta^T\beta$. After taking the derivative with respect to $\beta$ and setting it to zero, I get

\begin{align} \hat{\beta}(X^TX+\lambda I) = Y^TX \end{align}

With $X, Y, \hat{\beta}$ being matrices, I multiply both sides by the inverse of $(X^TX+\lambda I)$. However, on which side does this inverse go? I get $\hat{\beta} = Y^TX(X^TX+\lambda I)^{-1}$ but I see others with $\hat{\beta} = (X^TX+\lambda I)^{-1}X^TY$.

Two questions. First, how does one arrive at the form $(X^TX+\lambda I)^{-1}X^TY$, with the inverse on the left?

Also, if $X,Y$ are two column vectors, is $X^TY = Y^TX$?


1 Answer


Typically one writes the gradient as a column vector. In that convention, the gradient is $$2(X^\top X + \lambda I) \hat{\beta} - 2X^\top Y.$$ If you instead write the gradient as a row vector, it would be $$2\hat{\beta}^\top(X^\top X + \lambda I) - 2Y^\top X.$$ (Note carefully the shapes of $X^\top Y$ and $Y^\top X$.) Manipulating the latter will give you an expression for $\hat{\beta}^\top$ so it is fine that the inverse matrix appears on the right. Taking a final transpose will bring the inverse matrix to the left.
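Spelling out that final step (a sketch of the algebra, using the fact that $X^\top X + \lambda I$ is symmetric, so its inverse is symmetric as well): setting the row-vector gradient to zero gives
\begin{align}
\hat{\beta}^\top(X^\top X + \lambda I) &= Y^\top X \\
\hat{\beta}^\top &= Y^\top X\,(X^\top X + \lambda I)^{-1} \\
\hat{\beta} &= \left(Y^\top X\,(X^\top X + \lambda I)^{-1}\right)^\top = (X^\top X + \lambda I)^{-1} X^\top Y,
\end{align}
where the last equality uses $(AB)^\top = B^\top A^\top$.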

  • Ah okay, this makes sense. Can you re-explain why you switched the order of $X^T$ and $Y$ in the second equation?
    – JerBear
    Commented Nov 15, 2021 at 23:06
  • @JerBear As I mentioned, think about what shapes $X^\top Y$ and $Y^\top X$ have.
    – angryavian
    Commented Nov 15, 2021 at 23:08
  • Ahh, I get it! It's to fit the dimension of the left-hand side. Thank you!
    – JerBear
    Commented Nov 15, 2021 at 23:19
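
As a quick numerical sanity check (a minimal NumPy sketch; the problem size, seed, and variable names are made up for illustration), the two forms of the ridge solution agree:

```python
import numpy as np

# Small synthetic ridge-regression problem (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
n, p, lam = 50, 3, 0.1
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

A = X.T @ X + lam * np.eye(p)                # X^T X + lambda I (symmetric)

# Column-vector convention: inverse on the left.
beta_left = np.linalg.solve(A, X.T @ Y)      # shape (p, 1)

# Row-vector convention: inverse on the right, then a final transpose.
beta_right = (Y.T @ X @ np.linalg.inv(A)).T  # shape (p, 1)

print(np.allclose(beta_left, beta_right))    # True
```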
