
I am having some trouble finishing the derivation of the $\hat{\beta}$ that minimizes $(Y-X\beta)^T(Y-X\beta) + \lambda \beta^T\beta$. After taking the derivative with respect to $\beta$ and setting it to zero, I get

\begin{align} \hat{\beta}(X^TX+\lambda I) = Y^TX \end{align}

With $X, Y, \hat{\beta}$ being matrices, I multiply both sides by the inverse of $(X^TX+\lambda I)$. However, on which side does this inverse go? I get $\hat{\beta} = Y^TX(X^TX+\lambda I)^{-1}$ but I see others with $\hat{\beta} = (X^TX+\lambda I)^{-1}X^TY$.

Two questions. First, how does one arrive at the form $(X^TX+\lambda I)^{-1}X^TY$, with the inverse on the left?

Also, if $X,Y$ are two column vectors, is $X^TY = Y^TX$?


1 Answer


Typically one writes the gradient as a column vector. In that convention, the gradient is $$2(X^\top X + \lambda I) \hat{\beta} - 2X^\top Y.$$ If you instead write the gradient as a row vector, it would be $$2\hat{\beta}^\top(X^\top X + \lambda I) - 2Y^\top X.$$ (Note carefully the shapes of $X^\top Y$ and $Y^\top X$.) Manipulating the latter will give you an expression for $\hat{\beta}^\top$ so it is fine that the inverse matrix appears on the right. Taking a final transpose will bring the inverse matrix to the left.
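Spelling out that final step (a sketch of the algebra, using the fact that $X^\top X + \lambda I$ is symmetric, so its inverse is symmetric as well): setting the row-vector gradient to zero gives
\begin{align}
\hat{\beta}^\top(X^\top X + \lambda I) &= Y^\top X \\
\hat{\beta}^\top &= Y^\top X\,(X^\top X + \lambda I)^{-1} \\
\hat{\beta} &= \left(Y^\top X\,(X^\top X + \lambda I)^{-1}\right)^\top = (X^\top X + \lambda I)^{-1} X^\top Y,
\end{align}
where the last equality uses $(AB)^\top = B^\top A^\top$.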

  • Ah okay, this makes sense. Can you re-explain why you switched the order of $X^T$ and $Y$ in the second equation?
    – JerBear
    Commented Nov 15, 2021 at 23:06
  • @JerBear As I mentioned, think about what shapes $X^\top Y$ and $Y^\top X$ have.
    – angryavian
    Commented Nov 15, 2021 at 23:08
  • Ahh, I get it! It's to fit the dimension of the left-hand side. Thank you!
    – JerBear
    Commented Nov 15, 2021 at 23:19
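
As a quick numerical sanity check (a minimal NumPy sketch; the problem size, seed, and variable names are made up for illustration), the two forms of the ridge solution agree:

```python
import numpy as np

# Small synthetic ridge-regression problem (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
n, p, lam = 50, 3, 0.1
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

A = X.T @ X + lam * np.eye(p)                # X^T X + lambda I (symmetric)

# Column-vector convention: inverse on the left.
beta_left = np.linalg.solve(A, X.T @ Y)      # shape (p, 1)

# Row-vector convention: inverse on the right, then a final transpose.
beta_right = (Y.T @ X @ np.linalg.inv(A)).T  # shape (p, 1)

print(np.allclose(beta_left, beta_right))    # True
```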
