I am having some trouble finishing the derivation of the $\hat{\beta}$ that minimizes $(Y-X\beta)^T(Y-X\beta) + \lambda \beta^T\beta$. After taking the partial derivative with respect to $\beta$ and setting it to zero, I get
\begin{align} \hat{\beta}(X^TX+\lambda I) = Y^TX \end{align}
With $X, Y, \hat{\beta}$ being matrices, I multiply both sides by the inverse of $(X^TX+\lambda I)$. However, on which side does this inverse go? I get $\hat{\beta} = Y^TX(X^TX+\lambda I)^{-1}$ but I see others with $\hat{\beta} = (X^TX+\lambda I)^{-1}X^TY$.
Two questions. First, where on earth does the form $(X^TX+\lambda I)^{-1}X^TY$ come from?
Also, if $X,Y$ are two column vectors, is $X^TY = Y^TX$?
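
For what it's worth, here is a quick numerical sanity check of both questions. It is only a sketch, assuming NumPy and random data: the sizes `n` and `p`, the penalty `lam`, and all variable names are arbitrary choices of mine. It compares the two expressions for $\hat{\beta}$ and the two inner products.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 20, 3, 0.5           # arbitrary sizes and penalty, for illustration only
X = rng.normal(size=(n, p))      # design matrix, n x p
Y = rng.normal(size=(n, 1))      # response as a column vector, n x 1

# X^T X + lam * I is symmetric, so its inverse is symmetric too.
A = X.T @ X + lam * np.eye(p)

# The form others report: (X^T X + lam I)^{-1} X^T Y, a p x 1 column.
beta_col = np.linalg.solve(A, X.T @ Y)

# The form I derived: Y^T X (X^T X + lam I)^{-1}, a 1 x p row.
beta_row = Y.T @ X @ np.linalg.inv(A)

# Do the two agree once one of them is transposed?
same_up_to_transpose = np.allclose(beta_row.T, beta_col)

# Is the gradient of the penalized objective zero at beta_col?
grad = -2 * X.T @ (Y - X @ beta_col) + 2 * lam * beta_col
gradient_vanishes = np.allclose(grad, 0)

# Second question: for two column vectors, compare x^T y with y^T x.
x = rng.normal(size=(n, 1))
y = rng.normal(size=(n, 1))
inner_products_equal = np.allclose(x.T @ y, y.T @ x)

print(beta_col.shape, beta_row.shape)
print(same_up_to_transpose, gradient_vanishes, inner_products_equal)
```

With these shapes the row form is the transpose of the column form, which is consistent with transposing both sides of my equation and using the symmetry of $X^TX+\lambda I$. For column vectors, $X^TY$ is a $1 \times 1$ matrix and therefore equals its own transpose $Y^TX$.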