In finding the residual sum of squares (RSS), we have:
\begin{equation} \hat{Y} = X^T\hat{\beta} \end{equation}
where the parameter vector $\hat{\beta}$ is used to estimate the output $\hat{Y}$ for the input vector $X^T$. The RSS is defined as
\begin{equation} RSS(\beta) = \sum_{i=1}^n (y_i - x_i^T\beta)^2 \end{equation}
which in matrix form would be
\begin{equation} RSS(\beta) = (y - X \beta)^T (y - X \beta) \end{equation}
Differentiating with respect to $\beta$ and setting the derivative to zero, we get
\begin{equation} X^T(y - X\beta) = 0 \end{equation}
My question is: how is the last step done? How does differentiating the RSS lead to the last equation?
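One way the step can be sketched, assuming the usual matrix-calculus identities $\partial(\beta^T a)/\partial\beta = a$ and $\partial(\beta^T A \beta)/\partial\beta = 2A\beta$ for symmetric $A$: expand the matrix form, using the fact that $y^T X \beta$ is a scalar and so equals its transpose $\beta^T X^T y$,
\begin{equation} RSS(\beta) = y^T y - 2\beta^T X^T y + \beta^T X^T X \beta \end{equation}
Since $X^T X$ is symmetric, differentiating term by term gives
\begin{equation} \frac{\partial RSS}{\partial \beta} = -2X^T y + 2X^T X \beta = -2X^T(y - X\beta) \end{equation}
Setting this gradient to zero at the minimum and dividing by $-2$ leaves $X^T(y - X\beta) = 0$.

The same result can be checked numerically. The snippet below is only a sketch on made-up random data; the names `rss`, `rss_gradient`, and `numerical_gradient` are illustrative and not from any library. It compares the analytic gradient against central finite differences, then confirms that the solution of $X^T X \beta = X^T y$ makes $X^T(y - X\beta)$ vanish up to rounding error.

```python
import numpy as np

# Made-up data purely for illustration (fixed seed for reproducibility).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)


def rss(beta):
    """RSS(beta) = (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r


def rss_gradient(beta):
    """Analytic gradient from the derivation: -2 X^T (y - X beta)."""
    return -2.0 * (X.T @ (y - X @ beta))


def numerical_gradient(f, beta, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        grad[j] = (f(beta + step) - f(beta - step)) / (2 * h)
    return grad


# Compare analytic and numerical gradients at an arbitrary point.
beta0 = rng.normal(size=p)
gradients_agree = np.allclose(
    rss_gradient(beta0), numerical_gradient(rss, beta0), rtol=1e-5, atol=1e-5
)

# Solve X^T X beta = X^T y and check that X^T (y - X beta) is ~0 there.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
normal_eq_residual = X.T @ (y - X @ beta_hat)

print("gradients agree:", gradients_agree)
print("max |X^T (y - X beta_hat)|:", np.max(np.abs(normal_eq_residual)))
```

The check relies on $X^T X$ being invertible, which holds here because the random $X$ has full column rank.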