From a text:
For a real-valued differentiable function $f:\mathbb{R}^n\rightarrow\mathbb{R}$, the Hessian matrix $D^2f(x)$ is the derivative matrix of the vector-valued gradient function $\nabla f(x)$; i.e., $D^2f(x)=D[\nabla f(x)]$.
$\nabla f(x)$ is just an $n\times 1$ matrix consisting of $\partial f/\partial x_1,\partial f/\partial x_2,\ldots,\partial f/\partial x_n$.
So it seems to me that $D[\nabla f(x)]$ must be a $1\times n$ matrix.
But I know that the Hessian matrix is an $n\times n$ matrix consisting of $\partial^2 f/\partial x_i\partial x_j$. How can the given definition be consistent with this?
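
A small worked example may make the dimension count concrete. This is only a sketch: the function $f$ below is my own choice, not taken from the text, and I am assuming the usual convention that the derivative matrix of a map $g:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is the $m\times n$ matrix with entries $\partial g_i/\partial x_j$. Take $n=2$ and $f(x_1,x_2)=x_1^2x_2$:

$$
\nabla f(x)=\begin{pmatrix}2x_1x_2\\ x_1^2\end{pmatrix},\qquad
D[\nabla f(x)]=\begin{pmatrix}
\dfrac{\partial (2x_1x_2)}{\partial x_1} & \dfrac{\partial (2x_1x_2)}{\partial x_2}\\[2mm]
\dfrac{\partial (x_1^2)}{\partial x_1} & \dfrac{\partial (x_1^2)}{\partial x_2}
\end{pmatrix}
=\begin{pmatrix}2x_2 & 2x_1\\ 2x_1 & 0\end{pmatrix}.
$$

Under that convention, $\nabla f$ is treated as a map from $\mathbb{R}^2$ to $\mathbb{R}^2$, so its derivative matrix has one row per component of $\nabla f$ and one column per variable. That gives a $2\times 2$ matrix here, and its entries are exactly the second partials $\partial^2 f/\partial x_i\partial x_j$.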