Hessian matrix as derivative of gradient

Question

From a text:

For a real-valued differentiable function $f:\mathbb{R}^n\rightarrow\mathbb{R}$, the Hessian matrix $D^2f(x)$ is the derivative matrix of the vector-valued gradient function $\nabla f(x)$; i.e., $D^2f(x)=D[\nabla f(x)]$.

$\nabla f(x)$ is just an $n\times 1$ matrix consisting of $\partial f/\partial x_1,\partial f/\partial x_2,\ldots,\partial f/\partial x_n$.

Then $D[\nabla f(x)]$ must be a $1\times n$ matrix.

But I know that the Hessian matrix is an $n\times n$ matrix consisting of $\partial ^2f/\partial x_i\partial x_j$. How can the given definition be consistent with this?

What text is this from?
– Samael Manasseh
Commented Oct 16, 2023 at 18:10

Dan · Accepted Answer · 2014-05-25 20:43:26Z

14

The line "Then $D[\nabla f(x)]$ must be a $1\times n$ matrix" is where your confusion lies.

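The derivative operator $D$ applied to a vector-valued function tells us how each component changes in each coordinate direction. Since $\nabla f$ maps $\mathbb{R}^n$ to $\mathbb{R}^n$, its derivative has one entry for every pair of component and direction, so it is an $n\times n$ matrix, not $1\times n$. Being more explicit with the notation, we have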

$$\begin{align}\nabla f(\mathbf x) &= D[f (\mathbf x)]\\ &= \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)\end{align}$$

Now think of applying $D$ to each element of this vector individually:

$$\begin{align}D[\nabla f(\mathbf x)] &= D[D[f(\mathbf x)]]\\ &=\left(D\left[\frac{\partial f}{\partial x_1}\right]^T, \ldots, D\left[\frac{\partial f}{\partial x_n}\right]^T\right)\end{align}$$
which expands to give us the Hessian matrix
$$D^2[f(\mathbf x)]=\left(\begin{matrix}\frac{\partial^2 f}{\partial x_1^2} & \ldots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1}& \ldots & \frac{\partial^2 f}{\partial x_n^2}\end{matrix}\right)$$
which is indeed $n\times n$.
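
A small worked case may make the shapes concrete. The function below is chosen purely for illustration and does not come from the quoted text. Take $f(x_1,x_2)=x_1^2x_2+x_2^3$. Its gradient is a map from $\mathbb{R}^2$ to $\mathbb{R}^2$:
$$\nabla f(\mathbf x)=\left(\begin{matrix}2x_1x_2\\ x_1^2+3x_2^2\end{matrix}\right)$$
Differentiating each of the two components with respect to each of the two variables gives
$$D[\nabla f(\mathbf x)]=\left(\begin{matrix}\frac{\partial}{\partial x_1}(2x_1x_2) & \frac{\partial}{\partial x_2}(2x_1x_2)\\ \frac{\partial}{\partial x_1}(x_1^2+3x_2^2) & \frac{\partial}{\partial x_2}(x_1^2+3x_2^2)\end{matrix}\right)=\left(\begin{matrix}2x_2 & 2x_1\\ 2x_1 & 6x_2\end{matrix}\right)$$
which is $2\times 2$, one row per component of $\nabla f$ and one column per variable.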


Why do you take the transpose of $\partial f/\partial x_1$?
– dot_zero
Commented May 4, 2017 at 5:32
@David It could just be a notation issue. $\left[ \frac{\partial f}{\partial x_1} \right]^T$ must be a column vector where each entry is $\frac{\partial f}{\partial x_1}$. This way, when the derivative operator is applied again, it results in the first column of the Hessian matrix. At least that's how I interpreted the original notation. Maybe it should be written as $ \frac{\partial f}{\partial x_1} \mathbf{1}$ instead.
– yjc
Commented Jun 24, 2017 at 23:08
@dot_zero there, the transpose is being applied to a row in order to make it a column. For example, the first row is written as $D[\frac{\partial f}{\partial x_{1}}],$ so the transpose gives the first column of $D^{2}[f(x)].$
– sunspots
Commented Feb 22, 2023 at 2:59
One note is that the notation mixes tuples and matrices.
– sunspots
Commented Feb 22, 2023 at 3:01


MathArt · Accepted Answer · 2023-06-15 13:47:10Z

This is an old discussion, and I do not intend to correct anything. This point confused me for a long time, so I would like to adopt some simple rules to make it clear.

Suppose we define the following operators for a function $f:\mathbb{R}^n\rightarrow\mathbb{R}$:

Gradient operator $\nabla$: defined as an $n\times1$ column vector.
Derivative operator $\nabla^T$: defined as a row vector (i.e., $1\times n$).
Hessian operator $\mathbf{H}$: defined as the gradient of the derivative of $f(x)$, i.e., the product $\nabla\nabla^T$ of an $n\times1$ column with a $1\times n$ row, which is $n\times n$:
$$\mathbf{H}=\left(\begin{matrix}\frac{\partial}{\partial x_1}\\ \vdots\\\frac{\partial}{\partial x_n}\end{matrix}\right)\left(\begin{matrix}\frac{\partial}{\partial x_1}& \ldots&\frac{\partial}{\partial x_n}\end{matrix}\right)=\left(\begin{matrix}\frac{\partial^2}{\partial x_1^2} & \ldots & \frac{\partial^2}{\partial x_1\partial x_n}\\\vdots &\ddots&\vdots \\ \frac{\partial^2}{\partial x_n\partial x_1}& \ldots & \frac{\partial^2}{\partial x_n^2}\end{matrix}\right).$$
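
The shape claim can also be checked numerically. What follows is only a minimal sketch in plain Python, not a robust implementation: it approximates the gradient by central differences, then differentiates that gradient again the same way. The test function `f`, the helper names `grad` and `jacobian`, the step size `h`, and the point `x0` are all assumptions made for illustration.

```python
# Sketch: differentiate the gradient numerically and inspect the result's shape.
# f, grad, jacobian, h and x0 are illustrative choices, not from the discussion.

def f(x):
    # Example scalar function f: R^3 -> R.
    return x[0] ** 2 * x[1] + x[1] * x[2] ** 3

def grad(func, x, h=1e-4):
    """Central-difference gradient: a list of n partial derivatives."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((func(xp) - func(xm)) / (2 * h))
    return g

def jacobian(vec_func, x, h=1e-4):
    """Central-difference Jacobian of a map R^n -> R^m as an m x n list of lists.

    Entry J[i][j] approximates the partial derivative of component i
    with respect to variable j.
    """
    n = len(x)
    m = len(vec_func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = vec_func(xp), vec_func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x0 = [1.0, 2.0, 3.0]
# The gradient is a map R^3 -> R^3, so its Jacobian has 3 rows and 3 columns.
H = jacobian(lambda x: grad(f, x), x0)
print(len(H), "x", len(H[0]))  # prints: 3 x 3
```

For this particular `f`, the entries of `H` should approximate the analytic second partials, for example $\partial^2 f/\partial x_1^2 = 2x_2$ and $\partial^2 f/\partial x_2\partial x_3 = 3x_3^2$, up to finite-difference error.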
