
Let $U$ be an open convex subset of $\mathbb R^n$ and $f:U\to\mathbb R$ a convex function on it.

  • It is a well-known fact that if the second partial derivatives exist everywhere on $U$ and are all continuous (i.e., if $f\in\mathcal C^2$), then the Hessian of $f$ is symmetric, that is, $\partial^2 f/(\partial x_i\partial x_j)=\partial^2 f/(\partial x_j\partial x_i)$ for any $i,j\in\{1,\ldots,n\}$. (Actually, $f$ needn't even be convex for this result; a quick numerical sanity check of this smooth case is sketched right after this list.)
  • In fact, Alexandroff's theorem states that the Hessian exists and is symmetric almost everywhere with respect to the $n$-dimensional Lebesgue measure, without any additional assumptions beyond convexity.
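
Here is that sanity check (purely illustrative; the test function and step sizes are my own choices, not part of the cited results): for a $\mathcal C^2$ function, the two iterated mixed partials computed by nested central differences agree, as Schwarz's theorem predicts.

```python
# Illustrative only: the convex test function and step sizes are my own
# choices. For a C^2 function, the two iterated mixed partials (inner
# difference first, then outer) agree, as Schwarz's theorem predicts.
import numpy as np

def f(x, y):
    # Smooth and convex on R^2 (Hessian = e^{x+y} * [[1,1],[1,1]] + 2I > 0).
    return np.exp(x + y) + x**2 + y**2

def iterated_mixed_partial(f, x, y, outer, inner, h=1e-3, k=1e-6):
    """Differentiate first in `inner`, then in `outer`, via nested central
    differences, using a much smaller step for the inner difference."""
    def d_inner(x, y):
        if inner == 'x':
            return (f(x + k, y) - f(x - k, y)) / (2 * k)
        return (f(x, y + k) - f(x, y - k)) / (2 * k)
    if outer == 'x':
        return (d_inner(x + h, y) - d_inner(x - h, y)) / (2 * h)
    return (d_inner(x, y + h) - d_inner(x, y - h)) / (2 * h)

# Both orders approximate e^{x+y} at the test point and agree closely.
print(iterated_mixed_partial(f, 0.2, -0.5, 'x', 'y'))
print(iterated_mixed_partial(f, 0.2, -0.5, 'y', 'x'))
```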

Question: Is it possible for $f$ to be twice differentiable everywhere on $U$ (and thus have second-order partial derivatives, which need not be continuous everywhere) but have a Hessian that is not symmetric at some $x\in U$?


Update: Dudley (1977) gives an example of a convex function whose Hessian at the origin exists but is asymmetric. This counterexample doesn't settle my question, however, because Dudley's function doesn't have a second-order (Fréchet) derivative at the origin (i.e., it is not twice differentiable there), even though its second-order partial derivatives exist. I would like to see a convex function that has both an existent second-order Fréchet derivative and an asymmetric Hessian at some point (which necessarily implies that some of the second-order partial derivatives are discontinuous at that point).
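
For intuition only: the function below is not Dudley's example and is not convex, but it is the classical function whose second-order partial derivatives exist everywhere yet are discontinuous at the origin, where the two iterated mixed partials disagree ($+1$ versus $-1$). The numerical sketch (step sizes are my own choices) just makes the disagreement visible; it does not answer the question above.

```python
# Not Dudley's function and not convex: this is the classical example with
# second-order partials that exist everywhere but are discontinuous at the
# origin, where the two iterated mixed partials disagree.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

def iterated_mixed_partial_at_origin(outer, inner, h=1e-3, k=1e-9):
    """Differentiate first in `inner` (tiny step k), then in `outer` (step h),
    so that each nested central difference mimics the iterated limit."""
    def d_inner(x, y):
        if inner == 'x':
            return (f(x + k, y) - f(x - k, y)) / (2 * k)
        return (f(x, y + k) - f(x, y - k)) / (2 * k)
    if outer == 'x':
        return (d_inner(h, 0.0) - d_inner(-h, 0.0)) / (2 * h)
    return (d_inner(0.0, h) - d_inner(0.0, -h)) / (2 * h)

print(iterated_mixed_partial_at_origin('x', 'y'))  # ~ +1: d/dx of df/dy at 0
print(iterated_mixed_partial_at_origin('y', 'x'))  # ~ -1: d/dy of df/dx at 0
```

Taking the inner step much smaller than the outer one is what lets the two orders of differentiation come apart; with equal steps the symmetric difference quotient would be the same for both orders and would hide the asymmetry.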


1 Answer


It turns out that twice-differentiability implies that the Hessian is symmetric even without convexity and with no reference to whether the second-order partial derivatives are continuous! The proof below is based on Theorem 8.12.2 in the book Foundations of Modern Analysis by Dieudonné (1969, p. 180).

Claim: Let $U\subseteq\mathbb R^n$ be an open set and $f:U\to\mathbb R$ a function. Suppose that $f$ is (Fréchet) differentiable on $U$ and that it is twice (Fréchet) differentiable at $\mathbf x_0\in U$. Then, the Hessian matrix $\mathbf H(\mathbf x_0)$ at $\mathbf x_0$ is symmetric.

Proof: Let $\mathbf D:U\to\mathbb R^n$ denote the gradient function of $f$. Fix $\varepsilon>0$. There is no loss of generality in choosing $\delta>0$ so small that the open ball $B(\mathbf x_0,4\delta)$ is contained in the open set $U$; since $\mathbf D$ is Fréchet differentiable at $\mathbf x_0$ by assumption, $\delta$ can moreover be chosen so that $\|\mathbf v\|<4\delta$ implies that $$\left\|\mathbf D(\mathbf x_0+\mathbf v)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot\mathbf v\right\|\leq\varepsilon\|\mathbf v\|.$$

For any $i,j\in\{1,\ldots,n\}$, let $\mathbf e_i$ and $\mathbf e_j$ be the corresponding standard basis vectors of unit length. Let $\mathbf s\equiv\delta\mathbf e_i$ and $\mathbf t\equiv\delta\mathbf e_j$. It is clear that $\mathbf x_0+\xi\mathbf s+\mathbf t$ and $\mathbf x_0+\xi\mathbf s$ are both in $U$ whenever $\xi\in[0,1]$; this is because $\|\xi\mathbf s+\mathbf t\|<4\delta$ and $\|\xi\mathbf s\|<4\delta$. Define the following function $g:[0,1]\to\mathbb R$: $$g(\xi)\equiv f(\mathbf x_0+\xi\mathbf s+\mathbf t)-f(\mathbf x_0+\xi\mathbf s)\quad\forall\xi\in[0,1].$$

Clearly, $g$ is continuous on $[0,1]$ and differentiable on $(0,1)$. Lagrange's mean-value theorem, in turn, implies that there exists some $\xi\in(0,1)$ such that $$g(1)-g(0)=g'(\xi)=\mathbf s\cdot\left[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0+\xi\mathbf s)\right],$$ where the second equality follows from the chain rule.

Next, one can derive the following chain of inequalities (the first step uses the Cauchy–Schwarz inequality): \begin{align*} &\left|g(1)-g(0)-\mathbf s\cdot\mathbf H(\mathbf x_0)\cdot\mathbf t\right|\leq\underbrace{\|\mathbf s\|}_{=\delta}\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)]-\mathbf H(\mathbf x_0)\cdot\mathbf t\right\|\\ =&\,\delta\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s+\mathbf t)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s)]\right\|\\ \leq&\,\delta\varepsilon\left(\|\xi\mathbf s+\mathbf t\|+\|\xi\mathbf s\|\right)<8\delta^2\varepsilon. \end{align*} Since $g(1)-g(0)=f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)$, one has that $$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j|<8\delta^2\varepsilon,$$ and, by a completely analogous argument in which the roles of $\mathbf s$ and $\mathbf t$ are interchanged, $$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_j\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_i|<8\delta^2\varepsilon.$$ Given that $\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j=h_{ij}(\mathbf x_0)\equiv\partial^2 f/(\partial x_i\partial x_j)(\mathbf x_0)$, the preceding two inequalities and the triangle inequality imply that $|\delta^2 h_{ij}(\mathbf x_0)-\delta^2 h_{ji}(\mathbf x_0)|<16\delta^2\varepsilon$, that is, $$\left|h_{ij}(\mathbf x_0)-h_{ji}(\mathbf x_0)\right|<16\varepsilon.$$ Since $\varepsilon>0$ was arbitrary, it follows that $h_{ij}(\mathbf x_0)=h_{ji}(\mathbf x_0)$. $\blacksquare$
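
For readers who like to see the mechanism numerically, here is a minimal sketch (the test function, base point, and step size are my own choices and play no role in the proof): the second difference $f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)$ is symmetric in $\mathbf s$ and $\mathbf t$ by construction, and, after dividing by $\delta^2$, it approximates both $h_{ij}(\mathbf x_0)$ and $h_{ji}(\mathbf x_0)$, which is exactly why the two must coincide.

```python
# A numerical look at the quantity the proof controls (test function, point,
# and step size are my own choices). The second difference below is symmetric
# in s and t by construction; dividing it by delta^2 approximates the mixed
# partial, which is why h_ij and h_ji must agree for twice differentiable f.
import numpy as np

def f(x):
    # Smooth, hence twice Frechet differentiable, on R^3.
    return np.exp(x[0] * x[1]) + np.sin(x[1] * x[2]) + np.dot(x, x)

def second_difference(f, x0, s, t):
    return f(x0 + s + t) - f(x0 + s) - f(x0 + t) + f(x0)

x0 = np.array([0.4, -0.3, 1.1])
delta = 1e-4
i, j = 0, 1
s, t = delta * np.eye(3)[i], delta * np.eye(3)[j]

# The two orderings agree up to round-off, and both approximate the analytic
# mixed partial d^2 f / (dx_i dx_j)(x0) = (1 + x0*x1) * exp(x0*x1).
print(second_difference(f, x0, s, t) / delta**2)
print(second_difference(f, x0, t, s) / delta**2)
print((1 + x0[0] * x0[1]) * np.exp(x0[0] * x0[1]))
```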

  • Thanks for this excellent exposition of Dieudonné's theorem. This is quite a slick proof. I didn't understand the proof in Dieudonné's real analysis book at first glance: this answer made everything quite clear. +1.
    Commented Feb 12, 2016 at 20:55
  • Great answer. FYI Theorem 12.12 in Mathematical Analysis by Apostol gives a result that (I think) is slightly stronger. The upshot is that Fréchet differentiability of the first-order partials is sufficient.
    – JasonJones
    Commented Nov 16, 2020 at 23:58
  • Thanks a lot for this clear answer! May I ask a silly question as to why one would need Schwarz's Theorem then?
    – oliver
    Commented Sep 23, 2022 at 9:20
  • @oliver My understanding (correct me if I'm wrong) is that Schwarz's theorem is not nested: its premise requires (i) only that the second-order partial derivatives exist (without the original function necessarily being twice Fréchet differentiable), a weaker condition; but (ii) also that the second-order partial derivatives be continuous, a stronger condition. Schwarz's theorem is more relevant because its premises are more naturally satisfied. Dieudonné's theorem covers the somewhat pathological cases, which are more interesting from a theoretical rather than a practical point of view.
    – triple_sec
    Commented Sep 23, 2022 at 17:38
  • @triple_sec: Doesn't existence of the second partials imply continuity of the first partials and therefore Fréchet differentiability? And then doesn't continuity of the second partials, which are the first partials of the Fréchet derivative, imply twice Fréchet differentiability? So isn't Schwarz strictly a weaker form of Dieudonné (i.e., implied by the latter)?
    – peter
    Commented May 26 at 18:52
