
Suppose we are given $H$, a positive semidefinite $d\times d$ matrix. How do I find $k$, the largest $k'$ such that the following holds for all positive semidefinite $C$ with unit trace?

$$k' \operatorname{Tr} H^2 C \le (\operatorname{Tr} H C)^2$$

Earlier discussion shows that $k\ge\frac{\lambda_\text{min}(H)^2}{\lambda_\text{max}(H)^2}$. For the example below, this gives $k \ge 0.01$, whereas the true value is $k\approx 0.330579$. Is there a better bound with a nice closed-form expression?

We can use numerical optimization to find $k$ for a specific $H$. For example, suppose

$$H=\left( \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 10 \\ \end{array} \right)$$

Then $$k = \frac{40}{121}$$

This is realized by the following $C^*$:

$$C^*=\left( \begin{array}{cccc} \frac{39}{176} & 0 & 0 & 0 \\ 0 & \frac{7}{16} & 0 & 0 \\ 0 & 0 & \frac{1}{4} & 0 \\ 0 & 0 & 0 & \frac{1}{11} \\ \end{array} \right)$$

Notebook
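The linked notebook is not reproduced here, but a minimal sketch of such a numerical search might look as follows. Since $H$ is diagonal, $\operatorname{Tr} HC$ and $\operatorname{Tr} H^2C$ only see the diagonal of $C$, which ranges exactly over the probability simplex, so it suffices to optimize over that simplex (the names h, vars are ours):

    h = {1, 1, 1, 10};            (* spectrum of H *)
    vars = Array[c, Length[h]];   (* diagonal of C in the eigenbasis of H *)
    NMinimize[{(h . vars)^2/((h^2) . vars),
        Total[vars] == 1 && And @@ Thread[vars >= 0]}, vars]
    (* first component ≈ 0.330579 = 40/121 *)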

Motivation: $k\times d$ gives the "critical batch size" of minibatch SGD for solving a linear least-squares problem with Hessian $H$ and optimal error distribution $C=C^*$. The critical batch size for an isotropic error distribution $C=I$ is analyzed here.


2 Answers


I played around a bit with your nice Mathematica code and it turns out that, indeed, if $H\neq 0$ the constant you are looking for does have a nice analytic expression: $$ \boxed{k'=\frac{4\lambda_{\rm min}(H)\lambda_{\rm max}(H)}{(\lambda_{\rm min}(H)+\lambda_{\rm max}(H))^2}} $$ Proving this amounts to showing that (1.) this $k'$ satisfies your inequality, that is, $$ \frac{4\lambda_{\rm min}(H)\lambda_{\rm max}(H)}{(\lambda_{\rm min}(H)+\lambda_{\rm max}(H))^2}\operatorname{tr} (H^2 C) \le (\operatorname{tr} (H C))^2\tag{1} $$ for all $C$ positive semi-definite of unit trace, and (2.) no larger $k'$ satisfies the inequality in question for all $C$.
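For the example in the question ($\lambda_{\rm min}(H)=1$, $\lambda_{\rm max}(H)=10$) this evaluates to $$k'=\frac{4\cdot 1\cdot 10}{(1+10)^2}=\frac{40}{121}\approx 0.330579\,,$$ matching the numerically found value.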

  1. Because $H$ is positive semi-definite we can spectrally decompose it as $\sum_{j=1}^n\lambda_j(H)|g_j\rangle\langle g_j|$ where $(\lambda_{\rm min}(H)=)\,\lambda_1(H)\leq\ldots\leq\lambda_n(H)\,(=\lambda_{\rm max}(H))$ and $(g_j)_{j=1}^n$ is an orthonormal basis of $\mathbb C^n$. With this (because $H\neq 0$, so $\lambda_{\rm max}(H)>0$), Eq. (1) is equivalent to $$ 4\lambda_{\rm min}(H)\lambda_{\rm max}(H)\sum_{j=1}^n\lambda_j^2\langle g_j,C g_j\rangle \le (\lambda_{\rm min}(H)+\lambda_{\rm max}(H))^2\Big(\sum_{j=1}^n\lambda_j\langle g_j,C g_j\rangle\Big)^2\,. $$ Note that only the diagonal elements of $C$ (in the eigenbasis of $H$) enter this inequality, meaning we can replace $C$ by the probability vector $r\in\mathbb R^n$ of its diagonal elements: $C\geq 0$ implies $r\geq 0$, and $\operatorname{tr}(C)=1$ implies $\sum_{j=1}^nr_j=1$. With this we arrive at the following equivalent statement:

Lemma. Let $n\in\mathbb N$ and $\lambda\in\mathbb R_+^n$ be given. If $\lambda_1\leq\ldots\leq\lambda_n$, then the map \begin{align*} f:\Delta^{n-1}&\to\mathbb R\\ r&\mapsto (\lambda_1+\lambda_n)^2\langle\lambda,r\rangle^2-4\lambda_1\lambda_n\langle\lambda^2,r\rangle \end{align*} only takes non-negative values. Here, $\lambda^2:=(\lambda_j^2)_{j=1}^n$, and $\Delta^{n-1}$ is the standard simplex, that is, the set of all probability vectors in $\mathbb R^n$.
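Before turning to the proof, here is a quick randomized sanity check of the Lemma (a sketch; the helper name lemmaGap is ours):

    lemmaGap[lam_, r_] := (Min[lam] + Max[lam])^2 (lam . r)^2 -
        4 Min[lam] Max[lam] ((lam^2) . r);
    Min @ Table[
        With[{lam = RandomReal[{0, 10}, 5], w = RandomReal[{0, 1}, 5]},
            lemmaGap[lam, w/Total[w]]],
        {10^4}]
    (* should be non-negative *)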

While I'm positive that an elementary proof of this exists, I only managed to show it using basic calculus:

Proof. Obviously, $f$ is non-negative iff the minimal value $f$ takes is non-negative. With this slight re-formulation the central idea is that a differentiable function on a compact domain takes its minimum either on the boundary or at a critical point. However, every $r\in\Delta^{n-1}$ is a boundary point (w.r.t. $\mathbb R^n$) because the trace hyperplane $\{r\in\mathbb R^n:r_1+\ldots+r_n=1\}$ has dimension one less than $\mathbb R^n$. Therefore we have to slightly tweak $f$: \begin{align*} \tilde f:\Delta^{n-2}_0&\to\mathbb R\\ r&\mapsto f\begin{pmatrix}r\\1-r_1-\ldots-r_{n-1}\end{pmatrix} \end{align*} Here, $\Delta^{n-2}_0:=\operatorname{conv}(\Delta^{n-2}\cup\{0\})=\{r\in\mathbb R^{n-1}_+:r_1+\ldots+r_{n-1}\leq1\}$ is an $(n-1)$-dimensional subset of $\mathbb R^{n-1}$. By definition, $\min_{x\in\Delta^{n-2}_0}\tilde f(x)=\min_{x\in\Delta^{n-1}}f(x)$, and this minimum exists because we are dealing with a continuous function on a compact domain. Thus if we can show $\min_{x\in\Delta^{n-2}_0}\tilde f(x)\geq 0$ for all $n\in\mathbb N\setminus\{1\}$ and all $\lambda\in\mathbb R_+^n$, then we are done. We do so via induction over $n$.

  • $n=2$: Let $x\in\Delta^{0}_0=[0,1]$ and $\lambda\in\mathbb R_+^2$. Then by direct computation $$\tilde f(x)=f\begin{pmatrix}x\\1-x\end{pmatrix}=(\lambda_2-\lambda_1)^2\big(\lambda_1x-\lambda_2(1-x)\big)^2\geq 0 $$
  • $n\to n+1$: We distinguish three cases. (1.) $\lambda_1=0$. Then $f(r)=\lambda_{n+1}^2\langle\lambda,r\rangle^2\geq 0$. (2.) $\lambda$ has at most two distinct entries, that is, there exist $0\leq \alpha\leq\beta$ such that $\lambda=(\alpha,\ldots,\alpha,\beta,\ldots,\beta)^\top$. Then for every $r\in\Delta^{n}$ there exists $\xi\in[0,1]$ (the total weight $r$ assigns to the $\alpha$-entries) such that, by the computation from the base case, $f(r)=(\alpha-\beta)^2\big(\xi\alpha-(1-\xi)\beta\big)^2\geq 0$. (3.) $\lambda_1>0$ and $\lambda$ has at least three distinct entries, that is, there exists $k\in\{2,\ldots,n\}$ such that $0<\lambda_1<\lambda_k<\lambda_{n+1}$. Because $\tilde f$ is differentiable and has compact domain, by the above argument we know that $\tilde f$ takes its minimum either on the boundary or at a critical point. For the boundary observe that if some entries of $r$ are zero, then $f(r)=:f_{n+1}(\lambda,r)$ equals $f_m(\lambda',r')$ for some $m<n+1$ by definition; but $f_m$ is non-negative by the induction hypothesis. Thus all that is left is to compute the critical points of $\tilde f$; we claim that assumption (3.) implies that $\tilde f$ has no critical points! Defining ${\bf e}:=(1,\ldots,1)^\top\in\mathbb R^n$ we compute, for any $j=1,\ldots,n$, \begin{align*} \frac{\partial \tilde f}{\partial r_j}(r)&=\frac{\partial}{\partial r_j}\Big( (\lambda_1+\lambda_{n+1})^2\Big(\sum_{i=1}^n\lambda_ir_i+\lambda_{n+1}(1-\langle {\bf e},r\rangle)\Big)^2\\ &\qquad\qquad\qquad-4\lambda_1\lambda_{n+1} \Big(\sum_{i=1}^n\lambda_i^2r_i+\lambda_{n+1}^2(1-\langle {\bf e},r\rangle)\Big)\Big)\\ &=2(\lambda_1+\lambda_{n+1})^2\Big(\sum_{i=1}^n\lambda_ir_i+\lambda_{n+1}(1-\langle {\bf e},r\rangle)\Big)(\lambda_j-\lambda_{n+1})\\ &\qquad\qquad\qquad-4\lambda_1\lambda_{n+1} (\lambda_j^2-\lambda_{n+1}^2)\\ &=2(\lambda_{n+1}-\lambda_j)\Big(- (\lambda_1+\lambda_{n+1})^2\Big(\sum_{i=1}^n\lambda_ir_i+\lambda_{n+1}(1-\langle {\bf e},r\rangle)\Big)\\ &\qquad\qquad\qquad+2\lambda_1\lambda_{n+1} (\lambda_j+\lambda_{n+1}) \Big)\,. \end{align*} Now if $r^*$ is a critical point of $\tilde f$, then in particular $ \frac{\partial \tilde f}{\partial r_1}(r^*)=0=\frac{\partial \tilde f}{\partial r_k}(r^*) $. Because $\lambda_{n+1}>\lambda_k>\lambda_1>0$, this is equivalent to $$ \lambda_1+\lambda_{n+1} =\frac{(\lambda_1+\lambda_{n+1})^2}{2\lambda_1\lambda_{n+1}}\Big(\sum_{i=1}^n\lambda_ir_i^*+\lambda_{n+1}(1-\langle {\bf e},r^*\rangle)\Big) $$ and $$ \lambda_k+\lambda_{n+1} =\frac{(\lambda_1+\lambda_{n+1})^2}{2\lambda_1\lambda_{n+1} }\Big(\sum_{i=1}^n\lambda_ir_i^*+\lambda_{n+1}(1-\langle {\bf e},r^*\rangle)\Big)\,. $$ Observe that the right-hand side is the same both times, but the left-hand sides do not coincide because $\lambda_1<\lambda_k$ by assumption. Therefore $\tilde f$ has no critical points, meaning the minimum is taken on the boundary (which, recall, is taken care of by the induction hypothesis). This concludes the proof. $\square$
  2. It suffices to write down an explicit $C$ for which equality holds in (1). If $g_{\rm min}$ ($g_{\rm max}$) is any normalized eigenvector of $H$ w.r.t. the eigenvalue $\lambda_{\rm min}(H)$ ($\lambda_{\rm max}(H)$), then such a $C$ is for example given by the following rank-2 state: $$ \boxed{\frac{\lambda_{\rm max}(H)}{\lambda_{\rm min}(H)+\lambda_{\rm max}(H)}|g_{\rm min}\rangle\langle g_{\rm min}|+\frac{\lambda_{\rm min}(H)}{\lambda_{\rm min}(H)+\lambda_{\rm max}(H)}|g_{\rm max}\rangle\langle g_{\rm max}|} $$ Plugging this $C$ into (1) yields equality, so the $k'$ we specified cannot be chosen any larger; this concludes the proof.
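For the question's example this state puts total weight $10/11$ on the $\lambda_{\rm min}$-eigenspace and $1/11$ on the $\lambda_{\rm max}$-eigenvector, consistent with the numerically found $C^*$ (whose first three diagonal entries sum to $\frac{39}{176}+\frac{7}{16}+\frac14=\frac{10}{11}$). A minimal equality check in Mathematica (the names lmin, lmax, hMat, c0 are ours):

    lmin = 1; lmax = 10;
    hMat = DiagonalMatrix[{1, 1, 1, 10}];
    c0 = DiagonalMatrix[{lmax/(lmin + lmax), 0, 0, lmin/(lmin + lmax)}];
    {4 lmin lmax/(lmin + lmax)^2 Tr[hMat . hMat . c0], Tr[hMat . c0]^2}
    (* -> {400/121, 400/121}: both sides of (1) coincide *)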

Two little remarks:

  • It is easy to see that this $k'$ is better than the bound from your previous question: \begin{align*} \frac{4\lambda_{\rm min}(H)\lambda_{\rm max}(H)}{(\lambda_{\rm min}(H)+\lambda_{\rm max}(H))^2}&\geq \frac{4\lambda_{\rm min}^2(H)}{(\lambda_{\rm min}(H)+\lambda_{\rm max}(H))^2}\\ &\geq \frac{4\lambda_{\rm min}^2(H)}{(\lambda_{\rm max}(H)+\lambda_{\rm max}(H))^2}=\frac{\lambda_{\rm min}(H)^2}{\lambda_{\rm max}(H)^2}\,, \end{align*} and equality holds if and only if $\lambda_{\rm min}(H)=0$ or $H$ is a multiple of the identity.
  • Some slight algebraic manipulation shows that if $H>0$, then $k'$ is the ratio of the harmonic mean to the arithmetic mean of the smallest and the largest eigenvalue of $H$: $$ k'=\frac{ \Big(\frac{ 2 }{ \frac{1}{\lambda_{\rm min}(H)}+\frac{1}{\lambda_{\rm max}(H)} } \Big)}{ \Big(\frac{\lambda_{\rm min}(H)+\lambda_{\rm max}(H) }{ 2 }\Big) } $$ As a direct consequence of the HM-AM inequality we find $k'\leq 1$, meaning for $H\geq 0$, $H\neq 0$, and all $\tilde k>1$ there exists $C\geq 0$ of unit trace such that $\tilde k\operatorname{tr}(H^2C)\not\leq(\operatorname{tr}(HC))^2$. Indeed if $\psi_j\in\mathbb C^n$ is some normalized eigenvector to a non-zero eigenvalue $\lambda_j$ of $H$, then $C:=|\psi_j\rangle\langle\psi_j|$ does the job: $\tilde k\operatorname{tr}(H^2C)=\tilde k \lambda_j^2 >\lambda_j^2=(\operatorname{tr}(HC))^2$.
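For the example in the question this ratio reads $$ k'=\frac{{\rm HM}(1,10)}{{\rm AM}(1,10)}=\frac{20/11}{11/2}=\frac{40}{121}\,, $$ consistent with the boxed formula above.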
  • Nice! I figured this would be a piece of cake for someone from Quantum Information :) – Commented Feb 17, 2023 at 21:01
  • BTW, in version 13 the code is much simpler: h = {1, 1, 1, 10}; Minimize[(h . c)^2/((h*h) . c), c \[Element] Simplex@IdentityMatrix@Length[h]] – Commented Feb 17, 2023 at 21:04

As shown in the other answer, the result is closely related to Kantorovich-type inequalities.

Assume for the moment that $H$ is positive definite. Let $M$ and $m$ be its largest and smallest eigenvalues respectively. Then $MI-H$ and $H-mI$ are two commuting positive semidefinite matrices, and the product of two commuting positive semidefinite matrices is again positive semidefinite. Hence $0\le(MI-H)(H-mI)$, i.e., $H^2\le(M+m)H-MmI$. It follows that $C^{1/2}H^2C^{1/2}\le(M+m)C^{1/2}HC^{1/2}-MmC$ and in turn, $\operatorname{tr}H^2C\le(M+m)\operatorname{tr}HC-Mm$. So, by completing the square on the RHS, we obtain $$ \operatorname{tr}H^2C \le(M+m)\operatorname{tr}HC-Mm \le\frac{(M+m)^2}{4Mm}(\operatorname{tr}HC)^2 $$ (with $t:=\operatorname{tr}HC$, the second inequality rearranges to $((M+m)t-2Mm)^2\ge0$). Therefore $$ 4Mm\operatorname{tr}H^2C \le(M+m)^2(\operatorname{tr}HC)^2, $$ and this inequality still holds when $H$ is positive semidefinite, by a continuity argument. It is also sharp: equality occurs, for instance, when $H$ is a scalar matrix.
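A quick numerical spot check of this chain for the example $H$ from the question (a sketch; the random unit-trace PSD $C$ and all names are ours):

    m = 1; M = 10;
    hMat = DiagonalMatrix[{1., 1., 1., 10.}];
    a = RandomReal[{-1, 1}, {4, 4}];
    c = a . Transpose[a]; c = c/Tr[c];    (* random PSD matrix of unit trace *)
    {Tr[hMat . hMat . c] <= (M + m) Tr[hMat . c] - M m,
     (M + m) Tr[hMat . c] - M m <= (M + m)^2/(4 M m) Tr[hMat . c]^2}
    (* -> {True, True} *)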


