
Let $\mathbf{S}$ be a symmetric positive semidefinite matrix (i.e. one with all eigenvalues real and non-negative). Then there is an orthogonal matrix $\mathbf{U}$ (whose columns form an orthonormal basis) such that $\mathbf{U}^\top \mathbf{S} \mathbf{U}$ is diagonal; this basis is of course given by the eigenvectors of $\mathbf{S}$.

Consider another basis $\mathbf{V}$ consisting of unit-length but non-orthogonal vectors (so the columns of $\mathbf{V}$ have unit length but are not orthogonal) that also diagonalizes $\mathbf{S}$, i.e. $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ is diagonal.

I suspect that the following is true: $\mathrm{Tr}(\mathbf{V}^\top \mathbf{S} \mathbf{V}) \le \mathrm{Tr}(\mathbf{S})=\mathrm{Tr}(\mathbf{U}^\top \mathbf{S} \mathbf{U})$. Is it true? If so, how can it be proved?

Furthermore, is it true that equality is reached iff $\mathbf{V}$ is orthogonal?

Update: Following some confusion in the comments, I would like to clarify that I am considering $\mathbf{S}$ to represent a bilinear form, not a linear operator. So under a change of basis it transforms as $\mathbf{V}^\top \mathbf{S} \mathbf{V}$, not as $\mathbf{V}^{-1} \mathbf{S} \mathbf{V}$.


Update 2

Let me illustrate where this question comes from; it might provide some additional intuition. $\mathbf{S}$ is actually a covariance matrix of some data (i.e. I have a set of data points $\mathbf{x}_i \in \mathbb{R}^N$, and $\mathbf{S} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top$, up to a constant factor). The trace of $\mathbf{S}$ is the total variance of the data, and it of course stays the same if the coordinate system is rotated. Now for any unit vector $\mathbf{v}$, the variance of the projection of the data onto the axis defined by this vector equals $\mathbf{v}^\top\mathbf{S}\mathbf{v}$. If I take $N$ orthogonal unit vectors, then the sum of these variances equals the total variance. I am interested in the situation where I take $N$ non-orthogonal unit vectors chosen such that all projections of the data onto these vectors have zero correlation (or covariance); this condition is equivalent to $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ being diagonal. This means that my projections are "independent"; therefore I am pretty sure that their variances together cannot exceed the total variance: the total variance should be the maximum amount of variance that can be "distributed" between independent components (with the maximum achieved by the principal components).
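A quick numerical sanity check of the conjecture, as a NumPy sketch (purely illustrative; the Gram–Schmidt construction below is just one way to produce a unit-norm basis with $\mathbf{V}^\top\mathbf{S}\mathbf{V}$ diagonal, and all names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
X = rng.standard_normal((100, N))
S = X.T @ X / 100  # covariance matrix of random data (up to centering)

# Gram-Schmidt in the S inner product <x, y>_S = x^T S y, then rescale
# each column to unit Euclidean length; columns stay S-conjugate.
M = rng.standard_normal((N, N))
V = np.zeros((N, N))
for j in range(N):
    v = M[:, j].copy()
    for k in range(j):
        v -= (V[:, k] @ S @ v) / (V[:, k] @ S @ V[:, k]) * V[:, k]
    V[:, j] = v / np.linalg.norm(v)

D = V.T @ S @ V
assert np.allclose(D, np.diag(np.diag(D)))  # V^T S V is diagonal
print(np.trace(D), "<=", np.trace(S))       # the conjectured inequality
```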

Comments

  • How do you define the trace? – Commented Jul 24, 2014 at 11:27
  • @user161825: Well, as the sum of diagonal elements. Are there other definitions? – amoeba, Jul 24, 2014 at 11:29
  • The sum of diagonal elements is not invariant with respect to a change of basis. Do you mean the sum in any orthonormal basis? – Commented Jul 24, 2014 at 11:31
  • I think the OP wants $V$ to diagonalize $S$ as a bilinear form, not as a linear operator. – user126154, Jul 24, 2014 at 11:35
  • @Omnomnomnom: I imagine that $S$ is the matrix of a bilinear form. When one changes basis, the matrix of a bilinear form is transformed via $V^\top SV$, see e.g. here. – amoeba, Jul 24, 2014 at 11:36

1 Answer


Denote the scalar product of vectors $v,u$ by $(v,u)$ and the norm of a vector $v$ by $\|v\|=\sqrt{(v,v)}$.

Lemma 1. Let $A$ be a symmetric positive definite operator on $\mathbb{R}^n$ and let $f\in \mathbb{R}^n$ be a vector. Then $(Af,f)\cdot (A^{-1} f,f)\geq \|f\|^4$.

Proof. Let $A=B^2$, where $B=\sqrt{A}$ is positive definite. Then $(Af,f)=(B^2f,f)=(Bf,Bf)=\| Bf\|^2$ and $(A^{-1}f,f)=\|B^{-1}f\|^2$, so we have to prove that $\|Bf\|\cdot \|B^{-1} f\|\geq \|f\|^2$. But by the Cauchy–Schwarz inequality, $\|Bf\|\cdot \|B^{-1} f\| \geq (Bf, B^{-1} f)=(f,f)=\|f\|^2$, as desired.
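For illustration only (not part of the proof), here is a minimal NumPy check of Lemma 1 on a random positive definite matrix; all names are arbitrary:

```python
import numpy as np

# Check (Af, f) * (A^{-1}f, f) >= ||f||^4 on a random example.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + np.eye(5)          # symmetric positive definite
f = rng.standard_normal(5)
lhs = (A @ f @ f) * (np.linalg.inv(A) @ f @ f)
print(lhs >= np.linalg.norm(f) ** 4)  # expected: True
```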

Lemma 2. If all the diagonal elements of a symmetric positive definite matrix $A$ are equal to $1$, then the diagonal elements of $A^{-1}$ are not less than $1$.

Proof. Apply Lemma 1 to the standard basis vectors: for each $i$ we have $(Ae_i,e_i)=A_{ii}=1$ and $\|e_i\|=1$, so $(A^{-1}e_i,e_i)=(A^{-1})_{ii}\geq \|e_i\|^4/(Ae_i,e_i)=1$.
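Concretely, Lemma 2 applies to any correlation matrix (unit diagonal): the diagonal of its inverse is at least $1$. A minimal check:

```python
import numpy as np

# A 2x2 correlation matrix has unit diagonal; per Lemma 2, the
# diagonal of its inverse is >= 1.
C = np.array([[1.0, 0.6],
              [0.6, 1.0]])
print(np.diag(np.linalg.inv(C)))  # [1.5625, 1.5625], both >= 1
```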

Now assume that $U:=V^TSV={\rm diag}(c_1,\dots,c_n)$; note that $c_i=(Sv_i,v_i)\geq 0$, where $v_i$ is the $i$-th column of $V$, because $S$ is positive semidefinite. Then ${\rm tr}\, V^T SV=\sum c_i$ and $$ {\rm tr}\, S={\rm tr}\, V^TS(V^T)^{-1}={\rm tr}\, V^TSV (V^T V)^{-1}={\rm tr}\, U F^{-1}, $$ where $F=V^TV$ is a symmetric positive definite matrix with unit diagonal elements (the columns of $V$ have unit length). So, by Lemma 2, the diagonal elements $w_1,\dots,w_n$ of $F^{-1}$ are not less than $1$, and therefore ${\rm tr}\, S=\sum c_i w_i\geq \sum c_i={\rm tr}\, U$.
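One way to check the identity ${\rm tr}\, S={\rm tr}\, UF^{-1}$ numerically is to build $S$ backwards from a unit-column $V$ and a nonnegative diagonal $U$ via $S=(V^T)^{-1}UV^{-1}$, so that $V^TSV=U$ holds by construction. A sketch with illustrative names:

```python
import numpy as np

# Verify tr S = tr(U F^{-1}) with F = V^T V, plus the final inequality.
rng = np.random.default_rng(2)
n = 5
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=0)          # unit-length, generically non-orthogonal columns
U = np.diag(rng.uniform(0.0, 1.0, n))   # diagonal with nonnegative entries
S = np.linalg.inv(V.T) @ U @ np.linalg.inv(V)  # then V^T S V = U

F = V.T @ V
print(np.isclose(np.trace(S), np.trace(U @ np.linalg.inv(F))))  # True
print(np.trace(U) <= np.trace(S))                               # True
```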

Comments

  • As a funny side note, $\mathrm{tr}\, S^{-1}=\mathrm{tr}\, U^{-1} V^T V=\mathrm{tr}\, U^{-1}$. I think the exponent in Lemma 1 should be $4$ rather than $2$. – Commented Jul 29, 2014 at 10:17
  • @user161825: I am afraid I screwed up the exponent when making my minor edits; Fedor had the correct value of $4$ there. Hope he will correct the exponent back to $4$, as I feel bad messing with his text even more! – amoeba, Jul 29, 2014 at 11:54
  • Small footnote. A slight variation of the argument in the main proof makes it a bit more straightforward: $U=V^\top SV \Rightarrow S=V^{-\top}UV^{-1}$. From this we get $\mathrm{tr}\, S = \mathrm{tr}\, V^{-\top}UV^{-1} = \mathrm{tr}\, (V^{\top}V)^{-1}U = \mathrm{tr}\, F^{-1}U$. – amoeba, Jul 29, 2014 at 23:21
