Let $\mathbf{S}$ be a symmetric positive semidefinite matrix (i.e. one with all eigenvalues real and non-negative). Then there is an orthogonal matrix $\mathbf{U}$ (its columns forming an orthonormal basis) such that $\mathbf{U}^\top \mathbf{S} \mathbf{U}$ is diagonal; this basis is of course given by the eigenvectors of $\mathbf{S}$.
Consider another basis given by the columns of a matrix $\mathbf{V}$: the columns have unit length but are not orthogonal, and they also diagonalize $\mathbf{S}$, i.e. $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ is diagonal.
I suspect that the following is true: $\mathrm{Tr}(\mathbf{V}^\top \mathbf{S} \mathbf{V}) \le \mathrm{Tr}(\mathbf{S})=\mathrm{Tr}(\mathbf{U}^\top \mathbf{S} \mathbf{U})$. Is it true? If so, how can it be proved?
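As a quick numerical sanity check (not a proof), one can build such a $\mathbf{V}$ by running Gram–Schmidt in the inner product defined by $\mathbf{S}$, i.e. $\langle \mathbf{x}, \mathbf{y}\rangle_{\mathbf{S}} = \mathbf{x}^\top\mathbf{S}\mathbf{y}$, and then rescaling the columns to unit Euclidean length. A sketch assuming NumPy, with a randomly generated positive definite $\mathbf{S}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random symmetric positive definite S.
A = rng.standard_normal((n, n))
S = A @ A.T

# Gram-Schmidt in the S-inner product <x, y> = x^T S y:
# this makes the columns satisfy v_i^T S v_j = 0 for i != j.
B = rng.standard_normal((n, n))   # random starting basis
V = np.zeros((n, n))
for k in range(n):
    v = B[:, k].copy()
    for j in range(k):
        w = V[:, j]
        v -= (w @ S @ v) / (w @ S @ w) * w
    V[:, k] = v / np.linalg.norm(v)  # rescale to unit Euclidean length

D = V.T @ S @ V
off = D - np.diag(np.diag(D))
print("max |off-diagonal|:", np.abs(off).max())  # ~0: V diagonalizes S
print("Tr(V^T S V) =", np.trace(D))
print("Tr(S)       =", np.trace(S))
```

On random instances the off-diagonal entries of $\mathbf{V}^\top\mathbf{S}\mathbf{V}$ vanish up to rounding error, and the trace comes out no larger than $\mathrm{Tr}(\mathbf{S})$, which is consistent with the conjecture (though of course not a proof).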
Furthermore, is it true that equality is achieved iff $\mathbf{V}$ is orthogonal?
Update: Following some confusion in the comments, I would like to clarify that I am considering $\mathbf{S}$ to represent a bilinear form, not a linear operator. So under a change of basis it transforms as $\mathbf{V}^\top \mathbf{S} \mathbf{V}$, not as $\mathbf{V}^{-1} \mathbf{S} \mathbf{V}$.
Update 2
Let me illustrate where this question comes from; it might provide some additional intuition. $\mathbf{S}$ is actually a covariance matrix of some data: I have a set of data points $\mathbf{x}_i \in \mathbb{R}^N$ (assumed centered), and $\mathbf{S} = \sum_i \mathbf{x}_i \mathbf{x}_i^\top$ up to a constant factor. The trace of $\mathbf{S}$ is the total variance of the data, and it of course stays the same if the coordinate system is rotated.

Now, for any unit vector $\mathbf{v}$, the variance of the projection of the data onto the axis defined by this vector equals $\mathbf{v}^\top\mathbf{S}\mathbf{v}$. If I take $N$ orthogonal unit vectors, the sum of these variances equals the total variance. I am interested in the situation when I take $N$ non-orthogonal unit vectors, chosen such that all projections of the data onto these vectors have zero correlation (or covariance); this condition is equivalent to $\mathbf{V}^\top \mathbf{S} \mathbf{V}$ being diagonal. My projections are then "independent", so I am pretty sure that their variances together cannot exceed the total variance: the total variance should be the maximum amount of variance that can be "distributed" between independent components (with the maximum achieved by the principal components).
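In two dimensions this setup is easy to simulate: given a unit vector $\mathbf{v}_1$, any unit $\mathbf{v}_2$ perpendicular to $\mathbf{S}\mathbf{v}_1$ gives uncorrelated projections, since $\mathbf{v}_1^\top\mathbf{S}\mathbf{v}_2 = (\mathbf{S}\mathbf{v}_1)^\top\mathbf{v}_2 = 0$. A minimal sketch assuming NumPy, with the data and the direction of $\mathbf{v}_1$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)

# Centered 2-D data; S is its (unnormalized) covariance matrix.
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
X -= X.mean(axis=0)
S = X.T @ X

# Pick a unit vector v1, then choose unit v2 perpendicular to S v1,
# so that the two projections are uncorrelated: v1^T S v2 = 0.
theta = 0.7
v1 = np.array([np.cos(theta), np.sin(theta)])
u = S @ v1
v2 = np.array([-u[1], u[0]])
v2 /= np.linalg.norm(v2)

var1 = v1 @ S @ v1   # variance of the projection on v1
var2 = v2 @ S @ v2   # variance of the projection on v2
cov  = v1 @ S @ v2   # ~0 by construction
print(var1 + var2, "vs total variance", np.trace(S))
```

In these simulations $\mathbf{v}_1$ and $\mathbf{v}_2$ are generally not orthogonal (unless $\mathbf{v}_1$ happens to be an eigenvector), yet the two variances never seem to add up to more than $\mathrm{Tr}(\mathbf{S})$.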