209

I'm trying to intuitively understand the difference between SVD and eigendecomposition.

From my understanding, eigendecomposition seeks to describe a linear transformation as a sequence of three basic operations ($P^{-1}DP$) on a vector:

  1. Rotation of the coordinate system (change of basis): $P$
  2. Independent scaling along each basis vector (of the rotated system): $D$
  3. De-rotation of the coordinate system (undo change of basis): $P^{-1}$

But as far as I can see, SVD's goal is to do exactly the same thing, except that resulting decomposition is somehow different.

What, then, is the conceptual difference between the two?

For example:

  • Is one of them more general than the other?
  • Is either a special case of the other?

Note: I'm specifically looking for an intuitive explanation, not a mathematical one.
Wikipedia is already excellent at explaining the mathematical relationship between the two decompositions ("The right-singular vectors of M are eigenvectors of $M^*M$", for example), but it completely fails to give me any intuitive understanding of what is going on.

The best explanation I've found so far is this one, which is great, except it doesn't talk about eigendecompositions at all, which leaves me confused as to how SVD is any different from eigendecomposition in its goal.

2
  • 4
    Haven't seen this perspective pushed before, but you can view eigendecomposition as a special case of SVD. In particular, SVD is an isomorphism (between vector spaces of varying dimension), while spectral decomposition is an automorphism (between vector spaces of the same dimension) Commented Mar 19, 2020 at 16:47
  • EVD (EigenValue Decomposition) is for square matrices only, while SVD is more general than EVD
    – JeeyCi
    Commented Jan 1 at 17:07

8 Answers

138

Consider the eigendecomposition $A=P D P^{-1}$ and SVD $A=U \Sigma V^*$. Some key differences are as follows (a small numerical sketch follows the list):

  • The vectors in the eigendecomposition matrix $P$ are not necessarily orthogonal, so the change of basis isn't a simple rotation. On the other hand, the vectors in the matrices $U$ and $V$ in the SVD are orthonormal, so they do represent rotations (and possibly flips).
  • In the SVD, the nondiagonal matrices $U$ and $V$ are not necessarily the inverse of one another. They are usually not related to each other at all. In the eigendecomposition the nondiagonal matrices $P$ and $P^{-1}$ are inverses of each other.
  • In the SVD the entries in the diagonal matrix $\Sigma$ are all real and nonnegative. In the eigendecomposition, the entries of $D$ can be any complex number - negative, positive, imaginary, whatever.
  • The SVD always exists for any sort of rectangular or square matrix, whereas the eigendecomposition exists only for square matrices, and even among square matrices it sometimes doesn't exist.
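Here is a minimal numpy sketch of these bullets; the matrices are arbitrary examples chosen purely for illustration:

```python
import numpy as np

A_rect = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])      # 2x3: no eigendecomposition exists, but the SVD does
U, s, Vh = np.linalg.svd(A_rect)
print(s)                                  # singular values are real and nonnegative

A_sq = np.array([[2.0, 1.0],
                 [0.0, 3.0]])             # square and diagonalizable, but not symmetric
w, P = np.linalg.eig(A_sq)                # A_sq = P @ np.diag(w) @ np.linalg.inv(P)
print(P.T @ P)                            # not the identity: eigenvectors need not be orthogonal
```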
2
  • 2
    +1 Hmm, so if the change of basis performed by $P$ isn't a simple rotation, then what else can it be? The scaling is already taken care of by $D$, right? I think an example might be helpful for that one.
    – user541686
    Commented Mar 4, 2013 at 7:10
  • 2
    If $P=\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$, then the action of $P$ on a vector is to form a linear combination of the vectors $\begin{bmatrix}1 \\ 0\end{bmatrix}$ and $\begin{bmatrix}1 \\ 1\end{bmatrix}$, which make a 45 degree angle. The action of $P^{-1}$ is to express the vector acted on in the basis of those vectors.
    – Nick Alger
    Commented Mar 4, 2013 at 7:27
80

I encourage you to see an $(m \times n)$ real-valued matrix $A$ as a bilinear operator between two spaces; intuitively, one space lies to the left ($\mathbb{R}^m$) and the other ($\mathbb{R}^n$) to the right of $A$. "Bilinear" simply means that $A$ is linear in both directions (left to right or right to left). The operations $A$ can perform are limited to scaling, rotation, and reflection, and combinations of these; any other kind of operation is non-linear.

$A$ transforms vectors between the two spaces via multiplication:

$x^T A = y^T$ transforms left vector $x$ to right vector $y$.

$x = A y$ transforms right vector $y$ to left vector $x$.

The point of decompositions of $A$ is to identify, or highlight, aspects of the action of $A$ as an operator. The eigendecomposition of $A$ clarifies what $A$ does by finding the eigenvalues and eigenvectors that satisfy the constraint

$A x = \lambda x$.

This constraint identifies vectors (directions) $x$ that are not rotated by $A$, and the scalars $\lambda$ associated with each of those directions.

The problem with eigendecomposition is that when the matrix isn't square, the left and right space are spaces of vectors of different sizes and therefore completely different spaces; there really isn't a sense in which $A$'s action can be described as involving a "rotation", because the left and right spaces are not "oriented" relative to one another. There just isn't a way to generalize the notion of an eigendecomposition to a non-square matrix $A$.

Singular vectors provide a different way to identify vectors for which the action of $A$ is simple; one that does generalize to the case where the left and right spaces are different. A corresponding pair of singular vectors have a scalar $\sigma$ for which $A$ scales by the same amount, whether transforming from the left space to the right space or vice-versa:

$ x^T A = \sigma y^T$

$\sigma x = A y$.

Thus, eigendecomposition represents $A$ in terms of how it scales vectors it doesn't rotate, while singular value decomposition represents $A$ in terms of corresponding vectors that are scaled the same, whether moving from the left to the right space or vice-versa. When the left and right space are the same (i.e. when $A$ is square), singular value decomposition represents $A$ in terms of how it rotates and reflects vectors that $A$ and $A^T$ scale by the same amount.
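A quick numerical check of these singular-vector relations; the matrix below is just an arbitrary example:

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 4.0]])           # 2x3: left space is R^2, right space is R^3
U, s, Vh = np.linalg.svd(A, full_matrices=False)

x, y, sigma = U[:, 0], Vh[0, :], s[0]     # a corresponding pair of singular vectors
print(np.allclose(x @ A, sigma * y))      # x^T A = sigma y^T  (left to right)
print(np.allclose(A @ y, sigma * x))      # sigma x = A y      (right to left)
```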

3
  • 3
    thanks! really intuitive. Can you please delete your original answer part? It is just a source of confusion...
    – ihadanny
    Commented Feb 20, 2016 at 22:21
  • 3
    Great intuition! This actually answers the question as I understood it.
    – Felix D.
    Commented Oct 6, 2019 at 8:48
  • 3
    This answer makes the most intuitive sense to me. Thanks!
    – xflowXen
    Commented Feb 3, 2020 at 10:48
64

(I'm coming back to update my answer here, because in hindsight I feel the existing ones don't quite explain the details satisfactorily, and I feel fleshing out the natural follow-ups is helpful.)


###Singular Value Decomposition

There was a clear explanation for the singular value decomposition here, which I annotate here for clarity:

The SVD allows us to describe the effect of a matrix $A$ on a vector (via the matrix-vector product) as a three-step process $A = U \Sigma V^\dagger$:

  1. A first rotation in the input space ($V$)
  2. A simple positive scaling that takes a vector in the input space to the output space ($\Sigma$)
  3. And another rotation in the output space ($U$)

Note that $V^\dagger$ denotes the conjugate of $V^\top$, hence the two are equal when $V$ is real.

Note that the conditions above are mathematically the following constraints:

  • $V^\dagger V = I$    (i.e. $V$ is a rotation matrix)
  • $\Sigma = \operatorname{diag}(\vec{\sigma})$ and $\vec{\sigma} \geq \vec{0}$    ("$\operatorname{diag}$" just returns a diagonal matrix with the given diagonal)
  • $U^\dagger U = I$    (i.e. $U$ is a rotation matrix)

The fundamental theorem of linear algebra says that such a decomposition always exists.
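A quick numpy sanity check of these three constraints (the test matrix is a random stand-in, nothing special):

```python
import numpy as np

A = np.random.randn(4, 3)                       # any rectangular matrix will do
U, s, Vh = np.linalg.svd(A)                     # A = U @ Sigma @ Vh, with Vh playing V^dagger

print(np.allclose(U.conj().T @ U, np.eye(4)))   # U^dagger U = I
print(np.allclose(Vh @ Vh.conj().T, np.eye(3))) # V^dagger V = I
print(np.all(s >= 0))                           # sigma >= 0

Sigma = np.zeros((4, 3))
np.fill_diagonal(Sigma, s)
print(np.allclose(U @ Sigma @ Vh, A))           # the three factors reconstruct A
```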

What is it used for?

Wikipedia has a nice list, but I'll list a couple. One of the most common applications is obtaining a low-rank approximation to a matrix (see PCA), which is used for compression, speed-up, and also actual data analysis. (e.g. If my memory serves right, I seem to recall reading that the Myers-Briggs personality test was a result of running PCA on a lot of personality trait data and seeing a few "principal axes" that describe personalities.) The other one is for characterizing the pseudo-inverse for analysis or proofs, since inverting the SVD factor by factor (transposing $U$ and $V$ and reciprocating the nonzero singular values) automatically gives a formula that is the inverse when the matrix is invertible, and the pseudo-inverse when it is not.
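A minimal sketch of both uses; the test matrix, the truncation rank `k`, and the tolerance are illustrative choices, not anything canonical:

```python
import numpy as np

A = np.random.randn(50, 30)
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 5                                           # rank of the truncated approximation
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]     # best rank-k approximation of A

tol = 1e-12                                     # treat singular values below tol as zero
s_inv = np.where(s > tol, 1.0 / s, 0.0)         # invert only the nonzero singular values
A_pinv = Vh.T @ np.diag(s_inv) @ U.T            # pseudo-inverse built from the SVD
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # matches numpy's built-in pseudo-inverse
```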


###Eigenvalue (spectral) decomposition

Similarly, for the eigendecomposition (also known as eigenvalue decomposition, spectral decomposition, or diagonalization), I would say the following:

An eigendecomposition describes the effect of a matrix $A$ on a vector as a different 3-step process $A = Q \Lambda Q^{-1}$:

  1. An invertible linear transformation ($Q^{-1}$)
  2. A scaling ($\Lambda$)
  3. The inverse of the initial transformation ($Q$)

Correspondingly, these conditions imply the following constraints:

  • $Q$ is invertible
  • $\Lambda = \operatorname{diag}(\vec{\lambda})$

This decomposition doesn't always exist; it exists exactly when $A$ is diagonalizable, i.e. has $n$ linearly independent eigenvectors. The spectral theorem guarantees this (with an orthonormal $Q$) when $A$ is normal, e.g. real symmetric.

Note the most basic requirement is that $A$ be a square matrix (but this is not enough).

What is it used for?

Well, one thing that makes it useful is that it gives you the ability to efficiently raise a matrix to a large power: $A^n = Q \Lambda^n Q^{-1}$. For this reason (and others) it's used heavily in engineering to, say, efficiently analyze and predict the behavior of a linear dynamical system at a future point in time, since this is much faster than multiplying the matrix by itself $n$ times. It's also used to analyze the response of a linear system at different frequencies. (Sinusoids of different frequencies are orthogonal, so you get the orthogonal diagonalizability for free.) Furthermore, it's also a problem that repeatedly comes up when solving differential equations analytically.
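For instance, a sketch of the fast-matrix-power trick, using a small diagonalizable matrix made up for the example:

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])                      # a toy linear dynamical system x_{t+1} = A x_t
w, Q = np.linalg.eig(A)                         # A = Q @ np.diag(w) @ np.linalg.inv(Q)

n = 50
A_n = Q @ np.diag(w**n) @ np.linalg.inv(Q)      # A^n by exponentiating only the eigenvalues
print(np.allclose(A_n, np.linalg.matrix_power(A, n)))
```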


At this point, you probably wonder:

##So when is an eigendecomposition different from an SVD?

Assuming both exist, notice that:

  • The eigenvalues (in $\Lambda$) may be negative

  • The eigenvectors (in $Q$) may be non-orthogonal

  • We usually normalize the eigenvectors in $Q$ to unit length (any rescaling of them is cancelled out by $Q^{-1}$), but if we don't do that, then that can also cause a mismatch.

So that means, in order for the SVD of $A$ to be equal to its eigendecomposition, we need $A$ to:

  • Have orthonormal eigenvectors (meaning all it does is to scale its input along $n$ orthogonal directions, where $A$ is $n\times n$)

  • Have nonnegative eigenvalues (colloquially, it must be a real matrix and not "flip" anything)

It turns out that the above conditions are equivalent to the following:

  • $A$ must be a symmetric real matrix (this is equivalent to $A$ having real eigenvalues and orthonormal eigenvectors)

  • $A$'s (real) eigenvalues must be nonnegative (copied from above; I can't think of a "nicer" equivalent)

So, we say:

The two are equal when $A$ is symmetric positive-semidefinite.

This can be denoted as $A \succeq 0 \ \land \ A^\dagger = A$.

Bonus: This is exactly the set of matrices $A \in \mathbb{R}^{n\times n}$ that satisfy $A = B^\dagger B$ for some $B$. (Prove it!)
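A numerical illustration of the claim: build a symmetric positive-semidefinite $A$ from an arbitrary $B$ and compare the two decompositions.

```python
import numpy as np

B = np.random.randn(5, 4)
A = B.T @ B                                     # symmetric positive-semidefinite by construction

w, Q = np.linalg.eigh(A)                        # eigendecomposition (eigh assumes symmetry)
U, s, Vh = np.linalg.svd(A)

# eigh returns eigenvalues in ascending order, svd returns singular values in
# descending order, so reverse one of them before comparing.
print(np.allclose(w[::-1], s))                                  # eigenvalues == singular values
print(np.allclose(Q @ np.diag(w) @ Q.T, U @ np.diag(s) @ Vh))   # both reconstruct A
```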

49

Intuitively, SVD says that for any linear map, there is an orthonormal frame in the domain which is first mapped to a different orthonormal frame in the image space, and then the values are scaled.

Eigendecomposition says that there is a basis, not necessarily orthonormal, such that when the matrix is applied, each basis vector is simply scaled. That is assuming you have $n$ linearly independent eigenvectors, of course. When you don't, the best you can do is have the linear map behave like an upper triangular (Jordan) matrix on the remaining eigenspaces.

Edit: Consider the difference for a rotation matrix in $\mathbb{R}^2$.

Here, there are no real eigenvalues and this corresponds to there being no choice of basis which under the transformation is simply a scaling. On the other hand, SVD makes a lot of sense here because it says we can take the standard basis in the domain, map it to the rotated version of this basis (thought of as in the image space), and scale everything by 1.
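A small numpy sketch of this rotation example (the angle is an arbitrary choice):

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.linalg.eigvals(R))                # complex eigenvalues: no real eigendecomposition
U, s, Vh = np.linalg.svd(R)
print(s)                                   # both singular values are 1: pure rotation, scaling by 1
```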

4
  • 1
    +1 Hmm, SVD preserves normality too? I thought it just preserves orthogonality, like when transforming this image into this image from the article I linked to.
    – user541686
    Commented Mar 4, 2013 at 7:07
  • 1
    Also, regarding the rotation matrix example -- I think there is a basis in which the transformation is just a scaling, but the scaling is by a complex factor, right?
    – user541686
    Commented Mar 4, 2013 at 7:11
  • 2
    It preserves normality until the point where you introduce the scaling. The basic idea is you take one orthonormal frame, map those to a different orthonormal frame, and then scale that second orthonormal frame (losing the normality). Extend everything else by linearity. As for the rotation matrix, yes there are complex eigenvalues corresponding to a complex eigenvector but I'm restricting the geometric interpretation to just vectors in $\mathbb{R}^2$ where these don't exist.
    – muzzlator
    Commented Mar 4, 2013 at 8:33
    Let me put SVD another way. Suppose for now $M$ is a square matrix. It says there exist two orthonormal bases $\{b_1, ..., b_n\}$ and $\{c_1, ..., c_n\}$ such that $M b_i = k_i c_i$. When $M$ is not square, it does something pretty similar, just forget about the vectors which are sent to $0$ or which aren't in the image.
    – muzzlator
    Commented Mar 4, 2013 at 8:43
7

The SVD arises from asking in which directions a linear transformation has the greatest impact. The eigendecomposition arises from asking in which directions a quadratic form has the greatest impact.

Here is some intuition, which can be made rigorous. Let $A$ be a real $m \times n$ matrix. For which unit vector $v$ is $\| A v \|_2$ the largest? Let $$ v_1 \in \arg \max_{\|v\|_2 = 1} \, \| A v \|_2 $$ and let $$ v_2 \in \arg \max_{\| v \|_2 = 1, v \perp v_1 } \, \| A v \|_2. $$ Let $u_1 = A v_1$ and $u_2 = A v_2$. Claim: $u_2$ is orthogonal to $u_1$. Reason: If $u_2$ were not orthogonal to $u_1$, then $v_1$ would not be a maximizer because it would be possible to improve $v_1$ by perturbing it a bit in the direction $v_2$. Perturbing $v_1$ in the orthogonal direction $v_2$ does not change the norm of $v_1$, but perturbing $u_1 = A v_1$ in the non-orthogonal direction $A v_2$ does change the norm of $u_1$. (When you walk on the surface of the earth, you are moving orthogonally to the radius of the earth and your distance from the center of the earth does not change.)

Continuing as above, we discover the SVD of $A$.
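A rough numerical companion to this variational picture, using a random example matrix and random unit vectors as a crude search:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
U, s, Vh = np.linalg.svd(A)
v1 = Vh[0]                                          # the top right singular vector

print(np.isclose(np.linalg.norm(A @ v1), s[0]))     # ||A v_1|| attains sigma_1

V_trial = rng.standard_normal((5, 10000))
V_trial /= np.linalg.norm(V_trial, axis=0)          # many random unit vectors v
print(np.linalg.norm(A @ V_trial, axis=0).max() <= s[0] + 1e-12)  # none of them beats v_1
```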

Next, let $A$ be a real symmetric $n \times n$ matrix. For which unit vector $v$ is $v^T A v$ the largest? Let $$ v_1 \in \arg \max_{ \| v \|_2 = 1} \, v^T A v. $$ Claim: $A v_1$ is a scalar multiple of $v_1$. Reason: If $A v_1 = \alpha v_1 + u$, with $u \perp v_1$ and $u \neq 0$, then $v_1$ would not be a maximizer because $v_1$ could be improved by perturbing it a bit in the direction $u$. Perturbing $v_1$ in the orthogonal direction $u$ does not change the norm of $v_1$, but \begin{align} (v_1 + \epsilon u)^T A(v_1 + \epsilon u) &\approx v_1^T A v_1 + 2 \epsilon u^T A v_1. \end{align} The term $2 \epsilon u^T A v_1 \neq 0$ is a non-negligible increase in the value of $v_1^T A v_1$.

Continuing like this, we discover the eigendecomposition of $A$.


I think we can also discover the SVD by asking in which direction-pairs the bilinear form $(u,v) \mapsto u^T Av$ has the greatest impact. Let $$ (u_1,v_1) \in \arg \max_{\|u\|_2 = 1, \|v\|_2 = 1} \, u^T A v. $$ Claim: $A v_1$ is a scalar multiple of $u_1$, and $A^T u_1$ is a scalar multiple of $v_1$. Reason: Assume that either $A v_1 \neq 0$ or $A^T u_1 \neq 0$ (otherwise the claim is trivial). Suppose that $A v_1 = \alpha u_1 + u$, with $u \perp u_1$ and $u \neq 0$, and also that $A^T u_1 = \beta v_1 + v$, with $v \perp v_1$ and $v \neq 0$. Then $(u_1,v_1)$ is not a maximizer because $(u_1,v_1)$ can be improved by perturbing it a bit in the direction $(u,v)$: \begin{align} (u_1 + \delta \,u)^T A (v_1 + \epsilon v) &\approx u_1^T A v_1 + \epsilon v^T A^T u_1 + \delta \, u^T A v_1. \end{align} The term $\epsilon v^T A^T u_1 + \delta \, u^T A v_1$ is a non-negligible increase in the value of $u_1^T A v_1$ (provided that $\epsilon$ and $\delta$ are chosen to have the correct signs).

Continuing like this, we discover the SVD of $A$. This approach is interesting because it is similar to the quadratic form approach given above. In this viewpoint, the SVD is for bilinear forms and the eigendecomposition is for quadratic forms. In both cases we want to know which inputs produce the largest outputs.
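And a similar rough check for the bilinear-form view (again with an arbitrary random matrix): no random unit pair $(u,v)$ beats the top singular-vector pair.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
U, s, Vh = np.linalg.svd(A)

print(np.isclose(U[:, 0] @ A @ Vh[0], s[0]))        # u_1^T A v_1 attains sigma_1

u = rng.standard_normal((4, 5000)); u /= np.linalg.norm(u, axis=0)
v = rng.standard_normal((6, 5000)); v /= np.linalg.norm(v, axis=0)
vals = np.einsum('ik,ij,jk->k', u, A, v)            # u_k^T A v_k for each random pair
print(vals.max() <= s[0] + 1e-12)                   # none exceeds sigma_1
```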

2

I see one problem with the question. A linear operator $A$ is decomposable into $P^{-1}DP$, where $D$ is a diagonal matrix, if and only if $A$ has $n$ linearly independent eigenvectors (note this is not the same as being full rank). Otherwise $D$ has to be a non-diagonal matrix in Jordan normal form.

The 'advantage' of SVD is that it always 'diagonalizes' a matrix even if it is non-square.
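A small numpy illustration of both points, using a Jordan block as the example: it is full rank yet not diagonalizable, while its SVD exists and is perfectly well behaved.

```python
import numpy as np

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # full rank (det = 1), eigenvalue 1 repeated
w, P = np.linalg.eig(J)
print(w)                                   # [1., 1.]
print(abs(np.linalg.det(P)))               # ~0: the eigenvectors are (numerically) parallel,
                                           # so no invertible P with J = P D P^{-1}, D diagonal
print(np.linalg.svd(J, compute_uv=False))  # the SVD still exists: two positive singular values
```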

1

This answer tries to give more connections between these two decompositions than their differences.

SVD actually stems from the eigenvalue decomposition of real symmetric matrices. If a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric, then there exists a real orthogonal matrix $O$ such that $$A = O\text{diag}(\lambda_1, \ldots, \lambda_n)O', \tag{1}$$ where $\lambda_1, \ldots, \lambda_n$ are all real eigenvalues of $A$. In other words, $A$ is orthogonally similar to the diagonal matrix $\text{diag}(\lambda_1, \ldots, \lambda_n)$.

For a general (rectangular) real matrix $B \in \mathbb{R}^{m \times n}$, clearly $B'B$ is square, symmetric and positive semidefinite, thus all its eigenvalues are real and non-negative. By definition, the singular values of $B$ are the arithmetic square roots of the positive eigenvalues of $B'B$, say, $\mu_1, \ldots, \mu_r$. Since $B'B$ has the eigen-decomposition $$B'B = O\text{diag}(\mu_1^2, \ldots, \mu_r^2, 0, \ldots, 0)O',$$ it can be shown (doing a little clever algebra) that there exist orthogonal matrices $O_1 \in \mathbb{R}^{m \times m}$ and $O_2 \in \mathbb{R}^{n \times n}$ such that $B$ has the following Singular Value Decomposition (SVD): $$B = O_1 \text{diag}(\text{diag}(\mu_1, \ldots, \mu_r), 0)O_2, \tag{2}$$ where the $0$ in the diagonal matrix is a zero matrix of size $(m - r) \times (n - r)$. $(2)$ is sometimes phrased as: $B$ is orthogonally equivalent to the diagonal matrix $\text{diag}(\text{diag}(\mu_1, \ldots, \mu_r), 0)$.
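A small numpy check of this relationship, with an arbitrary full-rank $B$: the singular values of $B$ are exactly the square roots of the eigenvalues of $B'B$.

```python
import numpy as np

B = np.random.randn(6, 4)                        # generically of full column rank, so r = 4
evals = np.linalg.eigvalsh(B.T @ B)              # eigenvalues of B'B: real and nonnegative
mu = np.sqrt(np.clip(evals, 0.0, None))[::-1]    # square roots, sorted in descending order

print(np.allclose(mu, np.linalg.svd(B, compute_uv=False)))
```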

In view of $(1)$ and $(2)$, both eigen-decomposition (in its narrow sense for symmetric matrices only) and SVD are trying to look for representative elements under some relations.

In detail, the eigen-decomposition $(1)$ states that under the orthogonal similarity relation, all symmetric matrices can be classified into different equivalence classes, and for each equivalence class, the representative element can be chosen to be the simple diagonal matrix $\text{diag}(\lambda_1, \ldots, \lambda_n)$. It can be further shown that the set of eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$ is the maximal invariant under the orthogonal similarity relation.

By comparison, the SVD $(2)$ states that under the orthogonal equivalence relation, all $m \times n$ matrices can be classified into different equivalence classes, and for each equivalence class, the representative element can also be chosen to be a diagonal matrix $\text{diag}(\text{diag}(\mu_1, \ldots, \mu_r), 0)$. It can be further shown that the set of singular values $\{\mu_1, \ldots, \mu_r\}$ is the maximal invariant under the orthogonal equivalence relation.

In summary, given a matrix $M$ to be decomposed, both eigen-decomposition and SVD aim to find its simplest profile. This is not much different from seeking a representative basis under which a linear transformation has its simplest coordinate expression. Moreover, the above (incomplete) arguments showed that eigen-decomposition and SVD are closely related -- in fact, one way to derive the SVD is entirely from the eigen-decomposition.

0

[This is just to mention how SVD of ${ A \in \mathbb{C} ^{m \times n} }$ comes naturally from diagonalising ${ A ^{*} A \in \mathbb{C} ^{n \times n} }$]

Let ${ A \in \mathbb{C} ^{m \times n} }$ be of rank ${ r }.$ We are interested in a diagonalisation-like decomposition of ${ A }.$

Matrix ${ A }$ is not square, but ${ A ^{*} A }$ is self-adjoint so it can be diagonalised (via an orthonormal basis of eigenvectors).
Matrix ${ A ^{*} A }$ is self-adjoint so its diagonalised version is self-adjoint (i.e. has real diagonal entries). Matrix ${ A ^{*} A }$ is positive semidefinite so its diagonalised version is positive semidefinite (i.e. has non-negative diagonal entries).
Further, ${ \mathcal{N}(A ^{*} A) = \mathcal{N}(A) }$, so the diagonalised version of ${ A ^{*} A }$ has exactly ${ r }$ non-zero diagonal entries.

So there is an orthonormal basis ${ \mathscr{V} = [V _1, \ldots, V _n] }$ of eigenvectors of ${ A ^{*} A },$ such that $${ A ^{*} A \mathscr{V} = \mathscr{V} \begin{pmatrix} \Sigma _{r} ^{2} &0 \\ 0 &0 \end{pmatrix} }$$ and ${ \Sigma _{r} = \text{diag}(\sigma _1 , \ldots, \sigma _r) }$ and ${ \sigma _1 \geq \ldots \geq \sigma _r > 0 }.$

This suggests splitting ${ \mathscr{V} = [V _1, \ldots, V _n] }$ into ${ \mathscr{V} _r = [V _1, \ldots, V _{r}] }$ and ${ \mathscr{V} _{n-r} = [V _{r+1}, \ldots, V _n] },$ giving two separate equations $${ \begin{cases} A ^{*} A \mathscr{V} _r = \mathscr{V} _{r} \Sigma _{r} ^{2}, \\ A ^{*} A \mathscr{V} _{n-r} = 0 .\end{cases} }$$

Since ${ \mathcal{N}(A ^{*} A) = \mathcal{N}(A) },$ the second equation can be rewritten as ${ A \mathscr{V} _{n-r} = 0 }.$
The first equation implies ${ \mathscr{V} _{r} ^{*} A ^{*} A \mathscr{V} _r = \Sigma _r ^2 }$ that is ${ (A \mathscr{V} _{r} \Sigma _{r} ^{-1} ) ^{*} (A \mathscr{V} _{r} \Sigma _{r} ^{-1}) = I _{r} .}$

Hence we have $${ \begin{cases} \mathscr{U} _{r} = A \mathscr{V} _{r} \Sigma _{r} ^{-1} \text{ are } r \text{ orthonormal vectors}, \\ A \mathscr{V} _{n-r} = 0. \end{cases} }$$ Extend ${ \mathscr{U} _{r} \in \mathbb{C} ^{m \times r} }$ to an orthonormal basis ${ \mathscr{U} = [\mathscr{U} _{r}, \mathscr{U} _{m-r}] }.$

We now have $${ A [\mathscr{V} _{r}, \mathscr{V} _{n-r}] = [\mathscr{U} _{r} , \mathscr{U} _{m-r}] \begin{pmatrix} \Sigma _{r} &0 \\ 0 &0 \end{pmatrix} , }$$ which is the singular value decomposition of ${ A }.$

Here notably ${ \mathscr{V} = [\mathscr{V} _{r}, \mathscr{V} _{n-r}] }$ and ${ \mathscr{U} = [\mathscr{U} _{r}, \mathscr{U} _{m-r}] }$ are orthonormal bases, and ${ \Sigma _{r} = \text{diag}(\sigma _1, \ldots, \sigma _r) }$ with ${ \sigma _1 \geq \ldots \geq \sigma _r > 0 }.$
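A numpy sketch of this construction, taking a full-rank example so that ${ r = n }$ and the block of zero columns is empty:

```python
import numpy as np

m, n = 5, 3
A = np.random.randn(m, n) + 1j * np.random.randn(m, n)      # generically rank r = n

evals, V = np.linalg.eigh(A.conj().T @ A)        # orthonormal eigenvectors of A*A
order = np.argsort(evals)[::-1]                  # sort eigenvalues in decreasing order
sigma = np.sqrt(evals[order])                    # sigma_1 >= ... >= sigma_r > 0
V_r = V[:, order]

U_r = A @ V_r @ np.diag(1.0 / sigma)             # U_r = A V_r Sigma_r^{-1}
print(np.allclose(U_r.conj().T @ U_r, np.eye(n)))            # columns of U_r are orthonormal
print(np.allclose(A, U_r @ np.diag(sigma) @ V_r.conj().T))   # A = U_r Sigma_r V_r^*
```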

