
Suppose I have the covariance matrix of a data set and I multiply it by one of its eigenvectors, say the eigenvector with the largest eigenvalue. The result is that same eigenvector, or a scaled version of it.

What does this really tell me? Why is this the principal component? What property makes it a principal component? Geometrically, I understand that the principal component (eigenvector) will be sloped along the general slope of the data (loosely speaking). Again, can someone help me understand why this happens?


3 Answers


Short answer: The eigenvector with the largest eigenvalue is the direction along which the data set has the maximum variance. Meditate upon this.

Long answer: Let's say you want to reduce the dimensionality of your data set, say down to just one dimension. In general, this means picking a unit vector $u$, and replacing each data point, $x_i$, with its projection along this vector, $u^T x_i$. Of course, you should choose $u$ so that you retain as much of the variation of the data points as possible: if your data points lay along a line and you picked $u$ orthogonal to that line, all the data points would project onto the same value, and you would lose almost all the information in the data set! So you would like to maximize the variance of the new data values $u^T x_i$. It's not hard to show that if the covariance matrix of the original data points $x_i$ was $\Sigma$, the variance of the new data points is just $u^T \Sigma u$. As $\Sigma$ is symmetric, the unit vector $u$ which maximizes $u^T \Sigma u$ is nothing but the eigenvector with the largest eigenvalue.
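To make this concrete, here is a small NumPy sketch (not from the original answer) on a made-up 2-D data set. It checks numerically that the variance of the projected values $u^T x_i$ equals $u^T \Sigma u$, and that the unit vector maximizing this quantity is the eigenvector with the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data set with an obvious dominant direction (synthetic, for illustration only).
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)                       # center the data

Sigma = np.cov(X, rowvar=False)              # sample covariance matrix

# For any unit vector u, the variance of the projected values u^T x_i is u^T Sigma u.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
proj = X @ u
print(np.var(proj, ddof=1), u @ Sigma @ u)   # the two numbers agree

# The unit vector maximizing u^T Sigma u is the eigenvector with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigh: Sigma is symmetric
u_best = eigvecs[:, np.argmax(eigvals)]
print(u_best, eigvals.max())                 # direction of maximum variance, and that variance
```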

If you want to retain more than one dimension of your data set, in principle what you can do is first find the largest principal component, call it $u_1$, then subtract that out from all the data points to get a "flattened" data set that has no variance along $u_1$ (concretely, replace each $x_i$ with $x_i - (u_1^T x_i)\,u_1$, which removes only the component along $u_1$, not the vector $u_1$ itself). Find the principal component of this flattened data set, call it $u_2$. If you stopped here, $u_1$ and $u_2$ would be a basis of the two-dimensional subspace which retains the most variance of the original data; or, you can repeat the process and get as many dimensions as you want. As it turns out, all the vectors $u_1, u_2, \ldots$ you get from this process are just the eigenvectors of $\Sigma$ in decreasing order of eigenvalue. That's why these are the principal components of the data set.
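A rough sketch of this "find, then flatten" procedure, again on synthetic data; the helper name `top_direction` is just an illustrative choice, not from the answer:

```python
import numpy as np

def top_direction(X):
    """Unit vector along which the (centered) data X has maximum variance."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # synthetic 3-D data
X = X - X.mean(axis=0)

u1 = top_direction(X)
X_flat = X - np.outer(X @ u1, u1)    # remove each point's component along u1 ("flatten")
u2 = top_direction(X_flat)

# u1 and u2 coincide (up to sign) with the top two eigenvectors of the original covariance.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(abs(u1 @ eigvecs[:, -1]), abs(u2 @ eigvecs[:, -2]))   # both are (essentially) 1
```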

  • Great explanation, but aren't $u$ and $x$ both column vectors? If this is the case, shouldn't $u^Tx$ be a number, i.e., a 1x1 vector? Why do you call the inner product a vector? I feel the projected vector should be $u^Txu$ instead. Please kindly correct me should I be wrong. Thanks a lot! – Commented Aug 30, 2014 at 15:21
  • @Farticle Pilter: Yes, $u^Tx$ is a scalar (note that I don't call it a vector). You replace each point $x$ with the number $u^Tx$, turning an $n$-dimensional data set into a $1$-dimensional data set. – user856, Commented Aug 30, 2014 at 18:27
  • 'what you can do is first find the largest principal component, call it $u_1$, then subtract that out from all the data points to get a "flattened" data set that has no variance along $u_1$' – $u_1$ is a vector, right? Subtracting that vector from the data points won't project them onto a plane (and we want to project them onto a plane, which will eliminate the variance of the data in the $u_1$ direction). What am I missing here? Please help. – user216094, Commented Mar 20, 2015 at 21:10
  • Why is the eigenvector of the covariance matrix the best fit line? Is it something about the matrix? Is it a property of the eigensystem of a symmetric matrix? – azizj, Commented Aug 28, 2017 at 2:59
  • If you still have doubts: math.stackexchange.com/questions/174015/… – Kaushal28, Commented May 2, 2019 at 19:58

An informal explanation:

The covariance matrix $C_y$ (which is symmetric) encodes the correlations between the variables of a vector. In general, a covariance matrix is non-diagonal (i.e. it has non-zero correlations between different variables).

But it is interesting to ask: is it possible to diagonalize the covariance matrix by changing the basis of the vector? In that basis there would be no (i.e. zero) correlations between the different variables of the vector.

Diagonalization of this symmetric matrix is possible via an eigenvalue decomposition. You may read A Tutorial on Principal Component Analysis (pages 6–7) by Jonathon Shlens to get a good understanding.
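A small NumPy sketch of this idea on made-up data (not taken from the tutorial): re-expressing the data in the basis of eigenvectors of its covariance matrix yields a diagonal covariance matrix, whose diagonal entries are exactly the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 3))   # correlated synthetic data
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)        # non-diagonal in general: variables are correlated
eigvals, V = np.linalg.eigh(C)     # columns of V: orthonormal eigenvectors of C

Y = X @ V                          # express every data point in the eigenvector basis
C_new = np.cov(Y, rowvar=False)    # covariance of the transformed data

print(np.round(C_new, 8))          # diagonal (up to rounding): zero correlations in the new basis
print(eigvals)                     # the diagonal entries are the eigenvalues of C
```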

  • "Diagonalizing the covariance matrix" part is insightful! Thank you! – Commented Aug 30, 2014 at 15:38

If we project our data $D$ onto a vector $\vec{v}$, the projected data is obtained as $\vec{v}^{\intercal} D$, and its covariance matrix (now just a $1 \times 1$ matrix, i.e. a variance) becomes $\vec{v}^{\intercal} \Sigma \vec{v}$.

Since the largest eigenvector is the vector that points into the direction of the largest spread of the original data, we can find that direction by choosing the $\vec{v}$ for which the variance $\vec{v}^{\intercal} \Sigma \vec{v}$ of the projected data is as large as possible.

Maximizing any function of the form $\vec{v}^{\intercal} \Sigma \vec{v}$ with respect to $\vec{v}$, where $\vec{v}$ is a normalized unit vector, can be formulated as a so-called Rayleigh quotient problem. The maximum of such a Rayleigh quotient is obtained by setting $\vec{v}$ equal to the largest eigenvector of the matrix $\Sigma$.

In other words, the largest eigenvector of $\Sigma$ corresponds to the principal component of the data.
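A quick numerical check of this claim on a synthetic data set (the `rayleigh` helper below is just for illustration, not part of the answer): the Rayleigh quotient of random directions never exceeds the largest eigenvalue, and the largest eigenvector attains it exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [0.8, 0.6]])
X = X - X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)

def rayleigh(v, A):
    """Rayleigh quotient v^T A v / v^T v."""
    return (v @ A @ v) / (v @ v)

eigvals, eigvecs = np.linalg.eigh(Sigma)

# Rayleigh quotients of random directions never exceed the largest eigenvalue ...
random_dirs = rng.normal(size=(10000, 2))
quotients = np.array([rayleigh(v, Sigma) for v in random_dirs])
print(quotients.max(), "<=", eigvals.max())

# ... and the largest eigenvector attains it exactly.
print(rayleigh(eigvecs[:, -1], Sigma), eigvals.max())
```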

If the covariances are zero, then the eigenvalues are equal to the variances:

[figure: data with a diagonal covariance matrix; the eigenvalues equal the variances along the axes]

If the covariance matrix is not diagonal, the eigenvalues represent the variance along the principal components, whereas the covariance matrix still operates along the axes:

[figure: data with a non-diagonal covariance matrix; the principal components are not aligned with the axes]

An in-depth discussion (and the source of the above images) of how the covariance matrix can be interpreted from a geometrical point of view can be found here: http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/

  • I found your answer the most non-circular, as you pointed me in the direction of en.wikipedia.org/wiki/Rayleigh_quotient as a concept required to grasp the reason behind all this. Thank you. – qbolec, Commented Mar 22, 2016 at 16:19
  • Why is the assumption "Since the largest eigenvector is the vector that points into the direction of the largest spread of the original data" true? – azizj, Commented Aug 28, 2017 at 3:01

