What is the rank of correlation matrix and its estimate?

Question

For an $n$-dimensional random vector $\mathbf{x}$, the $n\times n$ correlation matrix $\mathbf{R}$ is defined as (see https://en.wikipedia.org/wiki/Covariance_matrix#Correlation_matrix)

\begin{equation} \mathbf{R} = {E}\big[(\mathbf{x}-E(\mathbf{x}))(\mathbf{x}-E(\mathbf{x}))^T\big]\tag{1a} \end{equation}

where $E(\cdot)$ is the expectation operator. If $E(\mathbf{x})=0$, the correlation matrix $\mathbf{R}$ reduces to

\begin{equation} \mathbf{R} = {E}\big[\mathbf{x}\mathbf{x}^T\big]\tag{1b} \end{equation}

An estimate of $\mathbf{R}$, call it $\mathbf{R_{xx}}$, can be computed by collecting $N$ independent $n$-dimensional sample vectors $\mathbf{x}_i$, $i = 1, \dots, N$ (http://perso-math.univ-mlv.fr/users/banach/workshop2010/talks/Vershynin.pdf):

\begin{equation} \mathbf{R_{xx}} = \frac{1}{(N-1)}\sum_{i=1}^{N} \mathbf{x}_i\mathbf{x}_i^T \tag{2} \end{equation}
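Equation (2) can be sketched directly in Python. This is a minimal illustration, assuming the samples are already zero-mean (as in (1b)) and are given as plain lists; the function name `sample_correlation` is hypothetical, and exact `Fraction` arithmetic is used only so the small example can be checked by hand.

```python
from fractions import Fraction


def sample_correlation(samples):
    """Return R_xx = 1/(N-1) * sum_i x_i x_i^T, following equation (2).

    `samples` is a list of N zero-mean vectors, each a list of n numbers.
    (Hypothetical helper for illustration.)
    """
    N = len(samples)
    if N < 2:
        raise ValueError("equation (2) needs at least N = 2 samples")
    n = len(samples[0])
    R = [[Fraction(0)] * n for _ in range(n)]
    for x in samples:
        # accumulate the rank-1 outer product x_i x_i^T
        for r in range(n):
            for c in range(n):
                R[r][c] += Fraction(x[r]) * Fraction(x[c])
    return [[v / (N - 1) for v in row] for row in R]


R_xx = sample_correlation([[1, 0], [0, 1], [1, 1]])
print([[str(v) for v in row] for row in R_xx])  # [['1', '1/2'], ['1/2', '1']]
```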

My questions are:

1. What is $rank(\mathbf{R})$?
2. What is $rank(\mathbf{R_{xx}})$ when $N \gg n$?

From (1b), it seems that $rank(\mathbf{R})$ should be 1, since $\mathbf{x}\mathbf{x}^T$ is a rank-1 matrix. For (2), I searched for "rank of sum of rank-1 matrices" and found the post Rank of sum of rank-1 matrices, which essentially says that the rank of a sum of rank-1 matrices can be as high as $n$ for independent vectors. These two statements seem to conflict, and I do not understand what I am missing.
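A small exact check of the two facts being compared: a single outer product $\mathbf{x}\mathbf{x}^T$ has rank 1, while a sum of outer products of linearly independent vectors has rank equal to the number of those vectors (and a sum of outer products of dependent vectors does not gain rank). The vectors below are arbitrary illustrative choices, and `outer`, `add` and `matrix_rank` are hypothetical helpers that use exact rational Gaussian elimination to avoid floating-point tolerance issues.

```python
from fractions import Fraction


def outer(x):
    """Outer product x x^T as a list of rows."""
    return [[Fraction(a) * Fraction(b) for b in x] for a in x]


def add(A, B):
    """Entrywise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


x1, x2, x3 = [1, 2, 0], [0, 1, 1], [1, 0, 1]  # linearly independent
single = outer(x1)
independent_sum = add(add(outer(x1), outer(x2)), outer(x3))
dependent_sum = add(outer(x1), outer([2 * v for v in x1]))  # x and 2x

print(matrix_rank(single))           # 1
print(matrix_rank(independent_sum))  # 3
print(matrix_rank(dependent_sum))    # 1
```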

Guangliang · Accepted Answer · 2017-01-11 20:02:29Z


$rank(\mathbf{R})$ equals the number of linearly independent random variables in $\mathbf{x}$. If $\mathbf{R}$ is full rank ($rank(\mathbf{R}) = n$), then all components of $\mathbf{x}$ are linearly independent. If $rank(\mathbf{R}) = k \lt n$, there are only $k$ linearly independent random variables in $\mathbf{x}$, and the other $n-k$ random variables can be written (almost surely) as linear combinations of those $k$ components.
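To illustrate this, the sketch below (an assumed example, not part of the answer) draws $N$ samples of a 4-dimensional vector whose last two components are fixed linear combinations of the first two, $x_3 = x_1 + x_2$ and $x_4 = 2x_1 - x_2$. Only $k = 2$ components are linearly independent, so the estimate (2) has rank 2 however large $N$ is. Samples are stored as exact `Fraction`s so the linear relations hold exactly; `sample_vector` and `matrix_rank` are hypothetical helpers.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def sample_vector(rng):
    # two independent zero-mean components, converted to exact rationals
    x1 = Fraction(rng.gauss(0.0, 1.0))
    x2 = Fraction(rng.gauss(0.0, 1.0))
    # two more components that are exact linear combinations of x1 and x2
    return [x1, x2, x1 + x2, 2 * x1 - x2]


rng = random.Random(0)
N, n = 200, 4
samples = [sample_vector(rng) for _ in range(N)]

# equation (2): R_xx = 1/(N-1) * sum_i x_i x_i^T
R_xx = [[sum(x[r] * x[c] for x in samples) / (N - 1) for c in range(n)]
        for r in range(n)]

print(matrix_rank(R_xx))  # 2, not 4: only two linearly independent components
```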

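Your equation (1b) doesn't lead to $rank(\mathbf{R}) = 1$: the expectation averages the rank-1 matrices $\mathbf{x}\mathbf{x}^T$ over all realizations of $\mathbf{x}$, and such an average is generally not rank 1. Under certain conditions (for example, $\mathbf{x}_i$ i.i.d. normal), your equation (2) approaches $\mathbf{R}$, and $rank(\mathbf{R_{xx}})$ approaches $rank(\mathbf{R})$.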
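A rough numerical illustration of this convergence, under assumptions chosen only for the example: $\mathbf{x} = \mathbf{A}\mathbf{z}$ with $\mathbf{z}$ i.i.d. standard normal, so the true matrix is $\mathbf{R} = \mathbf{A}\mathbf{A}^T$. The mixing matrix `A`, the seed and the sample sizes are arbitrary, and `estimate_R`, `draw` and `max_error` are hypothetical helpers. The largest entrywise error of (2) should shrink roughly like $1/\sqrt{N}$.

```python
import random


def estimate_R(samples):
    """Equation (2): R_xx = 1/(N-1) * sum_i x_i x_i^T (zero-mean samples)."""
    N, n = len(samples), len(samples[0])
    return [[sum(x[r] * x[c] for x in samples) / (N - 1) for c in range(n)]
            for r in range(n)]


A = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]  # assumed mixing matrix
n = len(A)
# true correlation matrix R = A A^T for x = A z with z ~ N(0, I)
R = [[sum(A[r][k] * A[c][k] for k in range(n)) for c in range(n)]
     for r in range(n)]


def draw(rng):
    """One zero-mean sample x = A z."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(A[r][k] * z[k] for k in range(n)) for r in range(n)]


def max_error(N, seed=1):
    """Largest |R_xx - R| entry for an estimate built from N samples."""
    rng = random.Random(seed)
    R_xx = estimate_R([draw(rng) for _ in range(N)])
    return max(abs(R_xx[r][c] - R[r][c]) for r in range(n) for c in range(n))


for N in (10, 1_000, 50_000):
    print(N, round(max_error(N), 3))
```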


I consider $\mathbf{x}$ as a column vector $\mathbf{x}=[x_1,x_2,\cdots,x_n]^T$. When you say "number of independent random variables in $\mathbf{x}$", do you mean that the $x_i$ are independent?
– NAASI
Commented Jan 11, 2017 at 20:39
When you wrote down equation (1), it means that $\mathbf{x}$ is a column vector of random variables. Each component $x_i$ of $\mathbf{x}$ is a single random variable, and $\mathbf{R}$ is the correlation matrix of these random variables. If $\mathbf{R}$ is not full rank, there are linear dependencies among the $x_i$'s.
– Guangliang
Commented Jan 11, 2017 at 20:51


Florian · Accepted Answer · 2017-01-11 21:49:11Z

To answer your question, you need to make assumptions about the statistics of $\mathbf{x}$; so far, you have not said anything about them. In general, unless the random variables $x_k$ in your vector $\mathbf x$ are linearly dependent, $$\mathbf{R} = {\mathbb{E}}\{\mathbf{x}\mathbf{x}^{\rm T}\}$$ will turn out to have full rank. That's because the expectation performs an ensemble average, so it's like averaging all possible realizations of $\mathbf x \mathbf x^{\rm T}$. Unless the components are linearly dependent, averaging many rank-one matrices yields a full-rank matrix.
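The ensemble-average argument can be checked exactly when $\mathbf{x}$ takes finitely many values. In the assumed example below, $\mathbf{x}$ is zero-mean and equally likely to be any of $\pm\mathbf{e}_1, \dots, \pm\mathbf{e}_n$, so every realization $\mathbf{x}\mathbf{x}^{\rm T}$ has rank 1, yet $\mathbb{E}\{\mathbf{x}\mathbf{x}^{\rm T}\} = \frac{1}{n}\mathbf{I}$ has full rank. For contrast, $\mathbf{x} = s\,\mathbf{v}$ with a random scalar $s$ has linearly dependent components, and its expectation stays rank 1. The helpers `expected_outer` and `matrix_rank` are hypothetical.

```python
from fractions import Fraction


def expected_outer(support, probs):
    """E{x x^T} = sum_j p_j v_j v_j^T for a discrete x equal to v_j w.p. p_j."""
    n = len(support[0])
    E = [[Fraction(0)] * n for _ in range(n)]
    for v, p in zip(support, probs):
        for r in range(n):
            for c in range(n):
                E[r][c] += p * v[r] * v[c]
    return E


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # pick a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


n = 4
# zero-mean x, equally likely to be any of +e_1, -e_1, ..., +e_n, -e_n
basis = [[sign if k == j else 0 for k in range(n)]
         for j in range(n) for sign in (1, -1)]
E_basis = expected_outer(basis, [Fraction(1, 2 * n)] * (2 * n))

# zero-mean x = s * v with s uniform on {-2, -1, 1, 2}: components are dependent
v = [1, 2, 3, 4]
scaled = [[s * a for a in v] for s in (-2, -1, 1, 2)]
E_scaled = expected_outer(scaled, [Fraction(1, 4)] * 4)

print(matrix_rank(E_basis))   # 4 (E_basis is I / 4)
print(matrix_rank(E_scaled))  # 1 (E_scaled is 5/2 * v v^T)
```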

If your stochastic process is additionally stationary and ergodic, you can replace the ensemble average by an average over time, using successive realizations $\mathbf x_i$ via $\sum \mathbf x_i \mathbf x_i^{\rm T}$. Under suitable statistical assumptions, you then have $$\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{i=1}^N \mathbf x_i \mathbf x_i^{\rm T} = \mathbf R.$$ You can then expect that for $N \gg n$, your sample estimate is full rank. This cannot be guaranteed with certainty, since you could always be unlucky with the $N$ samples you drew, but that is very unlikely. In fact, it is possible to bound the probability that your sample covariance matrix is rank deficient, and this probability decreases exponentially with $N$ as soon as $N\geq n$.
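The claim that the sample estimate is full rank once $N \ge n$ (and, conversely, that it cannot be full rank when $N < n$, since a sum of $N$ rank-1 matrices has rank at most $N$) can be checked numerically. The sketch assumes i.i.d. standard normal components and converts the floating-point samples to exact `Fraction`s, so the computed rank does not depend on a rounding tolerance. The $1/(N-1)$ factor of (2) is omitted because scaling does not change the rank; `sample_rank` and `matrix_rank` are hypothetical helpers.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def sample_rank(N, n, rng):
    """Rank of sum_{i=1}^N x_i x_i^T for N i.i.d. standard normal vectors in R^n."""
    xs = [[Fraction(rng.gauss(0.0, 1.0)) for _ in range(n)] for _ in range(N)]
    S = [[sum(x[r] * x[c] for x in xs) for c in range(n)] for r in range(n)]
    return matrix_rank(S)


rng = random.Random(42)
n = 5
for N in (1, 2, 3, 4, 5, 10, 50):
    # typically min(N, n): a sum of N rank-1 terms has rank <= N
    print(N, sample_rank(N, n, rng))
```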

Tags: linear-algebra, matrices, statistics, correlation

Linked

Hot Network Questions

What is the rank of correlation matrix and its estimate?

2 Answers 2

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged linear-algebramatricesstatisticscorrelation.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
linear-algebra
matrices
statistics
correlation
.