$\begingroup$

Suppose I have $m$ samples drawn from a Gaussian in $\mathbb{R}^n$, and need the sample covariance $\Sigma_m$ to be $\epsilon$-close to the true covariance $\Sigma$:

$$E\|\Sigma_m-\Sigma\| \le \epsilon \|\Sigma\|$$

How many samples do I need?

My distribution is nearly singular, in the sense that the intrinsic dimension $r$ is much smaller than the embedding dimension $n$, where

$$r=\frac{\text{tr}(\Sigma)}{\|\Sigma\|}$$

The term "intrinsic dimension" comes from Tropp's book, Chapter 7.
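For concreteness, here is a minimal sketch of how $r$ could be computed numerically. It assumes $\|\cdot\|$ is the operator (spectral) norm; the function name `intrinsic_dimension` and the example spectrum are only illustrative.

```python
import numpy as np

def intrinsic_dimension(sigma):
    """Return tr(Sigma) / ||Sigma||, assuming ||.|| is the operator (spectral) norm."""
    sigma = np.asarray(sigma, dtype=float)
    # For a matrix argument, ord=2 gives the largest singular value.
    return np.trace(sigma) / np.linalg.norm(sigma, ord=2)

# Illustrative spectrum: eigenvalues 1, 1/2, 1/4, ... in n = 50 dimensions.
n = 50
sigma = np.diag(0.5 ** np.arange(n))
r = intrinsic_dimension(sigma)
print(n, r)  # r stays close to 2 even though n = 50
```

With an identity covariance the same function returns $n$, so $r \ll n$ only when the spectrum decays quickly.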

I found the following sample-size requirement in Vershynin, High-Dimensional Probability, Remark 5.6.3, for an arbitrary distribution: $$m \approx \epsilon^{-2} r \log n$$

Can this be tightened for a Gaussian distribution? In particular, I'm wondering if the $\log n$ factor can be dropped.

Here's what the error looks like for various dimensions with the intrinsic dimension fixed: [image]
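A rough sketch of how such an experiment could be set up follows. This is not the linked notebook: the spectrum (one eigenvalue equal to 1, the rest equal), the helper names, and the choice to treat the mean as known and zero (so the estimator divides by $m$) are all assumptions made for illustration.

```python
import numpy as np

def spectrum_with_fixed_r(n, r):
    """Hypothetical spectrum: one eigenvalue 1, the rest equal, so tr/||.|| = r.

    Requires 1 <= r <= n and n > 1.
    """
    eig = np.full(n, (r - 1.0) / (n - 1))
    eig[0] = 1.0
    return eig

def relative_cov_error(eigvals, m, trials=20, seed=0):
    """Monte Carlo estimate of E||Sigma_m - Sigma|| / ||Sigma|| for diagonal Sigma."""
    rng = np.random.default_rng(seed)
    eigvals = np.asarray(eigvals, dtype=float)
    sigma = np.diag(eigvals)
    sigma_norm = eigvals.max()  # operator norm of a diagonal PSD matrix
    errs = []
    for _ in range(trials):
        # Rows are N(0, Sigma) samples: scale standard normals by sqrt(eigenvalues).
        x = rng.standard_normal((m, eigvals.size)) * np.sqrt(eigvals)
        sigma_m = x.T @ x / m  # mean assumed known and zero, so divide by m
        errs.append(np.linalg.norm(sigma_m - sigma, ord=2))
    return float(np.mean(errs)) / sigma_norm

# Fix r and m, vary the embedding dimension n.
for n in (10, 100, 400):
    eig = spectrum_with_fixed_r(n, r=5.0)
    print(n, relative_cov_error(eig, m=200))
```

If the $\log n$ factor were really needed, the printed error should drift upward with $n$ at fixed $r$ and $m$; a simulation like this can only suggest the trend, not prove a bound.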

notebook

$\endgroup$
  • $\begingroup$ Are you estimating the covariance by the sample covariance, or a shrinkage-based estimator, or … ? $\endgroup$ Commented Jun 10, 2021 at 22:33
  • $\begingroup$ There's an issue of $m$ vs $m-1$ in denominator, but I think it won't affect the bound, so the standard definition of sample covariance works -- en.wikipedia.org/wiki/… $\endgroup$ Commented Jun 10, 2021 at 22:41
  • $\begingroup$ I ask because it’s pretty well established that, in higher dimensions, the sample covariance tends to be a poor estimator of the actual covariance. Nothing to do with the choice of $m$ vs $m - 1$. $\endgroup$ Commented Jun 10, 2021 at 22:45
  • $\begingroup$ The question is specifically about sample covariance $\endgroup$ Commented Jun 10, 2021 at 22:47
  • $\begingroup$ @RylanSchaeffer looks like I didn't read Vershynin carefully enough, it's Theorem 9.2.4 $\endgroup$ Commented Aug 4, 2022 at 3:55
