1
$\begingroup$

Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in \mathbb R^n$ and $i \in \{1,2,\ldots,n\}$, define $$ z_i(w) := 1[|y_ix_i^\top w - \epsilon| \ge t] = \begin{cases} 1,&\mbox{ if }|y_i x_i^\top w - \epsilon| \ge t,\\ 0,&\mbox{ else.} \end{cases} $$ Finally, define $\Delta_{n,\epsilon,t}(P) := \sup_{\|w\|=1} |\frac{1}{n}\sum_{i=1}^n z_i(w) - z(w)|$, where $z(w) = \mathbb P(|y_1x_1^\top w - \epsilon| \ge t)$ is the expected value of $z_i(w)$.

Using VC theory, one can show that $\Delta_{n,\epsilon,t}(P) \le C\sqrt{d/n}$ (ignoring log-factors) w.p $1-o(1)$ in the limit $n \to \infty$. This is because the function class $\{(x,y) \mapsto 1[|yx^\top w - 1| \ge t] \mid w \in \mathbb R^d\text{ with }\|w\|=1\}$ has VC-dimension $O(d)$.

Question. What kinds of assumptions on the distribution $P$ can allow the $\sqrt{d/n}$ rate to be improved to something like $Cd/n$ or even $C/n$ where the constant $C$ is now allowed to depend on $\epsilon$ and $t$ ? For example, what if $P$ is given by $x \mid y \sim N(\mu_y,\Sigma_y)$ (i.e the marginal distribution of $x$ is a Gaussian mixture), and $\mathbb P(y=1) = \pi \in (0,1)$, where $\mu_{\pm 1} \in \mathbb R^d$ and $\Sigma_{\pm 1}$ are positive-definite $d \times d$ matrices ?

$\endgroup$
2

2 Answers 2

4
$\begingroup$

It is not possible to get $|\frac1n\sum_i z_i(w) - E[z_i(w)]|=O_P(r_n)$ with $r_n \lll n^{-1/2}$ even for a single $w$, since it would contradict the CLT whenever Var$[z_i(w)]>0$.

$\endgroup$
1
  • $\begingroup$ ok indeed, that makes sense. thanks ! $\endgroup$
    – dohmatob
    Commented Oct 23, 2023 at 15:12
1
$\begingroup$

If you pick $P$ to be a Dirac distribution on any pair $(x,y) \in \mathbb{R}^{d+1}$, then you trivially have $\Delta_{n, \epsilon, t}(P)=0$. This is consistent with the previous answer because $\text{var}(z_i(w)) =0$ in this case. However, I assume that you are asking this question for a non-trivial distribution $P$. In this case, the bound of $\sqrt{d/n}$ probably cannot be improved.

On the other hand, if you are okay with the upperbound on excess risk of a particular estimator instead of the uniform convergence bound, then $d/n$ rate is possible for all realizable distributions.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.