2
$\begingroup$

Let $X$ be a finite set and let $\emptyset\neq \mathcal{H}\subseteq \{ 0,1 \}^{\mathcal{X}}$. Let $\{(X_n,L_n)\}_{n=1}^N$ be i.i.d. random variables on $X\times \{0,1\}$ with law $\mathbb{P}$. Without knowing more of $\mathcal{H}$, what is the best risk-bounds available via VC-dimension or Rademacher-complexity theory on the true risk $$ \sup_{h\in \mathcal{H}}\,\mathbb{E}_{(X,L)\sim \mathbb{P}}[I(h(x)=L)] - \frac1{N}\sum_{n=1}^N\, I(h(X_n)=L_n)? $$

$\endgroup$
2
  • $\begingroup$ The displayed expression is a random variable, say $Z_n$. What do you then mean by a best risk-bound on $Z_n$? $\endgroup$ Commented Feb 15, 2023 at 22:10
  • $\begingroup$ @IosifPinelis Ah, PAC-type bounds. Something like, for each $\delta \in (0,1]$ the above quantity is less that ... +(some term likely of the form $\log(1/\delta)$ ) holds with probability at-least $1-\delta$. $\endgroup$ Commented Feb 15, 2023 at 22:14

1 Answer 1

3
$\begingroup$

$\newcommand\HH{\mathcal H}\newcommand\ep\varepsilon$Let \begin{equation} D_N:=\sup_{h\in\HH}\Big(E\,I(h(X)=L) - \frac1{N}\sum_{n=1}^N\, I(h(X_n)=L_n)\Big). \end{equation}

It was shown in (the proof of) Long's main result, 1999 (which he ascribed to Talagrand) that \begin{equation} P(|D_N|>\ep)\le\exp\Big(d_\HH-cN\ep^2) \end{equation} for some real $c>0$, all $N$, and all real $\ep>0$, where $d_\HH$ is the VC dimension of $\HH$.

Similar results, but in terms of certain covering numbers for $\HH$ rather than the VC dimension, were obtained by Talagrand, 1994.

At least as recently as in October 2019, these results by Talagrand and Long still seemed to be the state of the art.

$\endgroup$
1
  • 1
    $\begingroup$ Wow this is extremely interesting! Thank you Iosif and doubly-so for the lower-bound paper, optimality was my next question! :) $\endgroup$ Commented Feb 16, 2023 at 21:48

Not the answer you're looking for? Browse other questions tagged or ask your own question.