3
$\begingroup$

Theorem: For every $\varepsilon >0$, with the probability greater than $1-\varepsilon$ \begin{align*} R_p(\hat{g}_{n,\mathcal{G}}) - R_{p}(g^*_{p,\mathcal{G}}) \le 2 \sqrt{\dfrac{2V_{\mathcal{G}}(2n) \ln(4(2n+1)\varepsilon^{-1}) }{n}} \end{align*} where $V_{\mathcal{G}}(N) = \sup_{\mathbb{X_N} \in \mathcal{X}^N} V_{\mathcal{G}} (\mathbb{X_N})$ with $\mathbb{X_N} = \left\{x_1,x_2,\ldots x_N\right\}$. The notation $V_{\mathcal{G}}$ denotes for the VC-dimension of $\mathcal{G}$.

My proof: (This proof was confirmed to be wrong by my lecturer. He pointed out the mistake was at the union bound after (1.1). I wonder if you could help me fix this point by using Rademacher complexity)

Before going to the proof, let us recall some definitions, and lemmas we are going to use.

Lemma (Sauer). For all $S = \left\{x_1,\ldots,x_N\right\} = \mathbb{X_N}$, we have \begin{align*} \vert T_{\mathcal{G}}(S)\vert &\le \sum_{k=0}^{V_{\mathcal{G}}(\mathbb{X_N})} C^k_N\\ & \le \begin{cases} (N+1)^{V_{\mathcal{G}}(\mathbb{X_N})} & \text{, if } N> V_{\mathcal{G}}(\mathbb{X_N})\\ 2^N & \text{, if } N \le V_{\mathcal{G}}(\mathbb{X}_N) \end{cases} \end{align*}

Lemma: (Symmetrization). Let $l$ be the loss function that \begin{align*} a \le l(y,y') \le a+1,\ \forall y,y' \in \mathcal{Y}. \end{align*} Let $\mathcal{D'}_n = (x'_i,y'_i),\ i=\overline{1,n}$ be a sample independent of $\mathcal{D}_n$ with the same law. Then for all $t \ge \sqrt{\dfrac{2}{n}}$ \begin{align*} \mathbb{P}\left(\sup_{g \in \mathcal{G}}\vert \hat{R}_n(g) - R_p(g)\vert > t\right) \le 2\mathbb{P}\left(\sup_{g \in \mathcal{G}}\left\{\hat{R}_n(g) - \hat{R}'_n(g)\right\} > \dfrac{t}{2}\right), \end{align*} where $\hat{R}_n(g):=\dfrac{1}{n}\sum_{i=1}^n l(y_i,g(x_i))$, $\hat{R}'_n(g):=\dfrac{1}{n}\sum_{i=1}^n l(y'_i,g(x'_i))$.

Back to the main proof. Let $nt^2 = 2V_{\mathcal{G}}(2n)\ln\left(4(2n+1)\varepsilon^{-1}\right)$. Then, we would like to prove that \begin{align*} \mathbb{P}\left(R_p(\hat{g}_{n,\mathcal{G}}) - R_p(g^*_{p,\mathcal{G}}) > 2t \right) \le \varepsilon. \tag{*} \end{align*}

Firstly, we use the following fact \begin{align*} R_p(\hat{g}_{n,\mathcal{G}}) - R_p(g^*_{p,\mathcal{G}}) \le 2\sup_{g \in \mathcal{G}}\vert \hat{R}_n(g) - R_p(g)\vert, \end{align*} and obtain that \begin{align*} \mathbb{P}\left(R_p(\hat{g}_{n,\mathcal{G}}) - R_p(g^*_{p,\mathcal{G}}) > 2t \right) \le \mathbb{P}\left(\sup_{g \in \mathcal{G}}\vert \hat{R}_n(g) - R_p(g)\vert > t \right). \end{align*} Next, by using the symmetrization lemma, we have \begin{align*} \mathbb{P}\left(\sup_{g \in \mathcal{G}}\vert \hat{R}_n(g) - R_p(g)\vert > t \right) \le 2 \mathbb{P}\left(\sup_{g \in \mathcal{G}}\left\vert \hat{R}_n(g) - \hat{R}'_n(g)\right\vert > \dfrac{t}{2} \right). \tag{1.1} \end{align*}

Let us consider the trace of $\mathcal{G}$ on $(\mathbb{X}_n,\mathbb{X}'_n)$, denotes $T_{\mathcal{G}}(\mathbb{X}_n,\mathbb{X}'_n)$.

Each element of $T_{\mathcal{G}}(\mathbb{X}_n,\mathbb{X}'_n)$ is a sample $\left\{x_{j_1},x_{j_2},\ldots,x_{j_K}\right\}$ with $K \le 2n$. To each of these elements, we can associate a prediction $g \in \mathcal{G}$ such that $g(x_{j_k})$ = 1, $\forall k \in [K]$. We just built a bijection between $T_{\mathcal{G}}(\mathbb{X}_n,\mathbb{X}'_n)$ and a subset of $\mathcal{G}$, denotes $\hat{\mathcal{G}}_n$. With this notation, we have

$$\vert \hat{\mathcal{G}}_n\vert = T_{\mathcal{G}}(\mathbb{X}_n,\mathbb{X}'_n) \le (2n+1)^{V_{\mathcal{G}}(2n) }.$$

Therefore, we can derive from (1.1) that

\begin{align*} \mathbb{P}\left(\sup_{g \in \mathcal{G}}\vert \hat{R}_n(g) - R_p(g)\vert > t \right) &\le 2 \mathbb{P}\left(\sup_{g \in \mathcal{G}}\left\vert \hat{R}_n(g) - \hat{R}'_n(g)\right\vert > \dfrac{t}{2} \right)\\ &=2\mathbb{P}\left(\sup_{g \in \hat{\mathcal{G}}_n}\left\{\hat{R}_n(g) - \hat{R}'_n(g)\right\}> t/2\right)\\ &\le 2 \sum_{g \in \hat{\mathcal{G}}_n} \mathbb{P}\left(\hat{R}_n(g) - \hat{R}'_n(g)> t/2 \right) \tag{union bound}\\ &\le 2\vert \hat{\mathcal{G}_n}\vert \max_{g \in \mathcal{G}} \mathbb{P}\left(\hat{R}_n(g) - \hat{R}'_n(g) > t/2 \right)\\ & \le 2(2n+1)^{V_\mathcal{G}(2n)} \max_{g \in \mathcal{G}} \mathbb{P}\left(\hat{R}_n(g) - \hat{R}'_n(g) > t/2 \right) \end{align*} By substituting $\hat{R}_n(g) = \dfrac{1}{n}\mathbb{1}_{g(X_i)\neq Y_i}$ and using the Hoeffding inequality, we obtain the result in theorem.

$\endgroup$
2
  • 2
    $\begingroup$ Your click-bait title worked. I clicked to see how that VP theorem proved itself. $\endgroup$
    – user14111
    Commented Feb 25 at 23:17
  • $\begingroup$ Can you show how you use Hoeffding inequality in the last step $\endgroup$
    – Pipnap
    Commented Feb 26 at 17:38

0

You must log in to answer this question.