
This question is about Section 8.7 (Bagging) of *The Elements of Statistical Learning* (ESL).

Assume our training observations $\left(x_i, y_i\right), i=1, \ldots, N$ are independently drawn from a distribution $\mathcal{P}$, and consider the ideal aggregate estimator $f_{\text {ag }}(x)=\mathrm{E}_{\mathcal{P}} \hat{f}^*(x)$. Here $x$ is fixed and the bootstrap dataset $\mathbf{Z}^*$ consists of observations $x_i^*, y_i^*, i=1,2, \ldots, N$ sampled from $\mathcal{P}$. (Note that $f_{\text {ag }}(x)$ is a bagging estimate, drawing bootstrap samples from the actual population $\mathcal{P}$ rather than the data. It is not an estimate that we can use in practice, but is convenient for analysis.) The author writes $$ \begin{aligned} \mathrm{E}_{\mathcal{P}}\left[Y-\hat{f}^*(x)\right]^2 & =\mathrm{E}_{\mathcal{P}}\left[Y-f_{\mathrm{ag}}(x)+f_{\mathrm{ag}}(x)-\hat{f}^*(x)\right]^2 \\ & =\mathrm{E}_{\mathcal{P}}\left[Y-f_{\mathrm{ag}}(x)\right]^2+\mathrm{E}_{\mathcal{P}}\left[\hat{f}^*(x)-f_{\mathrm{ag}}(x)\right]^2 \\ & \geq \mathrm{E}_{\mathcal{P}}\left[Y-f_{\mathrm{ag}}(x)\right]^2 \end{aligned} $$

But this relies on the cross term vanishing: $$\mathrm{E}_{\mathcal{P}}\left[\left(Y-f_{\mathrm{ag}}(x)\right)\left(f_{\mathrm{ag}}(x)-\hat{f}^*(x)\right)\right] = 0.$$

The author later mentions that the main caveat is "independent", and that the bagged trees are not independent. So it seems the nice decomposition above relies on the assumption that the bagged trees are independent.

However, looking at the expression $$\mathrm{E}_{\mathcal{P}}\left[\left(Y-f_{\mathrm{ag}}(x)\right)\left(f_{\mathrm{ag}}(x)-\hat{f}^*(x)\right)\right] = 0,$$ I don't see how it is the covariance between two bagged trees. (If it were, then under the independence assumption it would clearly be zero.) To me it looks more like the covariance between $Y$ and $\hat{f}^*(x)$, if it is appropriate to think of $f_{\mathrm{ag}}(x)$ as $\mathrm{E}_{\mathcal{P}}[Y \mid X=x]$.
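To make the question concrete, here is a small Monte Carlo sketch of the setup (my own toy example, not from ESL): I pick a known population $\mathcal{P}$, use an unstable 1-nearest-neighbour base learner as $\hat{f}^*$ at a fixed query point $x$, approximate $f_{\mathrm{ag}}(x)$ by averaging over many fresh samples from $\mathcal{P}$, and then estimate the two mean squared errors and the cross term in question. All names and the choice of $\mathcal{P}$ here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy population P (an assumption for illustration):
# X ~ Uniform(0, 1), Y = sin(2*pi*X) + N(0, 0.3^2)
def sample(n):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x0 = 0.25          # the fixed query point x
N, B = 50, 4000    # sample size and number of independent draws from P

# Unstable base learner f^*(x0): 1-nearest-neighbour prediction at x0
def f_star(x, y):
    return y[np.argmin(np.abs(x - x0))]

# B independent fits, each on a fresh sample from P
preds = np.array([f_star(*sample(N)) for _ in range(B)])
f_ag = preds.mean()  # Monte Carlo estimate of f_ag(x0) = E_P[f^*(x0)]

# Fresh Y draws at X = x0, independent of the fits
Y = np.sin(2 * np.pi * x0) + rng.normal(0.0, 0.3, B)

mse_star = np.mean((Y - preds) ** 2)   # E_P [Y - f^*(x0)]^2
mse_ag = np.mean((Y - f_ag) ** 2)      # E_P [Y - f_ag(x0)]^2
cross = np.mean((Y - f_ag) * (f_ag - preds))  # the cross term

print(f"MSE of f^*:  {mse_star:.4f}")
print(f"MSE of f_ag: {mse_ag:.4f}")
print(f"cross term:  {cross:.4f}")
```

In this simulation the cross term comes out near zero and the aggregated estimator has the smaller mean squared error, matching the inequality in the book; the question is about what formally justifies the cross term vanishing.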
