Conditional Expectation: modern and classical definitions

Question

I've just taken a course in modern probability theory following William's Probability with Martingales. On a separate venture, I started reading Hastie's The Elements of Statistical Learning. As I never took a statistics course, I'm having a bit of trouble conciliating different definitions of conditional expectation.

On Hastie, it is claimed that the expected prediction error $$\text{EPE}(f) = E(Y-f(X))^2$$ can also be written as $$\text{EPE}(f) = E_X E_{Y | X}((Y-f(X))^2 | X). $$ Now, I'm not quite sure what the right hand side means. This seems to be an application of the tower property $$\text{E}(X | \mathcal{H}) = \text{E}(\text{E}(X | \mathcal{G}) | \mathcal{H}), $$ where $\mathcal{G}$ is a sub $\sigma$-algebra of $\mathcal{H}$, but I'm not quite able to translate everything back and forth.

To be concrete: what does $\text{E}_X$ and $\text{E}_{Y|X}$ mean, and how can I phrase the identity above for the EPE in terms of the modern definition of conditional expectation?

$\begingroup$ This section of the Wikipedia page may be helpful. $\endgroup$
– angryavian
Commented May 30, 2017 at 1:02 — angryavian, Commented May 30, 2017 at 1:02

Graham Kemp · Accepted Answer · 2017-05-30 01:27:43Z

$$\text{EPE}(f) = E(Y-f(X))^2$$ can also be written as $$\text{EPE}(f) = E_X E_{Y | X}((Y-f(X))^2 \mid X). $$ Now, I'm not quite sure what the right hand side means. This seems to be an application of the tower property

That it is. You may safely ignore the "training wheels" subscripts. They are meant as helpful reminders evoking a double integral's bounds, but, in practice, seem to confuse the issue more often than they help. You are dealing with plain:

$$\text{EPE}(f) ~{=~E\big((Y-f(X))^2\big) \\[1ex] =~ E \Big(E\big((Y-f(X))^2 \mid X\big)\Big) }$$

In short: $\mathsf E(g(X,Y)) = \mathsf E(\mathsf E(g(X,Y)\mid \sigma(X)))$, the expectation of a function of two(or more) random variables equals the expectation of the conditional expectation of the function measured over (the sigma-algebra of) one of them.

The text has expressed it directly as the double expectation, most likely because it they will be using a double integral (or sum) to evaluate it down the line.

As you said, I was just confused by the subscripts. This sorts it out perfectly. Thanks! — Felipe Jacob, Commented May 30, 2017 at 1:35

Stack Exchange Network

Conditional Expectation: modern and classical definitions

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
probability
probability-theory
.

Hot Network Questions

Conditional Expectation: modern and classical definitions

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged probabilityprobability-theory.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
probability
probability-theory
.