I've just taken a course in modern probability theory following Williams' Probability with Martingales. Separately, I started reading Hastie's The Elements of Statistical Learning. As I never took a statistics course, I'm having a bit of trouble reconciling the different definitions of conditional expectation.
In Hastie, it is claimed that the expected prediction error $$\text{EPE}(f) = E(Y-f(X))^2$$ can also be written as $$\text{EPE}(f) = E_X E_{Y | X}((Y-f(X))^2 | X). $$ Now, I'm not quite sure what the right-hand side means. This seems to be an application of the tower property $$\text{E}(X | \mathcal{H}) = \text{E}(\text{E}(X | \mathcal{G}) | \mathcal{H}), $$ where $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, but I'm not quite able to translate everything back and forth.
To be concrete: what do $\text{E}_X$ and $\text{E}_{Y|X}$ mean, and how can I phrase the identity above for the EPE in terms of the modern definition of conditional expectation?
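
To make the question concrete, here is a minimal sketch of one possible reading on a finite toy example. It assumes that $\text{E}_{Y|X}(\cdot \mid X)$ means averaging over the conditional law of $Y$ given $X = x$, and that $\text{E}_X$ means averaging the result over the marginal law of $X$. The joint distribution, the predictor `f`, and all names below are made up for illustration and are not taken from either book.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y) on a finite space: {(x, y): P(X=x, Y=y)}.
joint = {
    (0, 0): F(1, 8), (0, 1): F(3, 8),
    (1, 0): F(1, 4), (1, 2): F(1, 4),
}

def f(x):
    # Arbitrary illustrative predictor (an assumption, not from the book).
    return 2 * x

def loss(x, y):
    # Squared error (y - f(x))^2.
    return (y - f(x)) ** 2

# Left-hand side: E (Y - f(X))^2, computed directly from the joint law.
epe_direct = sum(p * loss(x, y) for (x, y), p in joint.items())

# Marginal law of X, obtained by summing the joint pmf over y.
marginal_x = {}
for (x, _), p in joint.items():
    marginal_x[x] = marginal_x.get(x, F(0)) + p

def inner(x0):
    # Assumed meaning of E_{Y|X}((Y - f(X))^2 | X = x0):
    # average of the loss under the conditional law of Y given X = x0.
    total = sum(p * loss(x, y) for (x, y), p in joint.items() if x == x0)
    return total / marginal_x[x0]

# Assumed meaning of E_X: average the inner quantity over the law of X.
epe_iterated = sum(px * inner(x) for x, px in marginal_x.items())

print(epe_direct, epe_iterated)
```

On this toy example the two numbers agree exactly. If the reading is right, `inner` evaluated at $X$ would be a version of $\text{E}((Y-f(X))^2 \mid \sigma(X))$, and the outer sum would be its ordinary expectation, which is the tower property with the trivial $\sigma$-algebra as the smaller one.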