I am self-studying introductory stochastic calculus from the text *A First Course in Stochastic Calculus* by Louis-Pierre Arguin. I'm struggling to understand a particular step in one of the proofs, and I would like to ask for some help with it. I've tried searching for the same proof online, but I had a hard time following the versions I found.
Theorem. (Existence and uniqueness of the conditional expectation) Let $X$ be a random variable on $(\Omega,\mathcal{F},\mathbb{P})$. Let $Y$ be a random variable in $L^{2}(\Omega,\mathcal{F},\mathbb{P})$. Then the conditional expectation $\mathbb{E}[Y|X]$ is the random variable $Y^{\star}$ given in equation (A). Namely, it is the random variable in $L^{2}(\Omega,\sigma(X),\mathbb{P})$ that is closest to $Y$ in the $L^{2}$-distance. In particular, we have:
1. It is the orthogonal projection of $Y$ onto $L^{2}(\Omega,\sigma(X),\mathbb{P})$; that is, $Y-Y^{\star}$ is orthogonal to every random variable in the subspace $L^{2}(\Omega,\sigma(X),\mathbb{P})$.
2. It is unique.
Remark. This result reinforces the meaning of the conditional expectation $\mathbb{E}[Y|X]$ as the best estimate of $Y$ given the information of $X$: it is the closest random variable to $Y$, in the $L^{2}$ sense, among all functions of $X$.
So, $Y^{\star}$ is such that:
$$\inf_{Z\in L^2(X)}\mathbb{E}[(Y-Z)^2] = \mathbb{E}[(Y-Y^{\star})^2] \tag{1}$$
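To build intuition for (1), here is a small Monte Carlo sanity check. The model is my own toy example, not the book's: $X \sim N(0,1)$ and $Y = X^2 + \varepsilon$ with $\varepsilon \sim N(0,1)$ independent of $X$, so that $\mathbb{E}[Y|X] = X^2$ exactly. The candidate functions of $X$ are arbitrary choices for comparison.

```python
import numpy as np

# Toy model (my assumption, not from the book): X ~ N(0,1),
# Y = X^2 + eps with eps ~ N(0,1) independent of X, so E[Y|X] = X^2.
rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal(n)
Y = X**2 + rng.standard_normal(n)

def mse(Z):
    """Monte Carlo estimate of the L^2 distance E[(Y - Z)^2]."""
    return np.mean((Y - Z) ** 2)

# Y* = E[Y|X] = X^2 should attain the infimum in (1) among functions of X.
errors = {
    "Y* = X^2": mse(X**2),                  # theoretical value: Var(eps) = 1
    "Z = X": mse(X),                        # theoretical value: 5
    "Z = E[Y]": mse(np.full(n, Y.mean())),  # theoretical value: Var(Y) = 3
    "Z = 0.9 X^2": mse(0.9 * X**2),         # theoretical value: 1.03
}
best = min(errors, key=errors.get)
print(best, errors[best])  # the conditional expectation wins
```

Even the slightly perturbed candidate $0.9X^2$ does strictly worse than $Y^{\star} = X^2$, which is the content of (1).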
Proof.
We write $L^{2}(X)$ for short for the subspace $L^{2}(\Omega,\sigma(X),\mathbb{P})$. Let $Y^{\star}$ be as in equation (A). We show successively that: (1) $Y-Y^{\star}$ is orthogonal to every element of $L^{2}(X)$, so $Y^{\star}$ is the orthogonal projection; (2) $Y^{\star}$ has the properties of the conditional expectation in the definition; (3) $Y^{\star}$ is unique.
(1) Let $W=g(X)$ be a random variable in $L^{2}(X)$. We show that $W$ is orthogonal to $Y-Y^{\star}$; that is, $\mathbb{E}[(Y-Y^{\star})W]=0$. This should be intuitively clear from the figure. On the one hand, by expanding the square, we have:
\begin{align*} \mathbb{E}[(W-(Y-Y^{\star}))^{2}] & =\mathbb{E}[W^{2}-2W(Y-Y^{\star})+(Y-Y^{\star})^{2}]\\ & =\mathbb{E}[W^{2}]-2\mathbb{E}[W(Y-Y^{\star})]+\mathbb{E}[(Y-Y^{\star})^{2}] \tag{2} \end{align*}
On the other hand, since $Y^{\star}+W$ is in $L^{2}(X)$ (it is a linear combination of elements of $L^{2}(X)$), we must have from equation (1):
$$\mathbb{E}[(W-(Y-Y^{\star}))^2] \geq \mathbb{E}[(Y-Y^{\star})^2]\tag{3}$$
I simply don't follow how this last inequality (3) is arrived at; in particular, I don't see how it comes out of the infimum in equation (1).
Putting equations (2) and (3) together, we get that for any $W \in L^2(X)$,
$$\mathbb{E}[W^2]-2\mathbb{E}[W(Y-Y^{\star})]\geq 0 \tag{4}$$
In particular, (4) also holds with $W$ replaced by $aW$ for any $a \in \mathbb{R}$, in which case we get:
\begin{align*} a^2 \mathbb{E}[W^2] - 2a\mathbb{E}[W(Y-Y^{\star})] &\geq 0\\ a \{a\mathbb{E}[W^2] - 2\mathbb{E}[W(Y-Y^{\star})]\} &\geq 0 \tag{5} \end{align*}
If $a > 0$, then dividing (5) by $a$ gives:
$$a\mathbb{E}[W^2] - 2\mathbb{E}[W(Y-Y^{\star})] \geq 0 \tag{6a}$$
whereas if $a < 0$, dividing by $a$ reverses the inequality:
$$a\mathbb{E}[W^2] - 2\mathbb{E}[W(Y-Y^{\star})] \leq 0 \tag{6b}$$
Re-arranging (6a) yields:
$$\mathbb{E}[W(Y-Y^{\star})] \leq a\mathbb{E}[W^2]/2 \tag{7a}$$
and rearranging (6b) yields:
$$\mathbb{E}[W(Y-Y^{\star})] \geq a\mathbb{E}[W^2]/2 \tag{7b}$$
Since (7a) holds for all $a > 0$, letting $a \to 0^{+}$ gives $\mathbb{E}[W(Y-Y^{\star})] \leq 0$. Since (7b) holds for all $a < 0$, letting $a \to 0^{-}$ gives $\mathbb{E}[W(Y-Y^{\star})] \geq 0$. Consequently,
$$\mathbb{E}[W(Y-Y^{\star})] = 0$$
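The orthogonality conclusion can also be checked numerically. Again using my own toy model (not the book's example): $X \sim N(0,1)$, $Y = X^2 + \varepsilon$ with independent $\varepsilon \sim N(0,1)$, so $Y^{\star} = \mathbb{E}[Y|X] = X^2$, and the residual $Y - Y^{\star}$ should be orthogonal to any $W = g(X)$.

```python
import numpy as np

# Toy model (my assumption): X ~ N(0,1), Y = X^2 + eps, Y* = E[Y|X] = X^2.
rng = np.random.default_rng(1)
n = 500_000
X = rng.standard_normal(n)
Y = X**2 + rng.standard_normal(n)
residual = Y - X**2  # Y - Y*

# Monte Carlo estimates of E[W(Y - Y*)] for a few arbitrary W = g(X):
inners = [np.mean(g(X) * residual)
          for g in (np.sin, np.cos, np.tanh, lambda x: x**3)]
print(inners)  # each entry is ~0, up to sampling noise
```

Each inner product vanishes up to Monte Carlo error, illustrating that $Y - Y^{\star}$ is orthogonal to the whole subspace $L^2(X)$, not just to $Y^{\star}$ itself.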