$\begingroup$

Tl;dr: From what property can we conclude that $E[E[X|Y] * f(Y)] = E[X f(Y)]$?


I'm working through Sutton and Barto's Reinforcement Learning textbook: https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf

Page 61 of the PDF (page 40 of the text) effectively says that for two random variables $R_t$ and $A_t$,

$$ E\left[E[R_t | A_t] \frac{\partial \pi_{t} (A_t)}{\partial H_{t} (a)} \frac{1}{\pi_t (A_t)} \right] = E \left[ R_t \frac{\partial \pi_{t} (A_t)}{\partial H_{t} (a)} \frac{1}{\pi_t (A_t)} \right] $$

where $a$ is a particular value of the random variable $A_t$.

I can't figure out what property allows us to conclude that. By the law of iterated expectation, $E[E[X|Y]] = E[X]$. But how exactly does it follow that $E[E[X|Y] * f(Y)] = E[X f(Y)]$?

$E[X] = E[Y]$ does not imply that $X = Y$, so $E[E[X|Y]] = E[X]$ shouldn't imply that $E[X|Y] = X$.

And as far as I can tell, $E[XY] = E[X] * E[Y]$ is true only when $X$ and $Y$ are independent, so we can't say that $E[E[X|Y] * f(Y)] = E[E[X|Y]] * E[f(Y)] = E[X] * E[f(Y)] = E[Xf(Y)]$. (Also note that $X$ here indicates the reward $R_t$, which is not independent of the action selection $A_t$.)

Edit: Looks like something pretty close is proved at the bottom of this document: https://webspace.maths.qmul.ac.uk/i.goldsheid/MTH5118/Notes5-09.pdf

$\endgroup$
  • $\begingroup$ Hi: just apply the law of iterated expectation to the first term and leave $f(Y)$ alone, so $E[E[X|Y]] = E[X]$. I may not be understanding your question? $\endgroup$
    – mark leeds
    Commented Sep 23, 2022 at 4:17

1 Answer

$\begingroup$

$E[E[X|Y] * f(Y)] = E[E[X f(Y)|Y]] = E[X f(Y)]$. The first equality comes from the fact that $E[X|Y] * f(Y) = E[X f(Y)|Y]$: you can pull $f(Y)$ inside the conditional expectation because $f(Y)$ is measurable w.r.t. $\sigma(Y)$ (this needs $X f(Y)$ to be integrable). The second equality is the tower property of conditional expectation, i.e. the law of iterated expectation applied to $X f(Y)$ instead of $X$.
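
If it helps to see the identity on concrete numbers, here is a small exact check in Python. The joint distribution `joint` and the function `f` are made up purely for illustration (they are not from Sutton and Barto); $X$ is deliberately dependent on $Y$, so the naive factorization $E[X] * E[f(Y)]$ should fail while the identity holds.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X=x, Y=y), keyed by (x, y). X depends on Y.
joint = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}

def f(y):
    """Arbitrary function of Y (an assumption for this example)."""
    return 3 * y + 1

def p_y(y):
    """Marginal probability P(Y=y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_exp_x(y):
    """Conditional expectation E[X | Y=y]."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y(y)

# E[ E[X|Y] * f(Y) ]: the integrand depends on Y only.
lhs = sum(cond_exp_x(y) * f(y) * p for (_, y), p in joint.items())

# E[ X * f(Y) ]: computed directly from the joint pmf.
rhs = sum(x * f(y) * p for (x, y), p in joint.items())

# Naive product E[X] * E[f(Y)], which would only be valid if X and f(Y)
# were uncorrelated.
e_x = sum(x * p for (x, _), p in joint.items())
e_f = sum(f(y) * p for (_, y), p in joint.items())
naive = e_x * e_f

print(lhs, rhs, naive)
```

Here `lhs` and `rhs` agree exactly, while `naive` differs, which matches the point in the question that independence is not what makes the identity work.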

$\endgroup$
