TL;DR: What property lets us conclude that $E[E[X|Y] \cdot f(Y)] = E[X f(Y)]$?
I'm working through Sutton and Barto's Reinforcement Learning textbook: https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
Page 61 of the PDF (page 40 of the text) effectively says that for two random variables $R_t$ and $A_t$,
$$ E\left[E[R_t | A_t] \frac{\partial \pi_{t} (A_t)}{\partial H_{t} (a)} \frac{1}{\pi_t (A_t)} \right] = E \left[ R_t \frac{\partial \pi_{t} (A_t)}{\partial H_{t} (a)} \frac{1}{\pi_t (A_t)} \right] $$
where $a$ is a particular value of the random variable $A_t$.
I can't figure out what property allows us to conclude that. By the law of iterated expectations, $E[E[X|Y]] = E[X]$. But how exactly does it follow that $E[E[X|Y] \cdot f(Y)] = E[X f(Y)]$?
$E[X] = E[Y]$ does not imply that $X = Y$, so $E[E[X|Y]] = E[X]$ shouldn't imply that $E[X|Y] = X$.
And as far as I can tell, $E[XY] = E[X] \cdot E[Y]$ holds only when $X$ and $Y$ are uncorrelated (for example, when they are independent), so we can't say that $E[E[X|Y] \cdot f(Y)] = E[E[X|Y]] \cdot E[f(Y)] = E[X] \cdot E[f(Y)] = E[X f(Y)]$. (Also note that $X$ here stands for the reward $R_t$, which is not independent of the action selection $A_t$.)
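As a sanity check that the identity does not rely on independence, here is a minimal Python sketch that evaluates both sides exactly on a small discrete example. The joint distribution `joint` and the function `f` are made up for illustration (they are not from the textbook), and $X$ and $Y$ are deliberately dependent.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y), keyed by (x, y).
# X and Y are deliberately dependent.
joint = {
    (0, 0): F(1, 10),
    (1, 0): F(3, 10),
    (0, 1): F(4, 10),
    (1, 1): F(2, 10),
}

def f(y):
    # Arbitrary function of Y, chosen only for illustration.
    return 2 * y + 1

def marginal_y(y):
    # P(Y = y)
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_exp_x(y):
    # E[X | Y = y] = sum_x x * P(X=x, Y=y) / P(Y=y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / marginal_y(y)

# Left-hand side: E[ E[X|Y] * f(Y) ]
lhs = sum(cond_exp_x(y) * f(y) * p for (_, y), p in joint.items())

# Right-hand side: E[ X * f(Y) ]
rhs = sum(x * f(y) * p for (x, y), p in joint.items())

# The (invalid here) factorization E[X] * E[f(Y)]
e_x = sum(x * p for (x, _), p in joint.items())
e_fy = sum(f(y) * p for (_, y), p in joint.items())
product = e_x * e_fy

print(lhs, rhs, product)
```

In this example the two sides agree exactly, while $E[X] \cdot E[f(Y)]$ gives a different value, which is consistent with the identity holding for a reason other than independence.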
Edit: Looks like something pretty close is proved at the bottom of this document: https://webspace.maths.qmul.ac.uk/i.goldsheid/MTH5118/Notes5-09.pdf
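For what it's worth, in the discrete case the identity seems to follow by direct expansion. The following is only a sketch, assuming $X$ and $Y$ are discrete, that $E|X f(Y)| < \infty$ so the order of summation can be swapped, and writing $p(x, y) = P(X = x, Y = y)$ and $p_Y(y) = P(Y = y)$ (my notation, not the textbook's):

$$
\begin{aligned}
E\left[ E[X|Y] \, f(Y) \right]
&= \sum_y p_Y(y) \, f(y) \, E[X | Y = y] \\
&= \sum_y p_Y(y) \, f(y) \sum_x x \, \frac{p(x, y)}{p_Y(y)} \\
&= \sum_x \sum_y x \, f(y) \, p(x, y) \\
&= E[X f(Y)]
\end{aligned}
$$

Put differently, $f(Y)$ behaves like a constant once we condition on $Y$, so $f(Y) \, E[X|Y] = E[X f(Y) | Y]$ (often called "taking out what is known"), and applying the law of iterated expectations to $X f(Y)$ then gives the result.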