Jason Swanson

For the second question, I don't know how to determine it in a measure-theoretic way. Using elementary probability, I derive $$ \mathbb{P}(Y\in A \mid X=n)= \frac{\mathbb{P}(X=n, Y\in A)}{\mathbb{P}(X=n)} $$

After a short calculation, I find that the conditional distribution of $Y$ given $X$ is $\Gamma(X+1,a+b)$. How can I interpret this in the language of transition kernels?
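To make the "short calculation" concrete, here is one model that produces this answer. The model is an assumption for illustration, since the setup of the question is not reproduced here: $Y$ has density $a e^{-ay}$ on $(0,\infty)$ (exponential with rate $a$), and conditionally on $Y$, $X$ is Poisson with mean $bY$. Reading $\Gamma(k,\lambda)$ as shape $k$ and rate $\lambda$, and using $\int_0^\infty y^n e^{-(a+b)y}\,dy = n!/(a+b)^{n+1}$:

$$ \mathbb{P}(X=n,\ Y\in A) = \int_{A\cap(0,\infty)} a e^{-ay}\, e^{-by}\frac{(by)^n}{n!}\,dy, \qquad \mathbb{P}(X=n) = \frac{a\,b^n}{(a+b)^{n+1}}, $$

so that

$$ \mathbb{P}(Y\in A \mid X=n) = \int_{A\cap(0,\infty)} \frac{(a+b)^{n+1}}{n!}\, y^n e^{-(a+b)y}\,dy, $$

which is the $\Gamma(n+1, a+b)$ distribution evaluated at $A$.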

In my opinion, the best way to understand the bridge between undergraduate conditional expectation and measure-theoretic conditional expectation is through regular conditional distributions.

In this setting, a probability kernel is a function $\mu: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0,1]$ such that $\mu(x, \cdot)$ is a probability measure for each $x$ and $\mu(\cdot, A)$ is measurable for each $A$. If $X$ and $Y$ are real-valued random variables, then there exists a kernel $\mu$ such that $$ P(Y \in A \mid X) = \mu(X, A) \text{ a.s.} $$ This kernel is unique in the sense that if $\tilde \mu$ is another such kernel, then $\mu(x, \cdot) = \tilde \mu(x, \cdot)$ for $\mu_X$-a.e. $x \in \mathbb{R}$, where $\mu_X$ is the distribution of $X$. Moreover, it has the property that $$ \lim_{\varepsilon \to 0} P(Y \in A \mid X \in (x - \varepsilon, x + \varepsilon)) = \mu(x, A),\tag{1} $$ for $\mu_X$-a.e. $x$. Also, if $P(X = x) > 0$, then $$ P(Y \in A \mid X = x) = \mu(x, A).\tag{2} $$ This kernel $\mu$ is called a regular conditional distribution for $Y$ given $X$. Given $\mu$, and assuming $E|Y| < \infty$, we obtain the conditional expectation as $E[Y \mid X] = \int_{\mathbb{R}} y \, \mu(X, dy)$.
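As a sketch of what "a kernel" looks like in practice, the snippet below takes the kernel from the question, $\mu(n,\cdot) = \Gamma(n+1, a+b)$ with the second parameter read as a rate, and represents it as a function of a point $n$ and a set $A$. For simplicity, $A$ is restricted to intervals $(lo, hi]$, and the helper names `erlang_cdf`, `mu`, and `cond_mean` are hypothetical. Integrating $y$ against $\mu(n, dy)$ with a midpoint sum should recover $E[Y \mid X = n] = (n+1)/(a+b)$, the mean of $\Gamma(n+1, a+b)$. The truncation at `upper` is an assumption that only makes sense when $a+b$ is not tiny.

```python
import math


def erlang_cdf(t, k, rate):
    """P(G <= t) for G ~ Gamma(shape k, rate), valid for integer k >= 1 (Erlang case)."""
    if t <= 0:
        return 0.0
    if math.isinf(t):
        return 1.0
    x = rate * t
    # For integer shape: P(G <= t) = 1 - sum_{j=0}^{k-1} e^{-x} x^j / j!
    term = math.exp(-x)
    total = term
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - total


def mu(n, interval, a, b):
    """Hypothetical kernel mu(n, A) = Gamma(n+1, a+b)(A), with A an interval (lo, hi]."""
    lo, hi = interval
    return erlang_cdf(hi, n + 1, a + b) - erlang_cdf(lo, n + 1, a + b)


def cond_mean(n, a, b, upper=40.0, steps=20_000):
    """Approximate E[Y | X = n] = integral of y mu(n, dy) by a midpoint sum over (0, upper]."""
    h = upper / steps
    return sum((i + 0.5) * h * mu(n, (i * h, (i + 1) * h), a, b) for i in range(steps))


a, b = 1.0, 2.0
for n in range(4):
    # numerical integral against the kernel vs. the Gamma mean (n + 1) / (a + b)
    print(n, round(cond_mean(n, a, b), 4), (n + 1) / (a + b))
```

The point of the design is that `mu(n, ·)` is an ordinary probability measure for each fixed `n`, so conditional quantities are just integrals against it.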

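In your case, you have already done the elementary calculation that computes (2). Since $X$ is integer-valued, $\mu_X$ is concentrated on the integers $n$ with $P(X = n) > 0$, so (2) gives $\mu(n, \cdot) = \Gamma(n+1, a+b)$ at each such $n$, and by the uniqueness statement above, the values of $\mu(x, \cdot)$ at other points $x$ do not matter. This gives you $\mu$, and with that, you can proceed to work in the measure-theoretic setting: $P(Y \in A \mid X) = \mu(X, A)$ a.s., where $\mu(X, \cdot)$ is the $\Gamma(X+1, a+b)$ distribution.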
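A quick numerical sanity check of this kernel is possible by simulation. The sketch below assumes the same illustrative model as before ($Y$ exponential with rate $a$, and $X$ Poisson with mean $bY$ given $Y$), which is not stated in the question and is only one model that leads to $\Gamma(X+1, a+b)$. It conditions on the event $\{X = n\}$ in the elementary sense of (2), by keeping only samples with $X = n$, and compares the empirical $P(Y \le t \mid X = n)$ and $E[Y \mid X = n]$ with the Gamma kernel. All function names are hypothetical.

```python
import math
import random


def erlang_cdf(t, k, rate):
    """P(G <= t) for G ~ Gamma(shape k, rate), valid for integer k >= 1."""
    if t <= 0:
        return 0.0
    x = rate * t
    term = math.exp(-x)
    total = term
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - total


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(a, b, size, seed=0):
    """Draw (X, Y) pairs from the ASSUMED model Y ~ Exp(a), X | Y ~ Poisson(bY)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        y = rng.expovariate(a)  # expovariate takes the rate, so E[Y] = 1/a
        pairs.append((sample_poisson(b * y, rng), y))
    return pairs


def conditional_stats(pairs, n, t):
    """Elementary conditioning on {X = n}: estimates of P(Y <= t | X = n) and E[Y | X = n]."""
    ys = [y for x, y in pairs if x == n]
    return sum(y <= t for y in ys) / len(ys), sum(ys) / len(ys)


a, b, n, t = 1.0, 2.0, 2, 1.0
pairs = simulate(a, b, 100_000, seed=1)
p_hat, mean_hat = conditional_stats(pairs, n, t)
print(f"P(Y <= {t} | X = {n}): empirical {p_hat:.3f}, kernel {erlang_cdf(t, n + 1, a + b):.3f}")
print(f"E[Y | X = {n}]: empirical {mean_hat:.3f}, kernel {(n + 1) / (a + b):.3f}")
```

The empirical values agree with the kernel only up to Monte Carlo error, which shrinks as the number of samples with $X = n$ grows.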

I am not familiar with Le Gall's book, but I suspect at least some, if not all, of these topics are covered there.
