Jason Swanson

For the second question, I don't know how to determine it in a measure-theoretic way. With elementary probability, I derive $$ \mathbb{P}(Y\in A|X=n)= \frac{\mathbb{P}(X=n, Y\in A)}{\mathbb{P}(X=n)} $$

After a short calculation, I get that the conditional distribution of $Y$ given $X$ is $\Gamma(X+1,a+b)$. How can I use the language of transition kernels to interpret it?

In my opinion, the best way to understand the bridge between undergraduate conditional expectation and measure-theoretic conditional expectation is through regular conditional distributions.

In this setting, a probability kernel is a function $\mu: \mathbb{R} \times \mathcal{B}(\mathbb{R}) \to [0, 1]$ such that $\mu(x, \cdot)$ is a probability measure for each $x$ and $\mu(\cdot, A)$ is measurable for each $A$. If $X$ and $Y$ are real-valued random variables, then there exists a kernel $\mu$ such that $$ P(Y \in A \mid X) = \mu(X, A) \text{ a.s.} $$ for each $A \in \mathcal{B}(\mathbb{R})$. This kernel is unique in the sense that if $\tilde \mu$ is another such kernel, then $\mu(x, \cdot) = \tilde \mu(x, \cdot)$ for $\mu_X$-a.e. $x \in \mathbb{R}$, where $\mu_X$ is the distribution of $X$. Moreover, it has the property that $$ \lim_{\varepsilon \to 0} P(Y \in A \mid X \in (x - \varepsilon, x + \varepsilon)) = \mu(x, A),\tag{1} $$ for $\mu_X$-a.e. $x$. Also, if $P(X = x) > 0$, then $$ P(Y \in A \mid X = x) = \mu(x, A).\tag{2} $$ This kernel $\mu$ is called a regular conditional distribution for $Y$ given $X$. Given $\mu$, and provided $Y$ is integrable, we obtain the conditional expectation by $E[Y \mid X] = \int_{\mathbb{R}} y \, \mu(X, dy)$.

In your case, you have already done the elementary calculations to compute (2). This gives you $\mu$, and with that, you can proceed to work in the measure-theoretic setting.
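
To make this concrete, here is a sketch of that computation. It assumes the joint law is the one appearing in the numerator of the fourth-question calculation further down, namely $P(X = n, Y \in du) = b \frac{(au)^n}{n!} e^{-(a+b)u} \, du$ for $n \in \mathbb{Z}_+$ and $u > 0$; this is read off from that formula rather than restated from the exercise, so adjust it if your setup differs. Under that assumption, $$ P(X = n) = b \, \frac{a^n}{n!} \int_0^\infty u^n e^{-(a+b)u} \, du = \frac{b \, a^n}{(a+b)^{n+1}}, $$ and (2) gives $$ \mu(n, A) = \frac{P(X = n, Y \in A)}{P(X = n)} = \int_{A \cap (0, \infty)} \frac{(a+b)^{n+1}}{n!} \, u^n e^{-(a+b)u} \, du, \qquad n \in \mathbb{Z}_+, $$ which is the $\Gamma(n+1, a+b)$ distribution (shape $n+1$, rate $a+b$). Since $\mu_X$ is concentrated on $\mathbb{Z}_+$, the values of $\mu(x, \cdot)$ for $x \notin \mathbb{Z}_+$ do not matter; one can take any fixed probability measure there, for instance $\delta_0$. With this kernel, $$ E[Y \mid X] = \int_{\mathbb{R}} y \, \mu(X, dy) = \frac{X + 1}{a + b}. $$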

I am not familiar with Le Gall's book, but I suspect at least some, if not all, of these topics are covered there.


Regarding your comment, you can use conditional densities, just as you would in an undergraduate setting where there is no measure theory. The only difference is, here, you are probably expected to justify everything rigorously. In that case, for the fourth question, we could do something like the following.

Let $\nu$ be a regular conditional distribution for $X$ given $Y$. We will use the following, which is Equation (1) above with the roles of $X$ and $Y$ exchanged: for $\mu_Y$-a.e. $y$, $$ \lim_{\varepsilon \to 0} P(X \in B \mid Y \in (y - \varepsilon, y + \varepsilon)) = \nu(y, B).\tag{3} $$ For $y > 0$ and $\varepsilon < y$, we calculate that $$ P(X = n \mid Y \in (y - \varepsilon, y + \varepsilon)) = \frac{ b \int_{y - \varepsilon}^{y + \varepsilon} \frac{(au)^n}{n!}e^{-(a+b)u} \, du }{ b \int_{y - \varepsilon}^{y + \varepsilon} e^{-bu} \, du } \to \frac{(ay)^n}{n!}e^{-ay} $$ as $\varepsilon \to 0$, since both integrands are continuous at $y$. Therefore, $\nu(y, \cdot) = \sum_{n \in \mathbb{Z}_+} \frac{(ay)^n}{n!}e^{-ay} \delta_n$, where $\delta_n$ is the point-mass measure at $n$. In other words, $\nu(y, \cdot)$ is the Poisson distribution with parameter $ay$. From here, we obtain $$ E[X \mid Y] = \int_{\mathbb{R}} x \, \nu(Y, dx) = \sum_{n \in \mathbb{Z}_+} n \frac{(aY)^n}{n!}e^{-aY} = aY. $$ In a non-rigorous undergraduate class, all of this could be done very quickly by pushing around differentials and defining conditional expectations in terms of conditional densities. Presumably, you are expected to add more details, but the same mathematical ideas will get you to the answer.
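
If you want a numerical sanity check of the limit above, here is a minimal Python sketch. It evaluates the ratio of integrals with a plain midpoint rule and compares it with the Poisson mass function. The function names and the parameter values $a = 2$, $b = 1$, $y = 1.5$ are illustrative choices of mine, not part of the exercise.

```python
import math


def joint_density(n, u, a, b):
    """Density in u of P(X = n, Y in du): b (a u)^n / n! * exp(-(a + b) u)."""
    return b * (a * u) ** n / math.factorial(n) * math.exp(-(a + b) * u)


def midpoint_integral(f, lo, hi, steps=1000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def window_prob(n, y, eps, a, b):
    """P(X = n | Y in (y - eps, y + eps)) as the ratio of the two integrals."""
    num = midpoint_integral(lambda u: joint_density(n, u, a, b), y - eps, y + eps)
    den = midpoint_integral(lambda u: b * math.exp(-b * u), y - eps, y + eps)
    return num / den


def poisson_pmf(n, lam):
    """Mass that the Poisson(lam) distribution puts on n."""
    return lam ** n / math.factorial(n) * math.exp(-lam)


def poisson_mean(lam, terms=60):
    """Truncated series for sum_n n * pmf(n); should be close to lam."""
    return sum(n * poisson_pmf(n, lam) for n in range(terms))


if __name__ == "__main__":
    a, b, y = 2.0, 1.0, 1.5  # illustrative parameters (assumed, not from the exercise)
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, window_prob(2, y, eps, a, b), poisson_pmf(2, a * y))
    print(poisson_mean(a * y), a * y)
```

As $\varepsilon$ shrinks, the first printed column of probabilities should approach the Poisson value, and the truncated series should agree with $ay$. This is only a check on the computation, not a substitute for the continuity argument that justifies the limit.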
