$\begingroup$

When studying Markov processes, I have seen many authors define the semigroup as $P_tf(x) = \mathbb E_x(f(X_t))$ (under the assumption that $(X_t)$ is homogeneous) and call $\mathbb E_x$ the "expectation given $X_0=x$", i.e., they mean

$$\mathbb E_x(\cdot) = \mathbb E[\cdot|X_0=x],$$

and I couldn't find a rigorous definition of this, because if $X_0$ is an absolutely continuous random variable then the right-hand side does not work in the usual sense (one would be dividing by $\mathbb P(X_0=x)=0$). However, I have also noticed that some authors avoid defining this conditional law by starting from "Markov kernels" associated with a Markov process $(X_t)$, which makes complete sense to me. I am fine with the latter approach, although there are things about it that I do not fully understand yet; I will reserve those for another post.
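To fix ideas, here is a toy sketch (my own finite-state example, all numbers invented) of the one case where the definition is unproblematic: when the state space is finite and time is discrete, $P_tf(x)=\mathbb E_x(f(X_t))$ is just a matrix computation and no conditioning on a null event is needed.

```python
# Hypothetical 3-state, discrete-time homogeneous chain (all numbers invented).
# Here E_x[f(X_t)] = sum_y P^t(x, y) f(y): a plain matrix computation.

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, t):
    """t-step transition matrix P^t, t >= 0 (P^0 = identity)."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = mat_mul(R, P)
    return R

def semigroup(P, t, f, x):
    """P_t f(x) = E_x[f(X_t)] = sum_y P^t(x, y) f(y): a deterministic number."""
    Pt = mat_pow(P, t)
    return sum(Pt[x][y] * f(y) for y in range(len(P)))

P = [[0.5, 0.5, 0.0],          # one-step transition matrix; rows sum to 1
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
f = lambda y: float(y)         # a bounded function on the state space {0, 1, 2}

print(semigroup(P, 3, f, 0))   # E_0[f(X_3)], no division by P(X_0 = 0) involved
```

Note that in this setting the semigroup property $P_{s+t} = P_s P_t$ is just associativity of matrix powers.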

In addition, some even define $X=(X_t)_{t\geq 0}$ to be homogeneous iff for every bounded measurable set $\Gamma$ (in the metric space where $X_t$ takes values) we have

$$ \mathbb P(X_t \in \Gamma | X_s) = \mathbb P (X_{t+u} \in \Gamma|X_{s+u}), \quad \forall u >0. $$

First question: Without anything else being said, should I interpret this equality as holding almost surely, with the left-hand side being $\sigma(X_{s+u})$-measurable and a version of $\mathbb P (X_{t+u} \in \Gamma|X_{s+u})$?

Second question: I'm looking for a rigorous definition of $\mathbb E[\cdot|X_0=x]$ above; I believe it should take a deterministic value in order for $P_tf(x)$ to make sense.
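To illustrate the second question in the one case I can handle elementarily (a hypothetical two-state chain with $\mathbb P(X_0=x)>0$; the matrix and the uniform initial law below are assumed purely for illustration): the elementary ratio $\mathbb E[f(X_1)\mathbb 1_{\{X_0=x\}}]/\mathbb P(X_0=x)$ is a deterministic number and agrees with the kernel value $\sum_y p(x,y)f(y)$, which is what breaks down when $\mathbb P(X_0=x)=0$.

```python
import random

# Everything here is assumed for illustration: a two-state chain with
# one-step probabilities p(x, y), X_0 uniform on {0, 1}, and f(y) = y.
random.seed(0)
p = [[0.7, 0.3],
     [0.4, 0.6]]

def step(x):
    """Draw X_{n+1} from the row p(x, .)."""
    return 0 if random.random() < p[x][0] else 1

N = 20000
num, den = 0.0, 0
for _ in range(N):
    x0 = random.choice([0, 1])     # X_0 is uniform, so P(X_0 = 0) = 1/2 > 0
    x1 = step(x0)
    if x0 == 0:                    # elementary conditioning on {X_0 = 0}
        num += x1                  # accumulates f(X_1) 1_{X_0 = 0}
        den += 1

estimate = num / den               # empirical E[f(X_1) 1_{X_0=0}] / P(X_0 = 0)
exact = p[0][0] * 0 + p[0][1] * 1  # kernel formula: sum_y p(0, y) f(y) = 0.3
print(estimate, exact)             # the two agree up to Monte Carlo error
```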

Any rigorous reference related to this is highly appreciated. Thank you for your help!

$\endgroup$
  • $\begingroup$ Quite often Markov chains are studied on finite or countably infinite state spaces, in which case conditional expectations are simpler. Else you can view conditioning on $X_0$ in the same way as any $E[Y|Z]$ calculation for random variable $Z$, it is $g(Z)$ for some Borel measurable function $g:\mathbb{R}\rightarrow\mathbb{R}$, and we can interpret $E[Y|Z=z]$ as $g(z)$. The versions and almost sure interpretations hold the same way, so for any other version $\tilde{g}(Z)$ we have $P[g(Z)=\tilde{g}(Z)]=1$. We can use $E[1_{\{X_t \in \Gamma\}}|X_s] = P[X_t \in \Gamma|X_s]$. $\endgroup$
    – Michael
    Commented Jun 27, 2023 at 21:47
  • $\begingroup$ PS: I don't know why someone downvoted...I upvoted to counteract it! =) $\endgroup$
    – Michael
    Commented Jun 27, 2023 at 21:53
  • $\begingroup$ Then, you are correct that $$P[X_t\in \Gamma|X_s] = P[X_{t+u}\in \Gamma|X_{s+u}]$$ does not quite make sense since the left-hand-side is a function of $X_s$ while the right-hand-side is a function of $X_{s+u}$. This "equality" is intended to mean $P[X_{t+u}\in \Gamma|X_{s+u}]=g(X_{s+u})$ for all $u \geq 0$, or more informally $P[X_{t+u}\in \Gamma|X_{s+u}=x]=P[X_t\in \Gamma|X_s=x]$ "for all $x$." $\endgroup$
    – Michael
    Commented Jun 27, 2023 at 22:05
  • $\begingroup$ Thank you Prof Neely! I appreciate your upvote a lot! However, I don't get the last part. Why should the equality hold "for all $x$"? Didn't you say that we can have another version $\tilde g$ of $g$ as long as it satisfies $P(g(Z) = \tilde g(Z)) = 1$? If $Z$ does not take all the values in the image space, then $g$ and $\tilde g$ can differ on some big subset of $\mathbb R$, and so $g(x) = h(x)$ in the last part could also mean $\tilde g(x) = h(x)$... or even $\tilde g(x) = \tilde h(x)$? $\endgroup$ Commented Jun 27, 2023 at 22:50
  • $\begingroup$ Yes I was being informal there, I give a summary with some more details in my answer. $\endgroup$
    – Michael
    Commented Jun 27, 2023 at 23:05

1 Answer

$\begingroup$

The conditional probabilities can be defined as conditional expectations of an indicator function: Assuming $\Gamma$ is a Borel measurable subset of $\mathbb{R}$ and $X_t, X_{t+u}, X_s, X_{s+u}$ are random variables, we define \begin{align} P[X_t \in \Gamma|X_s] &= E[1_{\{X_t\in \Gamma\}}|X_s]\\ P[X_{t+u}\in \Gamma|X_{s+u}] &= E[1_{\{X_{t+u}\in \Gamma\}}|X_{s+u}] \end{align}

Then, you are correct that the following equality does not make sense: $$ P[X_t \in \Gamma | X_s] = P[X_{t+u} \in \Gamma|X_{s+u}] \quad \forall u \geq 0 \quad (*) $$ That is because $P[X_t \in \Gamma | X_s]$ is a function of random variable $X_s$, while $P[X_{t+u}\in \Gamma|X_{s+u}]$ is a function of random variable $X_{s+u}$. If $u>0$ then, quite likely, these have different values and it does not make sense to claim any almost-sure equality.

A corrected statement of (*) is this: For any Borel measurable function $g:\mathbb{R}\rightarrow\mathbb{R}$ such that $g(X_s)$ is a version of $P[X_t\in \Gamma|X_s]$, we have for every $u\geq 0$ that $g(X_{s+u})$ is a version of $P[X_{t+u}\in \Gamma|X_{s+u}]$.
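A simulation sketch of this corrected statement (my own assumed two-state example, not taken from any reference): for a time-homogeneous chain, the empirical conditional frequency of $\{X_t \in \Gamma\}$ given $\{X_s = x\}$ depends only on the gap $t-s$, so the same function $g$ works after any shift $u$.

```python
import random

# Assumed two-state homogeneous chain; Gamma = {0}, (s, t) = (2, 5), shift u = 2.
random.seed(1)
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(x):
    return 0 if random.random() < P[x][0] else 1

def run(T):
    """Sample a path X_0, ..., X_T with X_0 uniform on {0, 1}."""
    x = random.choice([0, 1])
    path = [x]
    for _ in range(T):
        x = step(x)
        path.append(x)
    return path

def cond_freq(s, t, N=40000):
    """Empirical estimate of P[X_t = 0 | X_s = 0]."""
    hits = total = 0
    for _ in range(N):
        path = run(t)
        if path[s] == 0:
            total += 1
            hits += (path[t] == 0)
    return hits / total

a = cond_freq(2, 5)   # estimates g(0) = P[X_5 in {0} | X_2 = 0]
b = cond_freq(4, 7)   # estimates P[X_7 in {0} | X_4 = 0], i.e. the shift u = 2
print(a, b)           # both approximate the same number P^3(0, 0) = 0.844
```

Both estimates target the single value $g(0) = (P^3)(0,0)$, so the same $g$ serves as a version at every shifted pair of times.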


In general, if $Y$ and $W$ are random variables with $E[W^2]<\infty$, we can interpret $E[W|Y=y]$ as follows: choose any Borel measurable $g:\mathbb{R}\rightarrow\mathbb{R}$ for which $g(Y)$ is a version of $E[W|Y]$ and then define $$ E[W|Y=y] := g(y) \quad \forall y \in \mathbb{R}$$ This definition is not unique, because we might have chosen some other Borel measurable function $\tilde{g}:\mathbb{R}\rightarrow\mathbb{R}$ such that $\tilde{g}(Y)$ is a version of $E[W|Y]$. We are not guaranteed that $g(y)=\tilde{g}(y)$ for all $y \in \mathbb{R}$. However, when comparing these two versions, we are guaranteed that $$ P[g(Y)=\tilde{g}(Y)]=1$$ In other words, if we define $$ A = \{y \in \mathbb{R}:g(y)\neq \tilde{g}(y)\}$$ then $A$ is a Borel measurable subset of $\mathbb{R}$ and $\mu_{F_Y}(A)=0$, where $\mu_{F_Y}$ is the measure induced by the CDF of $Y$: $$ \mu_{F_Y}(D)=P[Y\in D]\quad \forall D \in \mathcal{B}(\mathbb{R})$$
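A toy illustration of this non-uniqueness (all numbers invented): if $Y$ only takes the values $0$ and $1$, two versions $g$ and $\tilde g$ may disagree everywhere off $\{0,1\}$, a set of full Lebesgue measure, yet $P[g(Y)=\tilde g(Y)]=1$ because $\mu_{F_Y}(A)=0$.

```python
# All numbers invented: suppose E[W | Y = 0] = 2.0 and E[W | Y = 1] = 5.0,
# and Y takes no values outside {0, 1}.

def g(y):
    """One version: happens to equal 5.0 everywhere off {0}."""
    return 2.0 if y == 0 else 5.0

def g_tilde(y):
    """Another version: agrees with g on {0, 1}, differs everywhere else."""
    if y == 0:
        return 2.0
    if y == 1:
        return 5.0
    return -123.0

support = [0, 1]                    # P[Y in {0, 1}] = 1
assert all(g(y) == g_tilde(y) for y in support)   # hence P[g(Y)=g_tilde(Y)] = 1
assert g(0.5) != g_tilde(0.5)       # yet A = {y : g(y) != g_tilde(y)} is huge
print("both g(Y) and g_tilde(Y) are versions of E[W|Y]")
```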

$\endgroup$
  • $\begingroup$ Thank you very much! I understand much better now! I have one minor question, if you don't mind... How should I interpret the map $x \mapsto P_tf(x)$? By definition, $P_tf(x) = E(f(X_t)|X_0 =x)$, and there are plenty of $g$'s such that $g(X_0)$ is a version of $E(f(X_t)|X_0)$, so if we take $P_tf(x) = g(x)$, then for another value $y\neq x$, is it still the case that $P_tf(y) = g(y)$? $\endgroup$ Commented Jun 27, 2023 at 23:36
  • $\begingroup$ Yes, for fixed $t$, just choose any measurable $g_t:\mathbb{R}\rightarrow\mathbb{R}$ for which $g_t(X_0)$ is a version of $E[f(X_t)|X_0]$. Then you can use $E[f(X_t)|X_0=x]=g_t(x)$ for all $x \in \mathbb{R}$. It turns out that if $f(X_t)$ is always in $[0,1]$ we can restrict attention to measurable functions $g_t:\mathbb{R}\rightarrow[0,1]$, though this takes some effort to prove. $\endgroup$
    – Michael
    Commented Jun 27, 2023 at 23:43
  • $\begingroup$ To state the homogeneous condition of Markov chains it is best to use Markov kernels, since then we are not tied to a particular random variable $X_0$ with some particular distribution. $\endgroup$
    – Michael
    Commented Jun 27, 2023 at 23:58
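To sketch the kernel viewpoint from the last comment (a finite-state toy example of my own, all numbers invented): a Markov kernel on a finite state space is just a stochastic matrix, the Chapman–Kolmogorov composition is matrix multiplication, and the semigroup $P_tf$ is defined without ever fixing a random variable $X_0$ or its distribution.

```python
# Finite state space {0, 1}; the kernel kappa(x, .) is row x of a
# stochastic matrix, and Chapman-Kolmogorov composition is matrix product.

def compose(K1, K2):
    """(K1 K2)(x, y) = sum_z K1(x, z) K2(z, y): the composed kernel."""
    n = len(K1)
    return [[sum(K1[i][k] * K2[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

K = [[0.8, 0.2],      # one-step kernel (numbers invented)
     [0.3, 0.7]]

K2 = compose(K, K)    # two-step kernel kappa_2 = kappa_1 kappa_1
for row in K2:        # each row of a kernel is a probability measure
    assert abs(sum(row) - 1.0) < 1e-12

# The semigroup acts on functions by (P_t f)(x) = sum_y kappa_t(x, y) f(y);
# here f is the indicator of state 0.
f = [1.0, 0.0]
Ptf = [sum(K2[x][y] * f[y] for y in range(2)) for x in range(2)]
print(Ptf)            # deterministic values, defined without any X_0
```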
