
When studying Markov processes, I have seen many authors define the semigroup as $P_tf(x) = \mathbb E_x(f(X_t))$ (assuming that $X_t$ is homogeneous) and then call $\mathbb E_x$ the "expectation given $X_0=x$", i.e., they mean

$$\mathbb E_x(\cdot) = \mathbb E[\cdot|X_0=x],$$

and I couldn't find a rigorous definition of this, because if $X_0$ is an absolutely continuous random variable then $\mathbb P(X_0=x)=0$, and the right-hand side cannot be defined in the elementary way (by dividing by $\mathbb P(X_0=x)$). However, I also noticed that some authors avoid defining this conditional law by starting out with "Markov kernels" associated with a Markov process $(X_t)$, which makes complete sense to me. I'm okay with the latter approach, although there are things I don't fully understand yet; I will save those for another post.
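For reference, here is one standard way the kernel approach makes $\mathbb E_x$ rigorous. This is a sketch: the state space $(E,\mathcal E)$, the kernels $p_t$ and the natural filtration $\mathcal F_s$ are notation introduced here, not taken from the authors mentioned above, and $f$ is assumed bounded and measurable. Given transition kernels $p_t(x,\Gamma)$, $t\ge 0$, one sets

$$ P_tf(x) := \int_E f(y)\,p_t(x,dy), \qquad p_{s+t}(x,\Gamma) = \int_E p_t(y,\Gamma)\,p_s(x,dy) \quad \text{(Chapman–Kolmogorov)}, $$

and calls $(X_t)$ a homogeneous Markov process with these kernels if $\mathbb P(X_t\in\Gamma\mid \mathcal F_s) = p_{t-s}(X_s,\Gamma)$ a.s. for all $s\le t$ and $\Gamma\in\mathcal E$. Then $x\mapsto P_tf(x)$ is defined for every $x$, $P_{s+t}=P_sP_t$, and $P_tf(X_0)$ is a version of $\mathbb E[f(X_t)\mid X_0]$ whatever the law of $X_0$. On a sufficiently nice (e.g. Polish) state space, $\mathbb E_x$ is then the expectation under the law $\mathbb P_x$ of the process started at $x$, constructed for instance via the Kolmogorov extension theorem, so that $\mathbb E_x[f(X_t)]=P_tf(x)$.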

In addition, some authors define $X=(X_t)_{t\geq 0}$ to be homogeneous iff for every bounded measurable set $\Gamma$ (in the metric space where $X_t$ takes its values) we have

$$ \mathbb P(X_t \in \Gamma | X_s) = \mathbb P (X_{t+u} \in \Gamma|X_{s+u}), \quad \forall u >0. $$

First question: Without any further specification, should I interpret this equality as holding almost surely, with the left-hand side being $\sigma(X_{s+u})$-measurable and a version of $\mathbb P (X_{t+u} \in \Gamma|X_{s+u})$?

Second question: I'm looking for a rigorous definition of $\mathbb E[\cdot|X_0=x]$ above. I believe it should take a deterministic value for $P_tf(x)$ to make sense.

Any rigorous reference related to this is highly appreciated. Thank you for your help!

  Quite often Markov chains are studied on finite or countably infinite state spaces, in which case conditional expectations are simpler. Otherwise, you can view conditioning on $X_0$ in the same way as any $E[Y|Z]$ calculation for a random variable $Z$: it is $g(Z)$ for some Borel measurable function $g:\mathbb{R}\rightarrow\mathbb{R}$, and we can interpret $E[Y|Z=z]$ as $g(z)$. Versions and almost-sure interpretations work the same way, so for any other version $\tilde{g}(Z)$ we have $P[g(Z)=\tilde{g}(Z)]=1$. We can use $E[1_{\{X_t \in \Gamma\}}|X_s] = P[X_t \in \Gamma|X_s]$.
    – Michael
    Commented Jun 27, 2023 at 21:47
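In the finite case the function $g$ with $E[Y|Z]=g(Z)$ can be computed directly. The sketch below assumes a hypothetical finite joint law of $(Z,Y)$ (the numbers are made up for illustration), computes $g(z)=E[Y 1_{\{Z=z\}}]/P[Z=z]$ for each $z$ with $P[Z=z]>0$, and checks the defining property $E[Y1_{\{Z\in D\}}]=E[g(Z)1_{\{Z\in D\}}]$ for every set $D$ of values of $Z$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical joint law of (Z, Y) on a finite set: {(z, y): P[Z = z, Y = y]}.
joint = {
    (0, 1): Fraction(1, 4),
    (0, 3): Fraction(1, 4),
    (1, 2): Fraction(1, 6),
    (1, 5): Fraction(1, 3),
}

def cond_exp_function(joint):
    """Return g as a dict z -> E[Y | Z = z], for the z with P[Z = z] > 0."""
    mass = {}   # P[Z = z]
    first = {}  # E[Y 1{Z = z}]
    for (z, y), p in joint.items():
        mass[z] = mass.get(z, 0) + p
        first[z] = first.get(z, 0) + y * p
    return {z: first[z] / mass[z] for z in mass if mass[z] > 0}

def check_version(joint, g):
    """Check E[Y 1{Z in D}] == E[g(Z) 1{Z in D}] for every subset D of the values of Z.

    Only the intersection of a Borel set D with the (finite) support of Z matters here.
    """
    support = sorted({z for (z, _) in joint})
    subsets = chain.from_iterable(combinations(support, k) for k in range(len(support) + 1))
    for D in subsets:
        lhs = sum(y * p for (z, y), p in joint.items() if z in D)
        rhs = sum(g[z] * p for (z, y), p in joint.items() if z in D)
        if lhs != rhs:
            return False
    return True

g = cond_exp_function(joint)
print(g)                        # {0: Fraction(2, 1), 1: Fraction(4, 1)}
print(check_version(joint, g))  # True
```

Here $E[Y|Z=0]$ is read off as $g(0)=2$, exactly as in the comment above; values of $g$ outside the support of $Z$ play no role.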
    – Michael
    Commented Jun 27, 2023 at 21:53
  Then, you are correct that $$P[X_t\in \Gamma|X_s] = P[X_{t+u}\in \Gamma|X_{s+u}]$$ does not quite make sense, since the left-hand side is a function of $X_s$ while the right-hand side is a function of $X_{s+u}$. This "equality" is intended to mean that there is a Borel function $g$ such that $P[X_{t+u}\in \Gamma|X_{s+u}]=g(X_{s+u})$ for all $u \geq 0$ (in particular $g(X_s)$ is a version of $P[X_t\in \Gamma|X_s]$), or more informally $P[X_{t+u}\in \Gamma|X_{s+u}=x]=P[X_t\in \Gamma|X_s=x]$ "for all $x$."
    – Michael
    Commented Jun 27, 2023 at 22:05
  Thank you, Prof. Neely! I appreciate your upvote a lot! However, I don't get the last part. Why should the equality hold "for all $x$"? Didn't you say that we can have another version $\tilde g$ of $g$ as long as it satisfies $P(g(Z) = \tilde g(Z)) = 1$? If $Z$ does not take all the values in the image space, then $g$ and $\tilde g$ can differ on a big subset of $\mathbb R$, so $g(x) = h(x)$ in the last part could also mean $\tilde g(x) = h(x)$... or even $\tilde g(x) = \tilde h(x)$?
  Yes, I was being informal there; I give a summary with some more details in my answer.
    – Michael
    Commented Jun 27, 2023 at 23:05

1 Answer


The conditional probabilities can be defined as conditional expectations of an indicator function: Assuming $\Gamma$ is a Borel measurable subset of $\mathbb{R}$ and $X_t, X_{t+u}, X_s, X_{s+u}$ are random variables, we define \begin{align} P[X_t \in \Gamma|X_s] &= E[1_{\{X_t\in \Gamma\}}|X_s]\\ P[X_{t+u}\in \Gamma|X_{s+u}] &= E[1_{\{X_{t+u}\in \Gamma\}}|X_{s+u}] \end{align}

Then, you are correct that the following equality does not make sense as stated: $$ P[X_t \in \Gamma | X_s] = P[X_{t+u} \in \Gamma|X_{s+u}] \quad \forall u \geq 0 \quad (*) $$ That is because $P[X_t \in \Gamma | X_s]$ is a function of the random variable $X_s$, while $P[X_{t+u}\in \Gamma|X_{s+u}]$ is a function of the random variable $X_{s+u}$. If $u>0$, these two random variables generally take different values, so there is no reason to expect them to be equal almost surely.

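A corrected statement of (*) is this: There is a Borel measurable function $g:\mathbb{R}\rightarrow\mathbb{R}$ such that, for every $u\geq 0$, $g(X_{s+u})$ is a version of $P[X_{t+u}\in \Gamma|X_{s+u}]$ (taking $u=0$, $g(X_s)$ is in particular a version of $P[X_t\in \Gamma|X_s]$). An arbitrary $g$ for which $g(X_s)$ is a version of $P[X_t\in \Gamma|X_s]$ need not work: such a $g$ is only determined up to a set of measure zero under the law of $X_s$, and $X_{s+u}$ may put positive mass on that set.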
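To see the corrected statement concretely, here is a sketch for a discrete-time chain on three states. The transition matrix, the initial law (start in state 0) and the times $s=1$, $t=3$, $\Gamma=\{2\}$ are all hypothetical choices. The joint laws are computed by brute-force enumeration of paths, and the conditional probabilities at times $(s+u,t+u)$ are compared with one fixed function $g(i)=(P^{t-s})_{i2}$ taken from the transition matrix.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 3-state, time-homogeneous chain (discrete time) with transition matrix P.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(0), F(1, 3), F(2, 3)],
     [F(1, 4), F(0), F(3, 4)]]
mu = [F(1), F(0), F(0)]   # X_0 = 0 surely, so the law of X_n changes with n
N = len(P)

def path_law(length):
    """Yield (path, probability) for every path x_0, ..., x_length."""
    for path in product(range(N), repeat=length + 1):
        p = mu[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        yield path, p

def cond_prob(a, b, j):
    """Return {i: P[X_b = j | X_a = i]} for the states i with P[X_a = i] > 0."""
    mass, hit = [F(0)] * N, [F(0)] * N
    for path, p in path_law(b):
        mass[path[a]] += p
        if path[b] == j:
            hit[path[a]] += p
    return {i: hit[i] / mass[i] for i in range(N) if mass[i] > 0}

def mat_pow(M, n):
    """Return the matrix power M^n (exact arithmetic)."""
    R = [[F(int(r == c)) for c in range(N)] for r in range(N)]
    for _ in range(n):
        R = [[sum(R[r][k] * M[k][c] for k in range(N)) for c in range(N)] for r in range(N)]
    return R

s, t, j = 1, 3, 2
g = [row[j] for row in mat_pow(P, t - s)]   # one function g, chosen independently of u
for u in range(4):
    c = cond_prob(s + u, t + u, j)
    assert all(c[i] == g[i] for i in c)     # g(X_{s+u}) is a version for every u
    print(u, sorted(c))                     # states charged by X_{s+u}
```

For $u=0$ only states 0 and 1 carry mass at time $s=1$, so a version of $P[X_3=2|X_1]$ may be chosen arbitrarily at state 2; from $u=1$ on, state 2 is charged and only a $g$ that is correct there (such as the one taken from $P^{t-s}$) still works.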

In general, if $Y$ is a random variable and $W$ is a random variable with $E[W^2]<\infty$ (finite $E[|W|]$ is enough), we can interpret $E[W|Y=y]$ as follows: choose any Borel measurable $g:\mathbb{R}\rightarrow\mathbb{R}$ for which $g(Y)$ is a version of $E[W|Y]$, and define $$ E[W|Y=y] := g(y) \quad \forall y \in \mathbb{R}$$

This definition is not unique, because we might have chosen some other Borel measurable function $\tilde{g}:\mathbb{R}\rightarrow\mathbb{R}$ such that $\tilde{g}(Y)$ is a version of $E[W|Y]$. We are not guaranteed that $g(y)=\tilde{g}(y)$ for all $y \in \mathbb{R}$. However, we are guaranteed that $$ P[g(Y)=\tilde{g}(Y)]=1$$ In other words, if we define $$ A = \{y \in \mathbb{R}:g(y)\neq \tilde{g}(y)\}$$ then $A$ is a Borel measurable subset of $\mathbb{R}$ and $\mu_{F_Y}(A)=0$, where $\mu_{F_Y}$ is the measure induced by the CDF of $Y$ (the law of $Y$): $$ \mu_{F_Y}(D)=P[Y\in D]\quad \forall D \in \mathcal{B}(\mathbb{R})$$
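A minimal numerical illustration of this non-uniqueness, under a hypothetical setup: $Y$ is uniform on $\{0,1\}$ and $W=2Y$, so $E[W|Y]=2Y$. The functions `g` and `g_tilde` below both give versions, they disagree at every point outside $\{0,1\}$, and the disagreement set has $\mu_{F_Y}$-measure zero.

```python
from fractions import Fraction

# Hypothetical example: Y uniform on {0, 1} and W = 2*Y, so E[W | Y] = 2*Y.
law_Y = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # P[Y = y]

def g(y):
    return 2 * y

def g_tilde(y):
    # Agrees with g on the support {0, 1} of Y, differs everywhere else.
    return 2 * y if y in law_Y else 7

def is_version(h):
    """For discrete Y: h(Y) is a version of E[W|Y] iff E[W 1{Y=y}] == h(y) P[Y=y] for each atom y."""
    return all(2 * y * p == h(y) * p for y, p in law_Y.items())

# The full disagreement set is R minus {0, 1}; we only list its integer points in [-3, 3].
candidates = range(-3, 4)
A = [y for y in candidates if g(y) != g_tilde(y)]
mu_A = sum(law_Y.get(y, 0) for y in A)      # mu_{F_Y} of the listed points

print(is_version(g), is_version(g_tilde))   # True True
print(A, mu_A)                              # [-3, -2, -1, 2, 3] 0
```

So "$E[W|Y=2]$" equals 4 under one choice and 7 under the other, and neither choice is wrong, because $Y$ never takes the value 2.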

  Thank you very much! I understand much better now! I have one minor question, if you don't mind. How should I interpret the map $x \mapsto P_tf(x)$? By definition, $P_tf(x) = E(f(X_t)|X_0 =x)$, and there are plenty of $g$'s such that $g(X_0)$ is a version of $E(f(X_t)|X_0)$. So if we take $P_tf(x) = g(x)$, then for any other value $y\neq x$ do we still have $P_tf(y) = g(y)$?
    Yes, for fixed $t$, just choose any measurable $g_t:\mathbb{R}\rightarrow\mathbb{R}$ for which $g_t(X_0)$ is a version of $E[f(X_t)|X_0]$. Then you can use $E[f(X_t)|X_0=x]=g_t(x)$ for all $x \in \mathbb{R}$. It turns out that if $f(X_t)$ is always in $[0,1]$ we can restrict attention to measurable functions $g_t:\mathbb{R}\rightarrow[0,1]$, though this takes some effort to prove.
    – Michael
    Commented Jun 27, 2023 at 23:43
  To state the homogeneity condition for Markov chains, it is best to use Markov kernels, since then we are not tied to a particular random variable $X_0$ with a particular distribution.
    – Michael
    Commented Jun 27, 2023 at 23:58
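A small sketch of the kernel point of view, assuming a hypothetical two-state discrete-time kernel `p` (in discrete time $P_t$ is just the $t$-th matrix power applied to $f$). $P_tf$ is computed once, for every starting state, before any initial law is chosen; the check then confirms that $P_tf(X_0)$ is a version of $E[f(X_t)|X_0]$ for several initial laws, including one under which $P[X_0=1]=0$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical 2-state, discrete-time kernel: p[x][y] = P[X_{n+1} = y | X_n = x].
p = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]
N = len(p)

def semigroup(t, f):
    """Return P_t f as a list indexed by the starting state x (defined for every x)."""
    v = list(f)
    for _ in range(t):
        v = [sum(p[x][y] * v[y] for y in range(N)) for x in range(N)]
    return v

def is_version(mu, t, f, g):
    """Check E[f(X_t) 1{X_0 = x}] == g(x) P[X_0 = x] for every x, when X_0 ~ mu."""
    lhs = [F(0)] * N
    for path in product(range(N), repeat=t + 1):
        prob = mu[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= p[a][b]
        lhs[path[0]] += f[path[-1]] * prob
    return all(lhs[x] == g[x] * mu[x] for x in range(N))

f = [F(0), F(1)]         # f = indicator of state 1
g2 = semigroup(2, f)     # P_2 f, fixed before any initial law is chosen
for mu in ([F(1), F(0)], [F(0), F(1)], [F(1, 3), F(2, 3)]):
    assert is_version(mu, 2, f, g2)
print(g2)                # [Fraction(5, 16), Fraction(3, 8)]
```

When $X_0=0$ surely, any value at $x=1$ would give a version, but the kernel singles out $P_2f(1)=3/8$, which is what $E_1[f(X_2)]$ means in the kernel approach.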

