
I have a bit of a formal problem with a property of the expected win rate expressed with Markov chains.

Let $(X_i)_{i \geq 0}$ be a discrete-time stochastic process on an at most countable state space that has the elementary Markov property with respect to $\mathbb{P}$ and is homogeneous, so:

$$\mathbb{P}(X_n = i_n|X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = \mathbb{P}(X_{n} = i_{n}|X_{n-1} = i_{n-1}) = \mathbb{P}(X_{1} = i_{1}|X_{0} = i_{0}) =: p_{i_0 i_1}$$

These identities are only required to hold whenever the conditioning events have positive probability, so that no division by zero occurs. Now consider such a five-state Markov chain in which state $5$ is the winning condition and state $2$ is the losing condition. We define

$$\tau_i := \min\{n \geq 0: X_n = i\}$$

as the random variable that yields the first hitting time of state $i$ (including the starting time). The winning probability when starting in state $i$ is thus

$$g_i := \mathbb{P}(\tau_5 < \tau_2 | X_0 = i)$$

The state $1$ has transition probabilities $p_{12} = \frac{1}{2} = p_{13}$, meaning that

$$\begin{array}{lcl} g_1 &=& \mathbb{P}(\tau_5 < \tau_2|X_0 = 1) \\ &=& \frac{\mathbb{P}(\tau_5 < \tau_2,X_0 = 1)}{\mathbb{P}(X_0 = 1)} \\ &=& \frac{\mathbb{P}(\tau_5 < \tau_2,X_0 = 1,X_1 = 2)}{\mathbb{P}(X_0 = 1)} + \frac{\mathbb{P}(\tau_5 < \tau_2,X_0 = 1, X_1 = 3)}{\mathbb{P}(X_0 = 1)} \\ &=& \frac{\mathbb{P}(\tau_5 < \tau_2,X_0 = 1,X_1 = 2)}{\mathbb{P}(X_0 = 1,X_1 = 2)}\frac{\mathbb{P}(X_0 = 1, X_1 = 2)}{\mathbb{P}(X_0 = 1)} + \frac{\mathbb{P}(\tau_5 < \tau_2,X_0 = 1, X_1 = 3)}{\mathbb{P}(X_0 = 1,X_1 = 3)}\frac{\mathbb{P}(X_0 = 1, X_1 = 3)}{\mathbb{P}(X_0 = 1)} \\ &=& \mathbb{P}(\tau_5 < \tau_2|X_0 = 1, X_1 = 2) \mathbb{P}(X_1 = 2| X_0 = 1) + \mathbb{P}(\tau_5 < \tau_2|X_0 = 1, X_1 = 3) \mathbb{P}(X_1 = 3| X_0 = 1) \\ &=& g_2 p_{12} + g_3 p_{13} \end{array}$$

What is tripping me up is not that we can use homogeneity and the Markov property for $\tau_5 < \tau_2$ on the left side (I have already proven that step), but rather:

Does the above property still hold if our starting probability (here $\mathbb{P}(X_0 = 1)$) is zero? Can we justify the questionable algebra with division by zero?

The reason I am asking for this special case to be "proven" is that right after this proof, we use this to solve the system generated by these equations (excluding the absorption states).
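For concreteness, the linear system in question can be solved numerically. The chain below is hypothetical: only $p_{12} = p_{13} = \frac{1}{2}$ comes from the question, and all other transition probabilities are invented purely for illustration.

```python
import numpy as np

# Hypothetical 5-state chain (states 1..5); only p_12 = p_13 = 1/2 is
# given in the question, the remaining entries are made up.
# States 2 (losing) and 5 (winning) are absorbing.
P = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0],   # state 1
    [0.0, 1.0, 0.0, 0.0, 0.0],   # state 2 (absorbing, lose)
    [0.0, 0.2, 0.0, 0.4, 0.4],   # state 3
    [0.3, 0.0, 0.0, 0.0, 0.7],   # state 4
    [0.0, 0.0, 0.0, 0.0, 1.0],   # state 5 (absorbing, win)
])

# First-step equations: g_i = sum_j p_ij g_j for transient i,
# with boundary conditions g_2 = 0 and g_5 = 1.
transient = [0, 2, 3]                       # indices of states 1, 3, 4
A = np.eye(len(transient)) - P[np.ix_(transient, transient)]
b = P[np.ix_(transient, [4])].ravel()       # one-step probability of winning
g_transient = np.linalg.solve(A, b)

g = np.zeros(5)
g[4] = 1.0
g[transient] = g_transient
print(g)  # g[0] is g_1, g[1] is g_2, ...
```

Note that the system only involves the transition probabilities $p_{ij}$, never the starting distribution, which is precisely why the question of conditioning on $\mathbb{P}(X_0 = i) = 0$ arises.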

My guess for resolving this problem is that we could define the conditional probability to be $0$ in that case, which aligns with the intuition that $g_1 = 0$, but that would just make other steps in the above derivation questionable, if I am not mistaken. More generally:

How does one circumvent the problem of having to deal with ill-defined probabilities when proving properties of certain Markov chains? Is there a generalized approach to this problem?

Thank you for your attention!


1 Answer


One should note that $\Bbb P(A|B)$ is only defined for events $B$ of positive probability. However, there are many generalizations of this rather simple idea that allow conditioning on zero-probability events as well. Namely, given a measure $\Bbb P$, you may want some kind of conditional kernel it induces: a function $k(x, A)$ that satisfies $$ \Bbb P(X_0 \in B, X_1\in A) = \int_B k(x,A)\,\Bbb P_{X_0}(\mathrm dx) \tag{1} $$ for all measurable sets $A$ and $B$. Here $\Bbb P_{X_0}$ is the distribution law of $X_0$.

Note that $(1)$ does not have to determine $k$ uniquely: if it happens that $\Bbb P_{X_0}(B) = 0$, you can define $k$ there to be anything you want, and it will still satisfy $(1)$. That does not matter to you, since you will only ever use $k$ as an integrand with respect to the measure $\Bbb P_{X_0}$. It is like all a.s. uniqueness: no one cares what happens on sets of zero probability, as that does not matter for any "practical" purposes.

Note also that in the case of discrete state spaces, $(1)$ is quite easy to solve (how? hint: use singleton sets), whereas in the general case such a $k$ may not exist at all: there is a whole family of so-called disintegration theorems that provide conditions for the existence of $k$.
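In the discrete case the hint can be made concrete: plugging singleton sets $B = \{x\}$, $A = \{y\}$ into $(1)$ forces $k(x,\{y\}) = \Bbb P(X_0 = x, X_1 = y)/\Bbb P(X_0 = x)$ wherever $\Bbb P(X_0 = x) > 0$, and leaves $k(x,\cdot)$ completely free elsewhere. A minimal sketch, using an invented joint law for illustration:

```python
from fractions import Fraction

# Hypothetical joint law of (X_0, X_1) on states {1, 2, 3};
# note that X_0 never takes the value 3.
joint = {
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
    (2, 2): Fraction(1, 2),
}
states = [1, 2, 3]

def marginal_x0(x):
    """P(X_0 = x), obtained by summing the joint law over X_1."""
    return sum(p for (a, _), p in joint.items() if a == x)

def kernel(x, y, default=Fraction(0)):
    """A version of k(x, {y}) solving (1): uniquely determined where
    P(X_0 = x) > 0, chosen arbitrarily (`default`) elsewhere."""
    px = marginal_x0(x)
    if px > 0:
        return joint.get((x, y), Fraction(0)) / px
    return default  # any value here still satisfies (1)

# Check equation (1) with B = {x}, A = {y} for every pair:
for x in states:
    for y in states:
        lhs = joint.get((x, y), Fraction(0))
        rhs = kernel(x, y) * marginal_x0(x)
        assert lhs == rhs  # holds no matter what `default` is
```

Changing `default` alters `kernel(3, ...)` but never breaks the final check, which is the sense in which the choice on $\{\Bbb P(X_0 = x) = 0\}$ "does not matter".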

  • That sounds like an interesting generalization, I'll look into it. But: do you really think that "no one cares"? The fact that conditional probabilities are used left and right in discrete Markov chains, and that not every transition probability or state sequence will have positive probability (think strictly periodic chains), makes me uncertain. Formulated another way: why exactly do we assume that these properties also hold in these edge cases most of the time, if the math does not check out the same way?
    – TheOutZ
    Commented Mar 13, 2022 at 10:36
  • If $B$ has positive probability, you have uniqueness of the conditional probability. If it does not, you won't ever notice you chose wrongly @TheOutZ. Notice that one could define the conditional probability $p = P(A|B)$ as a solution of the following equation $$ P(A \text{ and }B) = p\cdot P(B) \qquad (1') $$ which has a unique solution (the one you're used to) if $P(B) > 0$, and uniqueness is lost if $P(B) = 0$. Does $(1')$ remind you of $(1)$?
    – SBF
    Commented Mar 13, 2022 at 11:28
