
Let

  • $(E,\mathcal E,\lambda)$ be a measure space;
  • $p:E\to[0,\infty)$ be $\mathcal E$-measurable with $$c:=\int p\:{\rm d}\lambda\in(0,\infty)$$ and $\mu$ denote the measure with density $\frac pc$ with respect to $\lambda$;
  • $q:E^2\to[0,\infty)$ be $\mathcal E{\otimes2}$-measurable with $$c_x:=\int q(x,\;\cdot\;)\:{\rm d}\lambda\in(0,\infty)\;\;\;\text{for all }x\in E$$ and $Q(x,\;\cdot\;)$ denote the measure with density $\frac{q(x,\;\cdot\;)}{c_x}$ with respect to $\lambda$;
  • $$\alpha(x,y):=\left.\begin{cases}\displaystyle\min\left(1,\frac{p(y)q(y,x)}{p(x)q(x,y)}\right)&\text{, if }p(x)q(x,y)\ne0\\1&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }x,y\in E;$$
  • $$r(x):=1-\int Q(x,{\rm d}y)\alpha(x,y)\;\;\;\text{for }x\in E;$$
  • $\delta_x$ denote the Dirac measure on $(E,\mathcal E)$ at $x\in E$ and $$\kappa(x,B):=\int_BQ(x,{\rm d}y)\alpha(x,y)+r(x)\delta_x(B)\;\;\;\text{for }(x,B)\in E\times\mathcal E;$$
  • $(\Omega,\mathcal A,\operatorname P)$ be a probability space;
  • $(X_n)_{n\in\mathbb N_0}$ and $(Y_n)_{n\in\mathbb N}$ denote the Markov chain and proposal sequence generated by the Metropolis-Hastings algorithm with proposal kernel $Q$ and target distribution $\mu$, respectively, and $$Z_n:=(X_{n-1},Y_n)\;\;\;\text{for }n\in\mathbb N.$$

By construction, there is an independent, identically $\mathcal U_{[0,\:1)}$-distributed process $(U_n)_{n\in\mathbb N}$ on $(\Omega,\mathcal A,\operatorname P)$ such that

  1. $(U_1,\ldots,U_n)$ and $(Z_1,\ldots,Z_n)$ are independent;
  2. $$X_n=\begin{cases}Y_n&\text{if }U_n\le\alpha(Z_n);\\X_{n-1}&\text{otherwise}\end{cases}$$

for all $n\in\mathbb N$. Now, let $$A_n:=\left\{U_1>\alpha(Z_1),\ldots,U_{n-1}>\alpha(Z_{n-1}),U_n\le\alpha(Z_n)\right\}\;\;\;\text{for }n\in\mathbb N.$$ It should hold that $$\operatorname P\left[A_n\mid Z_1,\ldots,Z_n\right]=\prod_{i=1}^{n-1}\left(1-\alpha(Z_i)\right)\alpha(Z_n)\tag1$$ for all $n\in\mathbb N$. Let $\tau_0:=0$ and $$\tau_k:=\inf\{n>\tau_{k-1}:U_n\le\alpha(Z_n)\}\;\;\;\text{for }k\in\mathbb N.$$ Moreover, let $$\tilde X_k:=\left.\begin{cases}X_{\tau_k}&\text{, if }\tau_k<\infty;\\X_{\tau_{k-1}}&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }k\in\mathbb N_0.$$ Let $B_0,B_1\in\mathcal E$. Note that \begin{equation}\begin{split}&\operatorname P\left[\tilde X_0\in B_0,\tilde X_1\in B_1\right]\\&\;\;\;\;=\sum_{n\in\mathbb N}\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]+\operatorname P\left[X_0\in B_0\cap B_1,\tau_1=\infty\right].\end{split}\tag2\end{equation}
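For concreteness, here is a minimal simulation sketch of this construction (the standard normal target, the $N(x,1)$ random-walk proposal, and all numeric choices are illustrative assumptions, not part of the setup above): it generates $(X_n)$, $(Y_n)$ and $(U_n)$ via the accept/reject rule in 2. and reads off the finite acceptance times $\tau_k$ and the jump chain $(\tilde X_k)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the question): unnormalized N(0,1) target
# density p and a symmetric N(x, 1) random-walk proposal, so that
# alpha(x, y) = min(1, p(y) / p(x)).
p = lambda x: np.exp(-0.5 * x**2)
alpha = lambda x, y: min(1.0, p(y) / p(x)) if p(x) > 0 else 1.0

n_steps = 10_000
X = np.empty(n_steps + 1)            # X_0, X_1, ..., X_n
Y = np.empty(n_steps + 1)            # proposals Y_1, ..., Y_n (Y[0] unused)
U = rng.uniform(size=n_steps + 1)    # i.i.d. U_n ~ U[0,1) (U[0] unused)
X[0] = 0.0

accept_times = []                    # the finite tau_1, tau_2, ...
for n in range(1, n_steps + 1):
    Y[n] = X[n - 1] + rng.normal()   # Y_n ~ Q(X_{n-1}, .)
    if U[n] <= alpha(X[n - 1], Y[n]):
        X[n] = Y[n]                  # accept: X_n = Y_n
        accept_times.append(n)
    else:
        X[n] = X[n - 1]              # reject: X_n = X_{n-1}

X_tilde = X[accept_times]            # jump chain tilde X_1, tilde X_2, ...
```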

Now, I would really like to conclude that \begin{equation}\begin{split}&\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]\\&\;\;\;\;=\operatorname E\left[\prod_{i=1}^{n-1}\left(1-\alpha(X_0,Y_i)\right)\alpha(X_0,Y_n);(X_0,Y_n)\in B_0\times B_1\right]\\&\;\;\;\;=\left(\int\operatorname P\left[X_0\in{\rm d}x\right]\int Q(x,{\rm d}y)\left(1-\alpha(x,y)\right)\right)^{n-1}\int_{B_0}\operatorname P\left[X_0\in{\rm d}x\right]\int_{B_1}Q(x,{\rm d}y)\alpha(x,y)\end{split}\tag3\end{equation} for all $n\in\mathbb N$. However, while the first equality should hold by $(2)$, I don't see how to rigorously justify the second equality (though it is intuitively clear).

I think we need something like "$Y_1,\ldots,Y_n$ are conditionally independent given $X_0$ on $A_n$". The distribution of each $Y_i$ is $\mathcal L(X_{i-1})Q$. But on the right-hand side of the first equality of $(3)$, we somehow lost the information that $X_0=\cdots=X_{n-1}$ on $A_n$. Did I make a mistake before?

  • @Taylor It is simply a different notation for the restriction to a subset; so it is equivalent to $1_{B_0\times B_1}(X_0,Y_n)$.
    – 0xbadf00d
    Commented Jun 19, 2023 at 21:39

1 Answer


I don't think you can separate the integrals like that in the last equation. To see why, you could instead use the kernel on the augmented state space (call it $P$) and integrate directly against the finite-dimensional distribution $\mathsf{P}_{x_0} := \delta_{x_0} \otimes P^{\otimes n}$. Also, your equation (1) seems strange, because the $Z$ sequence needs to depend on the uniform variates.

Details:

Assuming WLOG that $q(\cdot, \cdot)$ is dominated by Lebesgue measure and normalized (so $c_x = 1$ for all $x$), the kernel on the augmented state space is $$ P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] = 1_{[0,1)}(u_1)\, q(x_0, y_1)\, 1(u_1 \le \alpha(x_0, y_1))\,\delta_{y_1} (dx_1)\, du_1\, dy_1 + 1_{[0,1)}(u_1)\, q(x_0, y_1)\, 1(u_1 > \alpha(x_0, y_1))\,\delta_{x_0} (dx_1)\, du_1\, dy_1. $$ In particular, if $x_0 \notin B$, then $$ P[ X_1 \in B \mid X_0 = x_0, Y_0 = y_0, U_0 = u_0] = \int_B q(x_0, y_1)\,\alpha(x_0, y_1)\, dy_1 $$

and $$ P[ X_1 =x_0 \mid X_0 = x_0, Y_0 = y_0, U_0 = u_0] = r(x_0). $$
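As a sanity check on this kernel, here is a one-step sampler sketch (the $N(x,1)$ proposal and the standard normal target inside $\alpha$ are illustrative assumptions, not fixed by the answer). Note that, exactly as the formula says, the transition ignores $(y_0, u_0)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# One draw from the augmented kernel P above; (y0, u0) are accepted as
# arguments but never used, mirroring the fact that P[(x0,y0,u0), .]
# depends on the current triple only through x0.
def augmented_step(x0, y0, u0, alpha, rng):
    u1 = rng.uniform()                       # u_1 ~ U[0,1), density 1_{[0,1)}(u_1)
    y1 = x0 + rng.normal()                   # y_1 drawn with density q(x0, y_1)
    x1 = y1 if u1 <= alpha(x0, y1) else x0   # delta_{y_1}(dx_1) vs delta_{x_0}(dx_1)
    return x1, y1, u1

# Illustrative alpha for an unnormalized N(0,1) target with symmetric proposal.
alpha = lambda x, y: min(1.0, np.exp(-0.5 * (y**2 - x**2)))
state = (0.0, 0.0, 0.0)
state = augmented_step(*state, alpha, rng)
```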

So, writing $g(x_0,y_0,u_0,\ldots,x_{n-1},y_{n-1},u_{n-1}) := 1(x_{n-1}\in B_0)\prod_{i=1}^{n-1}1(u_i>\alpha(x_{i-1},y_i))$ for the factors involving only the first $n$ triples, and $\tilde g := \prod_{i=1}^{n-2}1(u_i>\alpha(x_{i-1},y_i))$, \begin{align*} &\mathsf{P}_{x_0}\left[Z_n\in B_0\times B_1,A_n\right] \\ &= \mathsf{P}_{x_0}\left[X_{n-1} \in B_0,\, Y_n \in B_1,\, U_1>\alpha(X_0, Y_1),\ldots,U_{n-1}>\alpha(X_{n-2}, Y_{n-1}),\, U_n\le\alpha(X_{n-1}, Y_{n})\right] \\ &= \mathsf{P}_{x_0}\left[g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1})\, 1(u_n \le \alpha(x_{n-1}, y_n))\, 1(y_n \in B_1) \right] \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-1}, y_{n-1}, u_{n-1}), (dx_n, dy_n, du_n)]\, g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1})\, 1(u_n \le \alpha(x_{n-1}, y_n))\, 1(y_n \in B_1) \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-2}, y_{n-2}, u_{n-2}), (dx_{n-1}, dy_{n-1}, du_{n-1})]\, g(x_0,y_0,u_0, \ldots, x_{n-1},y_{n-1},u_{n-1}) \int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n \\ &= \int \cdots \int P[(x_0, y_0, u_0), (dx_1, dy_1, du_1)] \cdots P[(x_{n-2}, y_{n-2}, u_{n-2}), (dx_{n-1}, dy_{n-1}, du_{n-1})]\, \tilde{g}(x_0,y_0,u_0, \ldots, x_{n-2},y_{n-2},u_{n-2})\, 1(x_{n-1} \in B_0)\, 1(u_{n-1} > \alpha(x_{n-2}, y_{n-1})) \int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n \end{align*}

In the last line, when you integrate with respect to $x_{n-1}$, $y_{n-1}$ and $u_{n-1}$, you are not just integrating those indicators: the factor $\int_{B_1} q(x_{n-1}, y_n)\, \alpha(x_{n-1}, y_n)\, dy_n$ is in there too, and it depends on $x_{n-1}$. That is exactly why the integrals do not separate.
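To see the non-separability numerically, here is a toy sketch on the two-point space $E=\{0,1\}$ with counting measure (all weights below are illustrative assumptions of mine). Writing $\nu$ for the law of $X_0$ and $\rho(x):=\int Q(x,dy)(1-\alpha(x,y))$ for the rejection probability, the calculation above (run under a random start $X_0\sim\nu$) should give $\operatorname P[Z_n\in B_0\times B_1, A_n]=\sum_{x\in B_0}\nu(x)\,\rho(x)^{n-1}\sum_{y\in B_1}Q(x,y)\,\alpha(x,y)$, with $\rho(x)^{n-1}$ inside the sum over $x$; the code compares this with the separated expression from (3).

```python
import numpy as np

# Toy finite state space E = {0, 1}; all numbers are illustrative
# assumptions, not taken from the thread.
p  = np.array([0.7, 0.3])                    # target weights
Q  = np.array([[0.3, 0.7], [0.6, 0.4]])      # proposal kernel (rows sum to 1)
nu = np.array([0.5, 0.5])                    # law of X_0
n  = 3
B0, B1 = [0], [1]

# alpha[x, y] = min(1, p(y) Q(y, x) / (p(x) Q(x, y))), all entries positive here
alpha = np.minimum(1.0, (p[None, :] * Q.T) / (p[:, None] * Q))
rho   = (Q * (1 - alpha)).sum(axis=1)        # per-state rejection probability

# Direct probability: the factor rho(x)^(n-1) stays INSIDE the sum over x.
direct = sum(nu[x] * rho[x] ** (n - 1)
             * sum(Q[x, y] * alpha[x, y] for y in B1) for x in B0)

# The separated expression from (3): rho averaged over X_0 first, then powered.
separated = (nu @ rho) ** (n - 1) \
    * sum(nu[x] * sum(Q[x, y] * alpha[x, y] for y in B1) for x in B0)

print(direct, separated)   # prints two different values
```

With these numbers the two values differ (roughly $0.0252$ vs $0.0063$), so the second equality in (3) fails in general; the separation would be harmless only if $\rho$ were $\nu$-a.s. constant, in particular when $X_0$ is deterministic, as it is under $\mathsf P_{x_0}$.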

Other notes:

  • "The first equality should hold by (2)." (2) is the first-entrance decomposition. You're using the weak Markov property here.
  • I'm not sure what your end goal is here, but you might also consider using the "induced kernel" (see Definition 3.3.7 of Markov Chains) and iterating expectations at random times; the stopping times you could condition on are the proposal acceptance times.

