Let
- $(E,\mathcal E,\lambda)$ be a measure space;
- $p:E\to[0,\infty)$ be $\mathcal E$-measurable with $$c:=\int p\:{\rm d}\lambda\in(0,\infty)$$ and $\mu$ denote the measure with density $\frac pc$ with respect to $\lambda$;
- $q:E^2\to[0,\infty)$ be $\mathcal E^{\otimes2}$-measurable with $$c_x:=\int q(x,\;\cdot\;)\:{\rm d}\lambda\in(0,\infty)\;\;\;\text{for all }x\in E$$ and $Q(x,\;\cdot\;)$ denote the measure with density $\frac{q(x,\;\cdot\;)}{c_x}$ with respect to $\lambda$;
- $$\alpha(x,y):=\left.\begin{cases}\displaystyle\min\left(1,\frac{p(y)q(y,x)}{p(x)q(x,y)}\right)&\text{, if }p(x)q(x,y)\ne0\\1&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }x,y\in E;$$
- $$r(x):=1-\int Q(x,{\rm d}y)\alpha(x,y)\;\;\;\text{for }x\in E;$$
- $\delta_x$ denote the Dirac measure on $(E,\mathcal E)$ at $x\in E$ and $$\kappa(x,B):=\int_BQ(x,{\rm d}y)\alpha(x,y)+r(x)\delta_x(B)\;\;\;\text{for }(x,B)\in E\times\mathcal E;$$
- $(\Omega,\mathcal A,\operatorname P)$ be a probability space;
- $(X_n)_{n\in\mathbb N_0}$ and $(Y_n)_{n\in\mathbb N}$ denote the Markov chain and proposal sequence generated by the Metropolis-Hastings algorithm with proposal kernel $Q$ and target distribution $\mu$, respectively, and $$Z_n:=(X_{n-1},Y_n)\;\;\;\text{for }n\in\mathbb N.$$
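For concreteness, the construction of $(X_n)$ and $(Y_n)$ above can be sketched in code. The particular target (a standard normal, via an unnormalised density $p$) and the Gaussian random-walk proposal are hypothetical choices for illustration, not part of the question:

```python
import math
import random

def metropolis_hastings(p, q_sample, q_density, x0, n_steps, rng):
    """Sketch of the Metropolis-Hastings chain (X_n) and proposals (Y_n)
    for an unnormalised target density p and proposal kernel Q."""
    xs = [x0]
    ys = []
    for _ in range(n_steps):
        x = xs[-1]
        y = q_sample(x, rng)               # draw Y_n ~ Q(X_{n-1}, .)
        ys.append(y)
        num = p(y) * q_density(y, x)
        den = p(x) * q_density(x, y)
        alpha = 1.0 if den == 0 else min(1.0, num / den)  # alpha(x, y)
        u = rng.random()                   # U_n ~ Uniform[0, 1)
        xs.append(y if u <= alpha else x)  # accept, else X_n = X_{n-1}
    return xs, ys

# Hypothetical example: standard normal target, Gaussian random-walk proposal.
rng = random.Random(0)
p = lambda x: math.exp(-x * x / 2)
q_sample = lambda x, rng: x + rng.gauss(0, 1)
q_density = lambda x, y: math.exp(-(y - x) ** 2 / 2)
xs, ys = metropolis_hastings(p, q_sample, q_density, 0.0, 5000, rng)
```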
By construction, there is an independent, identically $\mathcal U_{[0,\:1)}$-distributed process $(U_n)_{n\in\mathbb N}$ on $(\Omega,\mathcal A,\operatorname P)$ with
- $(U_1,\ldots,U_n)$ and $(Z_1,\ldots,Z_n)$ are independent;
- $$X_n=\begin{cases}Y_n&\text{if }U_n\le\alpha(Z_n);\\X_{n-1}&\text{otherwise}\end{cases}$$
for all $n\in\mathbb N$. Now, let $$A_n:=\left\{U_1>\alpha(Z_1),\ldots,U_{n-1}>\alpha(Z_{n-1}),U_n\le\alpha(Z_n)\right\}\;\;\;\text{for }n\in\mathbb N.$$ It should hold that $$\operatorname P\left[A_n\mid Z_1,\ldots,Z_n\right]=\prod_{i=1}^{n-1}\left(1-\alpha(Z_i)\right)\alpha(Z_n)\tag1$$ for all $n\in\mathbb N$. Let $\tau_0:=0$ and $$\tau_k:=\inf\{n>\tau_{k-1}:U_n\le\alpha(Z_n)\}\;\;\;\text{for }k\in\mathbb N.$$ Moreover, let $$\tilde X_k:=\left.\begin{cases}X_{\tau_k}&\text{, if }\tau_k<\infty;\\X_{\tau_{k-1}}&\text{, otherwise}\end{cases}\right\}\;\;\;\text{for }k\in\mathbb N_0.$$ Let $B_0,B_1\in\mathcal E$. Note that \begin{equation}\begin{split}&\operatorname P\left[\tilde X_0\in B_0,\tilde X_1\in B_1\right]\\&\;\;\;\;=\sum_{n\in\mathbb N}\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]+\operatorname P\left[X_0\in B_0\cap B_1,\tau_1=\infty\right].\end{split}\tag2\end{equation}
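In code, the acceptance times $\tau_k$ and the jump chain $(\tilde X_k)$ can be read off from a realised path of $(X_n)$. Detecting acceptances via $X_n\ne X_{n-1}$ is only a proxy (an accepted proposal equal to the current state is missed), but it is exact almost surely when $Q(x,\;\cdot\;)$ is non-atomic:

```python
def jump_chain(xs):
    """Sketch: recover tau_k and X~_k = X_{tau_k} from a path (X_n).

    Uses X_n != X_{n-1} as a proxy for acceptance, which is exact
    almost surely when the proposal kernel is non-atomic."""
    taus = [0]                  # tau_0 = 0
    tilde = [xs[0]]             # X~_0 = X_0
    for n in range(1, len(xs)):
        if xs[n] != xs[n - 1]:  # U_n <= alpha(Z_n): proposal accepted
            taus.append(n)
            tilde.append(xs[n])
    return taus, tilde

# Toy path that moves at times 2 and 4.
taus, tilde = jump_chain([0, 0, 1, 1, 2])  # taus = [0, 2, 4], tilde = [0, 1, 2]
```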
Now, I would really like to conclude that \begin{equation}\begin{split}&\operatorname P\left[Z_n\in B_0\times B_1,A_n\right]\\&\;\;\;\;=\operatorname E\left[\prod_{i=1}^{n-1}\left(1-\alpha(X_0,Y_i)\right)\alpha(X_0,Y_n);(X_0,Y_n)\in B_0\times B_1\right]\\&\;\;\;\;=\left(\int\operatorname P\left[X_0\in{\rm d}x\right]\int Q(x,{\rm d}y)\left(1-\alpha(x,y)\right)\right)^{n-1}\int_{B_0}\operatorname P\left[X_0\in{\rm d}x\right]\int_{B_1}Q(x,{\rm d}y)\alpha(x,y)\end{split}\tag3\end{equation} for all $n\in\mathbb N$. However, while the first equality should hold by $(2)$, I don't see how to rigorously justify the second equality (though it is intuitively clear).
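For what it's worth, $(3)$ can at least be checked numerically in a toy example. Take $E=\{0,1\}$ with counting measure, unnormalised weights $p=(1,2)$, the uniform proposal $Q(x,\;\cdot\;)=(\tfrac12,\tfrac12)$ and $X_0=1$ almost surely (all hypothetical choices); a short computation then gives $4^{-n}$ for both sides of $(3)$ with $B_0=\{1\}$, $B_1=\{0\}$, which a Monte Carlo estimate of the left-hand side agrees with:

```python
import random

def simulate(rng):
    """One run: return (tau_1, Z_{tau_1}) = (n, X_{n-1}, Y_n), or None
    if no proposal is accepted within 20 steps (vanishing probability)."""
    p = [1.0, 2.0]                # unnormalised target weights on E = {0, 1}
    x = 1                         # X_0 = 1 a.s. (hypothetical choice)
    for n in range(1, 20):
        y = rng.randrange(2)      # Y_n ~ Q(X_{n-1}, .) uniform on {0, 1}
        alpha = min(1.0, p[y] / p[x])   # q is symmetric, so it cancels
        if rng.random() <= alpha:
            return n, x, y        # first acceptance: A_n occurs
        # rejected: X_n = X_{n-1}, so x is unchanged
    return None

rng = random.Random(1)
trials = 200_000
count = sum(1 for _ in range(trials)
            if simulate(rng) == (2, 1, 0))  # event {Z_2 in {1} x {0}, A_2}
estimate = count / trials                   # should be close to 4**-2 = 0.0625
```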
I think we need something like "$Y_1,\ldots,Y_n$ are conditionally independent given $X_0$ on $A_n$". The distribution of each $Y_i$ is $\mathcal L(X_{i-1})Q$. But on the right-hand side of the first equality in $(3)$, we have somehow lost the information that $X_0=\cdots=X_{n-1}$ on $A_n$. Did I make a mistake before?