
I have a question concerning the proof of the continuous mapping theorem for convergence in probability (see link).

To make the post self-contained, I will provide the necessary definitions here.

Defn 1: Convergence in Probability

The sequence of random variables $\{T_n\}_{n\geq 1}$ converges in probability to $T$ (possibly a random variable) if and only if \begin{align*} \forall \epsilon>0:\lim_{n\rightarrow\infty}\Pr(\|T_n-T\|\leq \epsilon)=1,\\ \forall \epsilon>0:\lim_{n\rightarrow\infty}\Pr(\|T_n-T\|> \epsilon)=0. \end{align*} We denote this by $T_n\xrightarrow[]{p}T$, $\text{plim}_{n\rightarrow\infty}T_n=T$, or $T_n-T=o_p(1)$.
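As a quick numeric illustration of this definition (my own toy example, not taken from the linked proof): take $T_n = T + Z_n$ with $Z_n\sim N(0,1/n)$, so that $T_n\xrightarrow{p}T$, and estimate the tail probability $\Pr(|T_n-T|>\epsilon)$ by Monte Carlo.

```python
import numpy as np

# Hedged sketch of Defn 1 (a made-up example): T_n = T + Z_n with
# Z_n ~ N(0, 1/n), so T_n -> T in probability.  We estimate
# Pr(|T_n - T| > eps) by Monte Carlo and watch it shrink as n grows.
rng = np.random.default_rng(0)
eps, draws = 0.1, 100_000

def tail_prob(n):
    # |T_n - T| = |Z_n|, so only the noise term needs to be sampled.
    z = rng.normal(0.0, 1.0 / np.sqrt(n), size=draws)
    return float(np.mean(np.abs(z) > eps))

probs = [tail_prob(n) for n in (1, 10, 100, 1000)]
print(probs)  # strictly decreasing, with the last entry near 0
```

The particular noise scale $1/n$ is an assumption for illustration; any sequence whose tail probabilities vanish would do.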

Defn 2: Continuity

Let $(X,d)$ and $(Y,\rho)$ be two metric spaces, and let $f:X\mapsto Y$ be a function. The function $f$ is called continuous at $x\in X$ if for every $\epsilon>0$ there exists $\delta>0$ such that $d(y,x)<\delta\implies \rho(f(y),f(x))<\epsilon$. If $f$ is continuous at all $x\in X$, then we say $f$ is continuous on $X$.

Theorem: Continuous Mapping Theorem

Let $\{X_n\}$, $X$ be random elements defined on a metric space $S$. Suppose a function $g:S\mapsto S'$ (where $S'$ is another metric space) has the set of discontinuity points $D_g$ such that $Pr(X\in D_g)=0$. Then \begin{align*} X_n\xrightarrow[]{(\cdot)}X\implies g(X_n)\xrightarrow[]{(\cdot)}g(X) \end{align*} where $\xrightarrow[]{(\cdot)}$ can be either convergence in probability or convergence in distribution or convergence almost surely.
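The theorem can be sanity-checked numerically for the in-probability case; the following is a hedged sketch with a map of my own choosing, $g(x)=e^x$, which is continuous everywhere (so $D_g=\emptyset$ and the hypothesis $\Pr(X\in D_g)=0$ holds trivially).

```python
import numpy as np

# Hedged Monte Carlo sketch of the CMT for convergence in probability
# (a made-up example): X_n = X + Z_n with Z_n ~ N(0, 1/n), and the
# everywhere-continuous map g(x) = exp(x).  Both tail probabilities
# should shrink as n grows.
rng = np.random.default_rng(1)
eps, draws = 0.1, 100_000

def tail_probs(n):
    x = rng.normal(size=draws)                              # X
    xn = x + rng.normal(0.0, 1.0 / np.sqrt(n), size=draws)  # X_n
    p_x = float(np.mean(np.abs(xn - x) > eps))              # Pr(|X_n - X| > eps)
    p_g = float(np.mean(np.abs(np.exp(xn) - np.exp(x)) > eps))
    return p_x, p_g

results = {n: tail_probs(n) for n in (10, 1000)}
print(results)  # both coordinates shrink from n=10 to n=1000
```

This only illustrates the statement, of course; it is no substitute for the proof discussed below.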

Question

In the proof, it says

The second term converges to zero as $\delta\rightarrow 0$, since the set $B_\delta$ shrinks to an empty set.

I felt this was a correct statement, but not the real reason that $Pr(X\in B_\delta)\rightarrow 0$. Instead, the correct explanation should use the property of continuity. In particular, since $Pr(|X_n-X|\geq \delta)\rightarrow 0$ for any value of $\delta$ and $g(\cdot)$ is continuous at $X$, for all $\epsilon>0$ there exists $\delta_\epsilon$ such that $|X_n-X|<\delta_\epsilon\implies|g(X_n)-g(X)|<\epsilon$. This means $Pr(|X_n-X|<\delta_\epsilon)\leq Pr(|g(X_n)-g(X)|<\epsilon)$. Since $\lim_{n\rightarrow\infty}Pr(|X_n-X|<\delta_\epsilon)=1$, we have $\lim_{n\rightarrow\infty}Pr(|g(X_n)-g(X)|<\epsilon)=1$ (I have essentially proved the claim here). This also means $\lim_{n\rightarrow\infty}Pr(|g(X_n)-g(X)|>\epsilon)=0$, and I suspect it is possible to use this fact to bound $\lim_{n\rightarrow \infty}Pr(X\in B_{\delta_\epsilon})$. Overall, I don't understand why we need to define this $B_\delta$ object.

I have edited the last bit, as the previous use of notation was inaccurate.


1 Answer

A way to understand the proof's assertion

I found that the Wikipedia article explains this fact before using it: it is indeed because of continuity that the sets $B_\delta$ shrink to the empty set (as you noted, there is no reason to expect this property to hold for an arbitrary function, so the continuity hypothesis must be used):

Fix an arbitrary $\varepsilon > 0$. Then for any $\delta > 0$ consider the set $B_\delta$ defined as: $$B_\delta = \{x \in S\setminus D_g : \exists y\in S,\ |x-y|<\delta,\ |g(x)-g(y)|>\varepsilon\}$$ This is the set of continuity points $x$ of the function $g(\cdot)$ for which it is possible to find, within the $\delta$-neighborhood of $x$, a point which maps outside the $\varepsilon$-neighborhood of $g(x)$. By definition of continuity, this set shrinks as $\delta$ goes to zero, so that $\lim_{\delta\to 0} B_\delta = \emptyset$.

The last sentence gives the clue as to what was intended.

Note first that $B_\delta \subseteq B_\eta$ if $0<\delta <\eta$, so the family $(B_\delta)_\delta$ is decreasing. Now, let's show that the probabilities of these sets go to zero as $\delta$ goes to zero.

For a fixed $\varepsilon>0$ and every $x\notin D_g$, one can find $\delta^x_0>0$ such that every $y\in S$ with $|x-y|<\delta^x_0$ satisfies $|g(x)-g(y)|<\varepsilon$. This implies that $x\notin B_{\delta^x_0}$, and by monotonicity we can conclude that $x\notin B_\delta$ for every $\delta <\delta^x_0$; in particular, $\cap_\delta B_\delta = \emptyset$. Since the $\delta$ range over an uncountable set, we will first show that the probabilities go to zero along a decreasing sequence of $\delta$ (so that we can use the properties of our probability measure), and then extend the result.

By continuity of our probability measure, we will have that $Pr(X\in B_{\frac{1}{n}}) \to Pr(X\in \cap_n B_{\frac{1}{n}}) = Pr(X\in \emptyset) = 0$ (note that $\cap_n B_{\frac{1}{n}}$ is still empty). Then, given $\eta>0$, there exists $n_0\in \mathbb{N}$ such that $Pr(X\in B_{\frac{1}{n_0}})<\eta$. If $\delta < \frac{1}{n_0}$, then by monotonicity we will have that $Pr(X\in B_\delta)\leq Pr(X\in B_{\frac{1}{n_0}})<\eta$, and this proves that $Pr(X\in B_\delta)\to 0$ as $\delta\to 0$.
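To make the shrinking of $Pr(X\in B_\delta)$ concrete, here is a small Monte Carlo check with a stand-in function of my own choosing, $g(t)=t^2$ on $(0,1)$ with $X$ uniform. Since this $g$ is increasing, the largest deviation of $g$ within $(x-\delta, x+\delta)$ occurs at an endpoint, which makes membership in $B_\delta$ easy to test; and since it is uniformly continuous, $B_\delta$ is literally empty once $2\delta\le\varepsilon$.

```python
import numpy as np

# Hedged sketch: estimate Pr(X in B_delta) for the stand-in map g(t) = t^2
# on (0,1) with X uniform (my own example, not the answer's sin(pi/t) one).
# g is increasing, so the supremum of |g(x) - g(y)| over |y - x| < delta is
# approached at an endpoint of (x - delta, x + delta) clipped to (0, 1);
# if the endpoint deviation exceeds eps, so do nearby interior points.
def in_B(x, delta, eps):
    lo = np.maximum(x - delta, 1e-12)        # clip to stay inside (0, 1)
    hi = np.minimum(x + delta, 1.0 - 1e-12)
    dev = np.maximum(np.abs(x**2 - lo**2), np.abs(x**2 - hi**2))
    return dev > eps

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=200_000)
eps = 0.1
probs = [float(np.mean(in_B(x, d, eps))) for d in (0.2, 0.1, 0.04)]
print(probs)  # decreasing; the last entry is exactly 0.0 since 2 * 0.04 < eps
```

With a merely pointwise-continuous $g$ the sets $B_\delta$ need not become empty at a positive $\delta$ (as the answer's counterexample below shows), but their probabilities still vanish by the continuity-of-measure argument above.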

The alternative approach

As for the alternative method proposed in the post, it is ultimately flawed, since $Pr(X\in B_{\delta_\epsilon})$ can still be positive. The devil is in the details, and here closer attention should be paid to how close $X_n$ and $X$ actually are. It is not true that $|X_n - X|<\delta_\epsilon$ for every $\omega \in \Omega$; rather, the set on which this fails has small probability, which one can guarantee is as small as one wishes (but still not always zero). We can in fact find examples where the claim is not true; the following is one that I came up with:

Consider for example $X_n(\omega) = \omega$ and $X(\omega)=\omega$ on $((0,1), \mathcal{B}((0,1)), \mathcal{L})$, where $\mathcal{L}$ is the Lebesgue measure. Then $X_n \to X$ uniformly, and in particular in measure. Take the (continuous) function $g:(0,1)\to [-1,1]$ given by $g(t)=\sin(\frac{\pi}{t})$ (here is a link to its graph). For a fixed $0<\varepsilon<1$, $B_\delta$ will not be empty; in particular, it can be shown that $(0, \delta)\subseteq B_\delta$ (proved at the end). Since $\{X\in(0,\delta)\}=(0,\delta)\subseteq\{X\in B_\delta\}$, this event has positive probability, while $\delta$ and (within $(0,1)$) $\varepsilon$ are arbitrary, so the claim is not true.

To prove that $(0,\delta)\subseteq B_\delta$ when $0<\varepsilon<1$, one can proceed as follows: choose $x\in (0,\delta)$ and set $t_k=\frac{2}{2k+1}$, $s_k=\frac{2}{2k+3}$ for $k$ big enough that $0< s_k< t_k<\delta$. These satisfy $\{g(t_k), g(s_k)\}=\{1,-1\}$, so in particular one of them has distance at least $1$ (and hence greater than $\varepsilon$) to $g(x)$, since the two distances sum to $|1-g(x)| + |g(x)-(-1)| = 2$.
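The construction above can be checked numerically; this sketch verifies, for a few sample points $x\in(0,\delta)$, that a witness $y$ with $|x-y|<\delta$ and $|g(x)-g(y)|>\varepsilon$ exists.

```python
import math

# Numeric check of the argument above: for g(t) = sin(pi/t) and eps < 1,
# every x in (0, delta) has a witness y in (0, delta) with |g(x) - g(y)| > eps.
def g(t):
    return math.sin(math.pi / t)

def has_witness(x, delta, eps):
    # Find k with t_k = 2/(2k+1) < delta; then s_k = 2/(2k+3) < t_k < delta,
    # and {g(t_k), g(s_k)} = {1, -1}, so one of them is at distance >= 1
    # from g(x) (their distances to g(x) sum to 2).
    k = 1
    while 2.0 / (2 * k + 1) >= delta:
        k += 1
    t_k = 2.0 / (2 * k + 1)
    s_k = 2.0 / (2 * k + 3)
    # x, t_k, s_k all lie in (0, delta), hence |x - t_k| < delta and
    # |x - s_k| < delta, so both are valid candidate witnesses.
    return max(abs(g(x) - g(t_k)), abs(g(x) - g(s_k))) > eps

delta, eps = 0.05, 0.9
checks = [has_witness(x, delta, eps) for x in (0.001, 0.01, 0.049)]
print(all(checks))  # True
```

The sample points and the particular $\delta,\varepsilon$ are arbitrary choices for illustration.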

If one were to try with $h(t)=\frac{1}{t}\sin(\frac{\pi}{t})$, I think one can show the claim for truly arbitrary $\varepsilon>0$ by finding a suitable interval inside $B_\delta$ for every $\varepsilon$, in a similar fashion.

  • I believe your definition of $B_\delta$ is incorrect. It should be $>\epsilon$.
    – chuck
    Commented Oct 3, 2022 at 0:37
  • Yes @chuck, thank you! I have now edited the answer so that the correct definition is displayed. Commented Oct 3, 2022 at 0:53
  • I am getting this now. So $\delta_0^x$ is not good enough because $P(|x-y|<\delta_0^x)>0$? Thus, we need to use $\delta_0^x$ to construct a sequence of $B_\delta$?
    – chuck
    Commented Oct 3, 2022 at 10:37
  • @chuck yes, $\delta_0^x$ is not good enough because $\mathbb{P}(X\in B_{\delta_0^x})$ can be positive (what you do know is that that particular $x$ is not a member of $B_{\delta_0^x}$). What can be done, however, is showing that the family's intersection is the empty set, and from that, that the family's probabilities go to zero. To do this, since measures have good properties for countable set operations like intersections, we use a sequence as a stepping stone before proving the desired result. Commented Oct 3, 2022 at 10:44
  • Thanks. One final question. When putting everything together, which of these would you write: $\lim_{n\rightarrow \infty}Pr(|g(X_n)-g(X)|>\epsilon)\leq \lim_{n\rightarrow \infty} \Big(Pr(|X_n-X|\geq \frac{1}{n})+Pr(X\in B_{1/n})+Pr(X\in D_g)\Big)$ or $\lim_{n\rightarrow \infty}Pr(|g(X_n)-g(X)|>\epsilon)\leq \lim_{\delta\rightarrow 0}\lim_{n\rightarrow \infty} \Big(Pr(|X_n-X|\geq \delta)+Pr(X\in B_{\delta})+Pr(X\in D_g)\Big)$?
    – chuck
    Commented Oct 3, 2022 at 14:49
