22
$\begingroup$

Does the normal distribution converge to a certain distribution if the standard deviation grows without bound? It appears to me that the pdf starts looking like a uniform distribution with bounds given by $[-2 \sigma, 2 \sigma]$. Is this true?

$\endgroup$
2
  • 2
    $\begingroup$ No, but in order to answer your question properly, we need to know what your definition of convergence is. Keep in mind that a formal discussion is only possible when the right-hand side is not changing. So, you can't establish convergence to Uniform[$-\sigma,\sigma$] because your $\sigma$ is changing. Look up the formulation of the CLT to see what I mean $\endgroup$
    – Aksakal
    Commented Jul 10, 2018 at 19:50
  • 1
    $\begingroup$ only if you wrap or truncate it to something $o(\sigma)$. $\endgroup$ Commented Jul 12, 2018 at 0:13

6 Answers

6
$\begingroup$

The other answers already here do a great job of explaining why Gaussian RVs don't converge to anything as the variance increases without bound, but I want to point out a seemingly uniform property that such a collection of Gaussians does satisfy. It might be enough to make someone guess that they are becoming uniform, but it turns out not to be strong enough to conclude that. $\newcommand{\len}{\text{len}}$

Consider a collection of random variables $\{X_1,X_2,\dots\}$ where $X_n \sim \mathcal N(0, n^2)$. Let $A = [a_1,a_2]$ be a fixed interval of finite length, and for some $c \in \mathbb R$ define $B = A +c$, i.e. $B$ is $A$ but just shifted over by $c$. For an interval $I = [i_1,i_2]$ define $\len (I) = i_2-i_1$ to be the length of $I$, and note that $\len(A) = \len(B)$.

I'll now prove the following result:

Result: $\vert P(X_n \in A) - P(X_n\in B)\vert \to 0$ as $n \to \infty$.

I call this uniform-like because it says that the distribution of $X_n$ increasingly assigns nearly equal probability to two fixed intervals of equal length, no matter how far apart they may be. That's definitely a very uniform feature, but as we'll see this doesn't say anything about the actual distribution of the $X_n$ converging to a uniform one.

Pf: Note that $X_n$ has the same distribution as $n X_1$, where $X_1 \sim \mathcal N(0, 1)$, so $$ P(X_n \in A) = P(a_1 \leq n X_1 \leq a_2) = P\left(\frac{a_1}{n} \leq X_1 \leq \frac{a_2}n\right) $$ $$ = \frac{1}{\sqrt{2\pi}}\int_{a_1/n}^{a_2/n} e^{-x^2/2}\,\text dx. $$ I can use the (very rough) bound that $e^{-x^2/2} \leq 1$ to get $$ \frac{1}{\sqrt{2\pi}}\int_{a_1/n}^{a_2/n} e^{-x^2/2}\,\text dx \leq \frac{1}{\sqrt{2\pi}}\int_{a_1/n}^{a_2/n} 1\,\text dx $$ $$ = \frac{\text{len}(A)}{n\sqrt{2\pi}}. $$

I can do the same thing for $B$ to get $$ P(X_n \in B) \leq \frac{\text{len}(B)}{n\sqrt{2\pi}}. $$

Putting these together I have $$ \left\vert P(X_n \in A) - P(X_n \in B)\right\vert \leq \frac{\sqrt 2 \text{len}(A) }{n\sqrt{\pi}} \to 0 $$ as $n\to\infty$ (I'm using the triangle inequality here).

$\square$
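
Here is a quick numerical check of this result (a sketch using scipy; the interval $A = [1,3]$ and the shift $c = 100$ are arbitrary choices, not part of the proof):

```python
# Numerically check that |P(X_n in A) - P(X_n in B)| -> 0 as n grows,
# where X_n ~ N(0, n^2), A = [1, 3] and B = A + 100.
from scipy.stats import norm

a1, a2 = 1.0, 3.0   # A = [1, 3]
c = 100.0           # shift, so B = [101, 103]

for n in [1, 10, 100, 1000, 10000]:
    X_n = norm(loc=0, scale=n)                      # X_n ~ N(0, n^2)
    p_A = X_n.cdf(a2) - X_n.cdf(a1)
    p_B = X_n.cdf(a2 + c) - X_n.cdf(a1 + c)
    print(f"n = {n:6d}: |P(A) - P(B)| = {abs(p_A - p_B):.2e}")
```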

How is this different from $X_n$ converging to a uniform distribution? I just proved that the probabilities given to any two fixed intervals of the same finite length get closer and closer, and intuitively that makes sense, since the densities are "flattening out" from $A$ and $B$'s perspectives.

But in order for $X_n$ to converge to a uniform distribution, I'd need $P(X_n \in I)$ to head towards being proportional to $\text{len}(I)$ for any interval $I$, and that is a very different thing because it needs to apply to any $I$, not just one fixed in advance (and as mentioned elsewhere, this is not even possible for a distribution with unbounded support).

$\endgroup$
1
  • 1
    $\begingroup$ Right, you could almost say they converge in distribution, except that the limit they converge to is an improper distribution. One type of convergence that would be well defined: I would guess you could show the Wasserstein metric approaches zero as $\sigma \rightarrow \infty$? $\endgroup$
    – Cliff AB
    Commented Jul 11, 2018 at 18:25
38
$\begingroup$

A common mistake in probability is to think that a distribution is uniform because it looks visually flat when all of its density values are near zero. This happens because we tend to see that $f(x)=0.001 \approx 0.000001=f(y)$ while overlooking that $f(x)/f(y)=0.001/0.000001=1000$, i.e. a small interval around $x$ is 1000 times more likely than an interval of the same length around $y$.

It's definitely not uniform on the entire real line in the limit, as there is no uniform distribution on $(-\infty,\infty)$. It's also not even approximately uniform on $[-2\sigma,2\sigma]$.

You can see the latter from the 68-95-99.7 rule you seem to be familiar with. If it were approximately uniform on $[-2\sigma,2\sigma]$, then the probability of being in $[0,\sigma]$ and $[\sigma,2\sigma]$ should be the same, as the two intervals are the same length. But this is not the case: $P([0,\sigma])\approx 0.68/2= 0.34$, yet $P([\sigma,2\sigma])\approx (0.95-0.68)/2 = 0.135$.
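
A quick check of these two probabilities with scipy (a sketch; taking $\sigma = 1$ without loss of generality, since both intervals scale with $\sigma$):

```python
# P(0 <= X <= sigma) vs P(sigma <= X <= 2*sigma) for X ~ N(0, sigma^2).
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(0))   # ~0.341
print(norm.cdf(2) - norm.cdf(1))   # ~0.136, far from equal
```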

When viewed over the entire real line, this sequence of normal distributions doesn't converge to any probability distribution. There are a few ways to see this. As an example, the cdf of a normal with mean $0$ and standard deviation $\sigma$ is $F_\sigma(x) = \frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x}{\sigma\sqrt{2}}\right)\right)$, and $\lim_{\sigma\rightarrow\infty} F_\sigma(x) = 1/2$ for all $x$, which is not the cdf of any random variable. In fact, it's not a cdf at all.
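
You can see this pointwise limit numerically (a sketch; the evaluation point $x = 5$ is an arbitrary choice):

```python
# For any fixed x, F_sigma(x) -> 1/2 as sigma -> infinity.
from scipy.stats import norm

x = 5.0
for sigma in [1, 10, 100, 1000, 1e6]:
    print(sigma, norm.cdf(x, loc=0, scale=sigma))
# the values tend to 0.5, a constant in x, so the limit is not a cdf
```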

The reason for this non-convergence boils down to "mass loss" in the limit. The limiting function of the normal distributions has actually "lost" probability (i.e. it has escaped to infinity). This is related to the concept of tightness of measures, which gives necessary conditions for a sequence of random variables to converge in distribution to another random variable.

$\endgroup$
17
$\begingroup$

Your statement "the pdf starts looking like a uniform distribution with bounds given by $[-2\sigma,2\sigma]$" is not correct once you view the density on a scale matched to the wider standard deviation.

Consider this chart of two normal densities centred on zero. The red curve corresponds to a standard deviation of $1$ and the blue curve to a standard deviation of $10$, and it is indeed the case that the blue curve is almost flat on $[-2,2]$

[Figure: densities of $N(0,1)$ (red) and $N(0,10^2)$ (blue) on the same axes; the blue curve looks almost flat on $[-2,2]$]

but for the blue curve with $\sigma=10$, we should actually be looking at its shape on $[-20,20]$. Rescaling both the $x$-axis and $y$-axis by factors of $10$ gives the next plot, and you get exactly the same shape for the blue density in this latter plot as for the red density in the earlier plot

[Figure: the blue density after rescaling both axes by a factor of $10$; its shape is identical to the red density in the first plot]
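
Plots like these can be reproduced along the following lines (a sketch with matplotlib and scipy; the plotting ranges are guesses at what the original figures show):

```python
# Two normal densities on the same axes, then the blue one with both axes
# rescaled by a factor of 10, which recovers the shape of the red one.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-5, 5, 500)
plt.figure()
plt.plot(x, norm.pdf(x, scale=1), "r", label="sigma = 1")
plt.plot(x, norm.pdf(x, scale=10), "b", label="sigma = 10")
plt.legend()

x_wide = np.linspace(-50, 50, 500)
plt.figure()
plt.plot(x_wide / 10, 10 * norm.pdf(x_wide, scale=10), "b", label="sigma = 10, rescaled")
plt.plot(x, norm.pdf(x, scale=1), "r--", label="sigma = 1")
plt.legend()
plt.show()
```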

$\endgroup$
4
$\begingroup$

Taking the limit of normal distributions leads to another nice property reminiscent of a uniform distribution, which is that the conditional probability of one bounded set given another converges in the limit to the conditional probability that applies under the uniform distribution. I will show this below.


To facilitate our analysis, we will let $Z \sim \text{N}(0,1)$ be a standard normal random variable and we will examine the behaviour of the random variable $X=\sigma Z$ in the limit where $\sigma \rightarrow \infty$. Let $\mathcal{B}$ be a (measurable) bounded set and denote the density bounds over the set as:

$$L_\mathcal{B}(\sigma) \equiv \inf_{x \in \mathcal{B}} \ \text{N}(x|0,\sigma^2) \quad \quad \quad U_\mathcal{B}(\sigma) \equiv \sup_{x \in \mathcal{B}} \ \text{N}(x|0,\sigma^2).$$

Since $\mathcal{B}$ is bounded, say $\mathcal{B} \subseteq [-M, M]$, we have $U_\mathcal{B}(\sigma)/L_\mathcal{B}(\sigma) \leqslant \exp(M^2/2\sigma^2)$, and so:

$$\lim_{\sigma \rightarrow \infty} \frac{U_\mathcal{B}(\sigma)}{L_\mathcal{B}(\sigma)} = 1.$$
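
This flattening is easy to check numerically (a sketch over a grid; the set $\mathcal{B} = [-3, 7]$ is an arbitrary choice):

```python
# The ratio of the largest to the smallest N(0, sigma^2) density value over a
# bounded set tends to 1 as sigma grows.
import numpy as np
from scipy.stats import norm

B = np.linspace(-3, 7, 1001)       # grid over the bounded set B = [-3, 7]
for sigma in [1, 10, 100, 1000]:
    dens = norm.pdf(B, loc=0, scale=sigma)
    print(sigma, dens.max() / dens.min())
```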

Now, let $\mathcal{A}$ and $\mathcal{B}$ be arbitrary (measurable) bounded sets. We can write the conditional probability of the first given the second as:

$$\begin{align} \mathbb{P}(X \in \mathcal{A} | X \in \mathcal {B}) &= \frac{\mathbb{P}(X \in \mathcal {A} \cap \mathcal{B})}{\mathbb{P}(X \in \mathcal {B})} \\[6pt] &= \frac{\int_{\mathcal{A} \cap \mathcal{B}} \text{N}(x|0,\sigma^2) \ dx}{\int_{\mathcal{B}} \text{N}(x|0,\sigma^2) \ dx} \\[6pt] &= \frac{\int_{\mathcal{A} \cap \mathcal{B}} \text{N}(x|0,\sigma^2) \ dx}{\int_{\mathcal{A} \cap \mathcal{B}} \text{N}(x|0,\sigma^2) \ dx + \int_{\mathcal{B}-\mathcal{A}} \text{N}(x|0,\sigma^2) \ dx}. \\[6pt] \end{align}$$

Let $\lambda_{\ \cdot}$ denote the Lebesgue measure (of a set $\cdot$ shown as a subscript). By applying the bounds to the densities in the above integrals we obtain the conditional probability bounds:

$$\frac{\lambda_{\mathcal{A} \cap \mathcal{B}} \cdot L_\mathcal{B}(\sigma)}{\lambda_{\mathcal{A} \cap \mathcal{B}} \cdot L_\mathcal{B}(\sigma) + \lambda_{\mathcal{B}-\mathcal{A}} \cdot U_\mathcal{B}(\sigma)} \leqslant \mathbb{P}(X \in \mathcal{A} | X \in \mathcal {B}) \leqslant \frac{\lambda_{\mathcal{A} \cap \mathcal{B}} \cdot U_\mathcal{B}(\sigma)}{\lambda_{\mathcal{A} \cap \mathcal{B}} \cdot U_\mathcal{B}(\sigma) + \lambda_{\mathcal{B}-\mathcal{A}} \cdot L_\mathcal{B}(\sigma)}.$$

Taking the limit $\sigma \rightarrow \infty$ and applying the squeeze theorem (noting that $\lambda_{\mathcal{A} \cap \mathcal{B}} + \lambda_{\mathcal{B}-\mathcal{A}} = \lambda_{\mathcal{B}}$) then gives:

$$\lim_{\sigma \rightarrow \infty} \mathbb{P}(X \in \mathcal{A} | X \in \mathcal {B}) = \frac{\lambda_{\mathcal{A} \cap \mathcal{B}}}{\lambda_{\mathcal{B}}},$$

which is the standard conditional probability result for a uniform distribution. This confirms that when we take the limit of a sequence of normal distributions with variance approaching infinity, the conditional probability of any bounded set given any other bounded set is the same as for a uniform random variable taken over any set encompassing the union of those sets.
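
Here is a quick numerical illustration (a sketch with scipy; the intervals $\mathcal{A} = [0,1]$ and $\mathcal{B} = [-2,3]$ are arbitrary choices):

```python
# P(X in A | X in B) for X ~ N(0, sigma^2) approaches the Lebesgue-measure
# ratio lambda(A ∩ B) / lambda(B) as sigma grows.
from scipy.stats import norm

A = (0.0, 1.0)
B = (-2.0, 3.0)
AB = (max(A[0], B[0]), min(A[1], B[1]))   # A ∩ B = [0, 1]

for sigma in [1, 10, 100, 1000]:
    X = norm(loc=0, scale=sigma)
    p_AB = X.cdf(AB[1]) - X.cdf(AB[0])
    p_B = X.cdf(B[1]) - X.cdf(B[0])
    print(sigma, p_AB / p_B)

print((AB[1] - AB[0]) / (B[1] - B[0]))    # limiting value: 1/5
```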

$\endgroup$
2
  • 1
    $\begingroup$ This conclusion is true when the standard Normal distribution is replaced by any distribution supported on a neighborhood $\mathcal{U}$ of $0$ where, either on $\mathcal{U}\cap[0,\infty)$ or $\mathcal{U}\cap(-\infty,0]$ (or both), the distribution has a nonzero everywhere differentiable density with bounded derivative. The point is that as $\sigma$ grows, locally the density becomes constant. $\endgroup$
    – whuber
    Commented May 4, 2021 at 12:32
    $\begingroup$ Indeed. The answer was certainly not intended to imply that this is a property restricted to the normal distribution. $\endgroup$
    – Ben
    Commented May 4, 2021 at 19:35
3
$\begingroup$

Your question is fundamentally flawed. The standard normal distribution is scaled so that $\sigma = 1$. For any other Gaussian distribution ($\mu = 0$, $\sigma = \sigma^*$), the curve between the bounds $[-2\sigma^*, 2\sigma^*]$ has the same shape as the standard normal distribution; the only difference is the scaling factor. So if you rescale that Gaussian by dividing by $\sigma^*$, you end up with the standard normal distribution.

Now if you have a Gaussian distribution ($\mu = 0, \sigma = \sigma^*$), then yes, as $\sigma^* \rightarrow \infty$ the region between $[-2, 2]$ becomes increasingly flat.
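
The scaling relationship can be written as $\sigma^* f_{\sigma^*}(\sigma^* z) = \phi(z)$, which is easy to verify numerically (a sketch; $\sigma^* = 10$ is an arbitrary choice):

```python
# sigma * (pdf of N(0, sigma^2) evaluated at sigma*z) equals the standard
# normal pdf at z, for every z.
import numpy as np
from scipy.stats import norm

sigma = 10.0
z = np.linspace(-3, 3, 13)
print(np.allclose(sigma * norm.pdf(sigma * z, scale=sigma), norm.pdf(z)))  # True
```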

$\endgroup$
3
$\begingroup$

Here is an alternative view of the problem which shows that if $X_n\sim N(\mu;n\sigma)$, where $\mu$ and $\sigma>0$ are fixed and $\{x\}=x-\lfloor x\rfloor$ is the fractional part function, then $\{X_n\}$ converges weakly to a random variable $U$ uniformly distributed over $[0,1]$; in other words, $$X_n\mod 1\stackrel{n}{\Longrightarrow}U$$

The function $\{x\}$ is measurable and has period $1$, and so, for any $f\in\mathcal{C}_b(\mathbb{R})$, $f_\sigma(x):=f(\{\sigma x\})$ is measurable, bounded and $\frac{1}{\sigma}$-periodic. Let $\phi(x)$ be the density function of the standard normal distribution. Then, by Fejér's formula $$\begin{align} E[f(\{\sigma n X+\mu\})]&=\int f(\{n\sigma x+\mu\})\phi(x)\,dx=\int f_\sigma\big(nx+\tfrac{\mu}{\sigma}\big)\phi(x)\,dx\\ &\xrightarrow{n\rightarrow\infty}\Big(\frac{1}{1/\sigma}\int^{1/\sigma}_0f_\sigma(x)\,dx\Big)\int\phi(x)\,dx=\sigma\int^{1/\sigma}_0f(\{\sigma x\})\,dx\\ &=\int^1_0f(x)\,dx=E[f(U)] \end{align}$$
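
A quick simulation of this mod-1 convergence (a sketch; the sample size and the standard deviation of $1000$ are arbitrary choices):

```python
# For a normal variable with large standard deviation, the fractional parts
# look uniform on [0, 1); checked here with a Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1000.0, size=100_000)
frac = x - np.floor(x)                  # {x}, the fractional part
print(kstest(frac, "uniform"))          # large p-value: consistent with U(0, 1)
```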


Edit: After a second inspection and a comment of @whuber, it seems that the view I suggested in my answer actually holds for any random variable $X$ whose law admits a density with respect to Lebesgue measure, that is, if the law of $X$ is $P_X(dx)= \phi(x)\,dx$ where $\phi\in L_1(\mathbb{R})$, then for any $\mu\in\mathbb{R}$ $$\{\sigma n X+\mu\}\stackrel{n\rightarrow\infty}{\Longrightarrow}U(0,1)$$ The same argument used above for $X_n\sim N(\mu;\sigma n)$ works for $\sigma n X+\mu$. So indeed, in this instance, other than scale and location invariance, there is nothing special about normality in the context of $\mod 1$.

One last observation. When variance is increased by the transformation $\sigma X$, where $X$ is a random variable that has density $\phi$ (with respect to Lebesgue's measure $\lambda$) such that $\phi$ is continuous at $0$, and $\phi(0)>0$, then the density of $\sigma X$ flattens out locally as $\sigma\rightarrow\infty$, giving the appearance of convergence to a uniform distribution. To be more precise, suppose $A$ is a Borel set with $0<\lambda(A)<\infty$ and $P(X\in A)>0$. Consider the conditional distribution $$ P^A_\sigma(dx):=P[\sigma X\in dx|\sigma X\in A]$$ Then, for any $f\in\mathcal{C}_b(\mathbb{R})$ $$\begin{align} E[f(\sigma X)|\sigma X\in A]&=\frac{\int \mathbb{1}_{A}(\sigma x) f(\sigma x)\phi(x)\,dx}{\int\mathbb{1}_A(\sigma x)\phi(x)\,dx}\\ &=\frac{\int \mathbb{1}_{A}(x) f(x)\phi\big(\tfrac{x}{\sigma}\big)\,dx}{\int\mathbb{1}_A(x)\phi\big(\tfrac{x}{\sigma}\big)\,dx}\xrightarrow{\sigma\rightarrow\infty}\frac{1}{\lambda(A)}\int_A\,f(x)\,dx \end{align}$$ by dominated convergence. Therefore, $P^A_\sigma\stackrel{\sigma\rightarrow\infty}{\Longrightarrow}\frac{1}{\lambda(A)}\mathbb{1}_A(x)\,dx$, that is, $P^A_\sigma$ converges weakly to the uniform distribution over the set $A$. This, in particular, holds for $X\sim N(0;1)$.
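
As a check that normality plays no special role in this last observation, here is a sketch with scipy using the standard Cauchy distribution (whose density is continuous and positive at $0$); the intervals are arbitrary choices:

```python
# For X ~ Cauchy, P(sigma*X in [0,1] | sigma*X in [-2,3]) also approaches the
# Lebesgue-measure ratio 1/5 as sigma grows.
from scipy.stats import cauchy

A = (-2.0, 3.0)          # the conditioning set
sub = (0.0, 1.0)         # a sub-interval of A
for sigma in [1, 10, 100, 1000]:
    p_sub = cauchy.cdf(sub[1] / sigma) - cauchy.cdf(sub[0] / sigma)
    p_A = cauchy.cdf(A[1] / sigma) - cauchy.cdf(A[0] / sigma)
    print(sigma, p_sub / p_A)          # tends to 1/5 = 0.2
```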

$\endgroup$
2
  • $\begingroup$ There is much less to this result than meets the eye: I believe it holds for all continuous distributions. $\endgroup$
    – whuber
    Commented Jun 4, 2021 at 0:52
  • 1
    $\begingroup$ @whuber: Thanks for pointing that out. I just realized that indeed, for $\mod 1$, there is nothing special about normality as variance grows. $\endgroup$ Commented Jun 4, 2021 at 17:06
