
Let's say you have infinitely many i.i.d. Bernoulli variables $X_1, X_2, \cdots$ with parameter $p=\frac{1}{2}$; for instance, the binary digits of a random number. Let $S_n = X_1 + \cdots + X_n$.

The law of the iterated logarithm (see the Wikipedia article) states that $$\limsup_{n\rightarrow\infty} \frac{S_n-np}{\sqrt{2np(1-p)\log\log n}}=1 \quad \text{almost surely}.$$

I can't replicate this behavior in simulations no matter what I do. Wikipedia shows a chart explaining why the $\log \log n$ factor is needed in the formula, but it is not clear what data it is based on: the chart goes all the way to $n=\infty$, so it can't be real data, as that would require an infinite number of observations. In all my tests, the $\log \log n$ factor makes the $\limsup$ tend to 0, not 1.
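To make this concrete, here is a minimal sketch (in Python with NumPy, not the Perl code I actually used; the seed and horizon are arbitrary) that computes the LIL statistic along one simulated trajectory:

```python
import numpy as np

rng = np.random.default_rng(42)          # arbitrary seed
p = 0.5
n = 10**7                                # log(log(10^7)) is only about 2.8
x = rng.integers(0, 2, size=n)           # i.i.d. Bernoulli(1/2) digits
s = np.cumsum(x)                         # partial sums S_1, ..., S_n

k = np.arange(3, n + 1)                  # start at k = 3 so log(log k) > 0
lil = (s[2:] - k * p) / np.sqrt(2 * k * p * (1 - p) * np.log(np.log(k)))
print("running maximum of the LIL statistic:", lil.max())
```

At any horizon one can actually simulate this way, the running maximum typically sits well below 1, which is exactly the behavior described above.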

Now let $$Z_n=\sqrt{n}\cdot\Big(\frac{S_n}{n} - p\Big) = \frac{S_n -np}{\sqrt{n}}.$$

Here $p=\frac{1}{2}$. Of course, $Z_n$ is asymptotically normal $N(0,pq)$. So $Z_n$ can technically exceed any fixed bound, and should do so regularly, suggesting that $\limsup Z_n =\infty$. This is in agreement with the law of the iterated logarithm. Yet based on simulations with $n=10^{10}$ and a very good random generator, all the evidence suggests that $\limsup Z_n$ is bounded by a rather small constant, less than 1. This seems to contradict the law of the iterated logarithm.
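Equivalently, the LIL says the running maximum of $Z_n$ should grow like $\sqrt{2p(1-p)\log\log n}$, which is so slow that it is easy to mistake for a constant bound. A quick sketch of that envelope (same assumptions as the snippet above):

```python
import numpy as np

p = 0.5
for n in (10**7, 10**10, 10**30, 10**100):
    # LIL envelope for Z_n = (S_n - n*p)/sqrt(n): sqrt(2*p*(1-p)*log(log n))
    print(n, np.sqrt(2 * p * (1 - p) * np.log(np.log(float(n)))))
```

The envelope is only about $1.25$ at $n=10^{10}$ and still only about $1.65$ at $n=10^{100}$.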

Below is the chart for $Z_n$ based on 10 billion observations.

[Figure: $Z_n$ plotted against $n$ over $10^{10}$ observations]

It was produced with the following random generator: $Y_{k+1}=3Y_k - \lfloor 3Y_k\rfloor$ with $Y_1=\sqrt{2}$ and $X_k = \lfloor 2Y_k\rfloor \in \{0, 1\}$. This random generator is aperiodic and has other nice features. In this case $p=\frac{1}{2}$.
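For reference, a direct transcription of this generator (a Python sketch rather than my actual Perl code; I seed with the fractional part of $\sqrt{2}$ so that $Y_k$ stays in $[0,1)$):

```python
import math

def bit_stream(n, y=math.sqrt(2) % 1):
    """Bits X_k = floor(2*Y_k) from the map Y_{k+1} = 3*Y_k - floor(3*Y_k)."""
    bits = []
    for _ in range(n):
        bits.append(int(2 * y))   # X_k: 0 if y < 1/2, else 1
        y = (3 * y) % 1           # fractional part of 3*Y_k
    return bits

print(bit_stream(20))
```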

My question

What am I missing here? Why do my results (the above is one of many tests) seem to contradict the law of the iterated logarithm? Is it the fact that the successive $Z_n$ are highly correlated? But this is assumed in the law of the iterated logarithm too, right? Or am I not properly understanding the law of the iterated logarithm, or the concept of $\limsup$?

Notes about the above random generator

The sequence $Y_n$ is completely wrong after 45 iterations or so, due to errors (caused by finite machine precision) propagating exponentially fast from one iteration to the next. However, this is not an issue: it is like starting with a new seed every 45 iterations or so, and the sequence $Y_n$ is ergodic.

Ideally I wanted to use $Y_{k+1}=2Y_k - \lfloor 2Y_k\rfloor$, but in the programming language I used (Perl) the iterates erroneously reached zero after about 45 iterations, due to the way the computer carries out the computations. The computation of $2Y_k$ can be carried out very efficiently in base 2: it is just a shift of the binary digits.
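The collapse of the doubling map is easy to reproduce in any language with binary64 floats (a Python sketch; Perl's default floating point behaves the same way):

```python
import math

y = math.sqrt(2) % 1              # about 0.41421..., a 53-bit significand
for k in range(1, 60):
    y = (2 * y) % 1               # doubling shifts the significand: each step
                                  # discards the leading bit and no new low
                                  # bits arrive, so y eventually hits exactly 0
    if y == 0.0:
        print("reached exactly 0 at iteration", k)
        break
```

With the multiplier 3, by contrast, rounding keeps mixing the low-order bits, so in practice the iterates never collapse to 0, even though they diverge from the exact orbit after roughly 45 iterations as noted above.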

The speed of computations in the actual version of the generator is significantly improved (by an order of magnitude) if you replace $3Y_k$ by $Y_k + Y_k + Y_k$ in the source code.

  • See the figures in that Wikipedia article at en.wikipedia.org/wiki/Law_of_the_iterated_logarithm#/media/…, especially the one in the lower left corner. And note that ten billion is still tiny when measured in terms of $\log\log n$: $\log(\log(10^{10})) \approx 3.1$. You need this expression to get large. So, if you wish to continue your simulation, consider going out to $n$ for which $\log\log(n)$ is, say, $10^3$ :-).
    – whuber
    Commented Mar 27, 2020 at 21:35
  • Yes, thanks @whuber, I saw the picture and spent quite some time on it. It is convincing (that I am wrong), yet it is based on only 1 million observations. I am not sure what data was used, but in my case, with 10 billion observations, I see the same behavior as in the lower left chart, but without the $\log \log n$ term. In short, my chart (equivalent to the upper right chart on Wikipedia) looks like the one in the lower left.
    Commented Mar 27, 2020 at 21:45
  • The Wikipedia charts show maybe 50 trajectories or so, and sure, most of them appear bounded, even in the upper right corner, if you look at only 1 million points. But I show only one trajectory, and it has 10,000 times more points, yet it is bounded. Maybe it would be different with $n=10^{30}$, or there is some other mechanism at play here.
    Commented Mar 27, 2020 at 21:52
  • You are correct: a longer simulation will do the trick.
    – whuber
    Commented Mar 27, 2020 at 22:07
  • More than trillions! The lifetime of a multiverse wouldn't be enough. However, it's not necessary to generate every value in the walk explicitly. It suffices to find enough values to analyze and graph it.
    – whuber
    Commented Mar 28, 2020 at 13:21

1 Answer


You are looking at a truly tiny simulation. Here's one that went out to $n=2^{1000} \approx 1.07\times 10^{301}:$

[Figure: the LIL-scaled random walk plotted out to $n = 2^{1000}$]

(In order to plot it I thinned the walk to $999$ values equally spaced horizontally and connected them with line segments. The actual simulation is, of course, much more detailed than can be shown here :-).)

Clearly this walk repeatedly hits (and slightly exceeds) the $\pm 1$ thresholds. The Law of the Iterated Logarithm says this behavior will continue ad infinitum, with the excursions beyond this threshold growing ever rarer.

The range from $n=10^8$ to $n=10^{10},$ which is basically that visible in your plot, is bounded on the left and right by the light blue lines. In the overall context, we can hardly expect the scaled random walk to vary much within such a narrow interval.
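Concretely, across that entire range the LIL envelope grows only by a factor $\sqrt{\log\log 10^{10} / \log\log 10^{8}} \approx \sqrt{3.14/2.91} \approx 1.04$: about a $4\%$ change, far too small to see in a plot.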

The point is that this law requires you to plot the scaled, standardized random walk on a logarithmic (or even log-log) axis for the index $n.$
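For the curious, here is a minimal sketch of how such a long-horizon simulation can work (Python; it follows the "just-in-time" idea described in the comments below, tracks the centered walk $S_n - np$ so that floating-point precision is spent on the fluctuations rather than on the huge mean, and treats every block increment as Normal, an approximation the CLT makes essentially exact at these block sizes):

```python
import numpy as np

rng = np.random.default_rng(2020)   # arbitrary seed
p = 0.5

# 999 sample times t_j = 2^j, j = 2..1000, equally spaced on a log2 axis.
j = np.arange(2, 1001)
t = 2.0 ** j                        # up to 2^1000 ~ 1.07e301, still a float64

# Track the *centered* walk D(t) = S(t) - t*p directly, so precision is spent
# on the fluctuations (~1e150 at the end) rather than on the huge mean t*p.
# Over a block of dt steps, D gains Binomial(dt, p) - dt*p, which is
# essentially Normal(0, dt*p*(1-p)) once dt is large (CLT).
dt = np.diff(np.concatenate(([0.0], t)))        # block sizes 4, 4, 8, 16, ...
d = np.cumsum(rng.normal(0.0, np.sqrt(dt * p * (1 - p))))

# LIL scaling; log(log t) = log(j * log 2) is positive for all j >= 2.
scaled = d / np.sqrt(2 * t * p * (1 - p) * np.log(j * np.log(2)))
print("max of the scaled walk over t = 4 .. 2^1000:", scaled.max())
```

Plotting `scaled` against $j$ (that is, against $\log_2 n$) reproduces the qualitative behavior of the figure above.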

  • Thank you. Oh well, I will have to change quite a few things in two important papers that I have written, but I am glad to know the truth now (and, most importantly, to know why it works that way). It will actually make the papers more exciting after the correction. How did you produce these $2^{1000}$ deviates? Just curious. Some of the first random generators I used were faulty beyond $10^{8}$.
    Commented Mar 27, 2020 at 22:23
  • I used an approximation that is far more accurate than the numerical imprecision in the double-precision arithmetic of the calculation: namely, that the increments in this random walk (after the first few) follow Normal distributions. One can therefore implement a simulation from the top down, by generating an initial segment with Binomial increments and then obtaining the thinned values at larger times by accumulating Normal variates. To inspect the detail, construct Brownian bridges between those values. This is a kind of "just-in-time" simulation.
    – whuber
    Commented Mar 28, 2020 at 13:14
  • I added more details about my random number generator; see the updated post, with a new paragraph at the bottom.
    Commented Mar 29, 2020 at 3:35