64
$\begingroup$

Does this reflect the real world and what is the empirical evidence behind this?

Wikipedia illustration

Layman here so please avoid abstract math in your response.

The Law of Large Numbers states that the average of the results from multiple trials will tend to converge to its expected value (e.g. 0.5 in a coin toss experiment) as the sample size increases. The way I understand it, while the first 10 coin tosses may result in an average closer to 0 or 1 rather than 0.5, after 1000 tosses a statistician would expect the average to be very close to 0.5 and definitely 0.5 with an infinite number of trials.

Given that a coin has no memory and each coin toss is independent, what physical laws would determine that the average of all trials will eventually reach 0.5? More specifically, why does a statistician believe that a random event with 2 possible outcomes will produce a close-to-equal amount of both outcomes over, say, 10,000 trials? What prevents the coin from falling 9900 times on heads instead of 5200?

Finally, since gambling and insurance institutions rely on such expectations, are there any experiments that have conclusively shown the validity of the LLN in the real world?

EDIT: I do differentiate between the LLN and the Gambler's Fallacy. My question is NOT if or why any specific outcome or series of outcomes becomes more likely with more trials--that's obviously false--but why the mean of all outcomes tends toward the expected value.

FURTHER EDIT: LLN seems to rely on two assumptions in order to work:

  1. The universe is indifferent towards the result of any one trial, because each outcome is equally likely
  2. The universe is NOT indifferent towards any one particular outcome coming up too frequently and dominating the rest.

Obviously, we as humans would label a 50/50 or similar distribution in a coin toss experiment "random", but if heads or tails turns out to be, say, 60-70% after thousands of trials, we would suspect there is something wrong with the coin and that it isn't fair. Thus, if the universe is truly indifferent towards the average of large samples, there is no way we can have true randomness and consistent predictions--there will always be a suspicion of bias unless the total distribution is somehow kept in check by something that preserves the relative frequencies.

Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?

NOTE: A good explanation would not be circular: justifying probability with probabilistic assumptions (e.g. "it's just more likely"). Please check your answers, as most of them fall into this trap.

$\endgroup$
25
  • 50
    $\begingroup$ Empirically and Proven are kind of opposite terms in a sense, aren't they? $\endgroup$ Commented Jan 29, 2015 at 15:44
  • 27
    $\begingroup$ @user1891836 Please note that the law of large numbers does not suggest that "the amounts of heads and tails will eventually even out": with 2000 coin tosses, it is very unlikely to see tails 1000 times more than heads, but with 1,000,000 coin tosses it could very well happen. The law of large numbers only says that the deviation grows slower than the number of coin tosses, thus the proportions of heads and tails, not the amounts, will even out in the long run. $\endgroup$
    – JiK
    Commented Jan 29, 2015 at 16:08
  • 6
    $\begingroup$ There's no tendency for it to "even out" in the sense that if you saw more heads at the beginning, you'd see more tails at the end. Suppose you flipped a fair coin 10 times and it came up heads 10 times. If you flipped 100 more times, then the expected number of heads altogether would be 60, not 55. The coin doesn't secretly know that it came up heads too many times and fixes it up by coming up tails more later. $\endgroup$
    – arsmath
    Commented Jan 29, 2015 at 16:19
  • 24
    $\begingroup$ Finally, since gambling and insurance institutions rely on such expectations, are there any experiments that have conclusively shown the validity of the LLN in the real world? - Isn't the fact that insurance companies are really solid businesses with billions of dollars of revenue one of the best empirical proofs you can imagine? I mean, try to conduct a study with hundreds of millions of people for 100 years in a laboratory :P $\endgroup$
    – Ant
    Commented Jan 29, 2015 at 16:44
  • 5
    $\begingroup$ If a visualisation would help, think of a Galton Board: i.ytimg.com/vi/oPCcOtQKU8M/hqdefault.jpg Each path through the board has an equal probability. But a lot more of the paths end in the centre than at the edge! $\endgroup$ Commented Jan 30, 2015 at 15:06

17 Answers

70
$\begingroup$

Reading between the lines, it sounds like you are committing the fallacy of the layman interpretation of the "law of averages": that if a coin comes up heads 10 times in a row, then it needs to come up tails more often from then on, in order to balance out that initial asymmetry.

The real point is that no divine presence needs to take corrective action in order for the average to stabilize. The simple reason is attenuation: once you've tossed the coin another 1000 times, the effect of those initial 10 heads has been diluted to mean almost nothing. What used to look like 100% heads is now a small blip only strong enough to move the needle from 50% to 51%.

Now combine this observation with the easily verified fact that 9900 out of 10000 heads is simply a less common combination than 5000 out of 10000. The reason for that is combinatorial: there is simply less freedom in hitting an extreme target than a moderate one.

To take a tractable example, suppose I ask you to flip a coin 4 times and get 4 heads. If you flip tails even once, you've failed. But if instead I ask you to aim for 2 heads, you still have options (albeit slimmer ones) no matter how the first two flips turn out. Numerically we can see that 2 out of 4 can be achieved in 6 ways: HHTT, HTHT, HTTH, THHT, THTH, TTHH. But the 4 out of 4 goal can be achieved in only one way: HHHH. If you work out the numbers for 9900 out of 10000 versus 5000 out of 10000 (or any specific number in that neighbourhood), that disparity becomes truly immense.
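For concreteness, here is a minimal Python sketch of that counting argument (standard library only; the specific numbers printed are purely illustrative):

```python
from math import comb  # comb(n, k) counts the ways to get exactly k heads in n flips

# 4 flips: one way to get 4 heads, six ways to get 2 heads
print(comb(4, 4), comb(4, 2))   # 1 6

# 10,000 flips: how many times more ways are there to get 5000 heads than 9900?
ratio = comb(10_000, 5_000) // comb(10_000, 9_900)
print(len(str(ratio)))          # the ratio has well over 2000 decimal digits
```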

To summarize: it takes no conscious effort to get an empirical average to tend towards its expected value. In fact it would be fair to think in the exact opposite terms: the effect that requires conscious effort is forcing the empirical average to stray from its expectation.

$\endgroup$
17
  • 6
    $\begingroup$ @user1891836 You don't need physics for this result, just combinatorics. In this context "freedom in hitting a target" is just a count of how many ways you can possibly succeed in hitting that target. $\endgroup$
    – Keen
    Commented Jan 29, 2015 at 16:39
  • 3
    $\begingroup$ Essentially, because for very large numbers of coin flips, the probability of seeing anything far away from the mean is very, very small. $\endgroup$
    – arsmath
    Commented Jan 29, 2015 at 17:06
  • 4
    $\begingroup$ @user1891836 The scenario you described includes the assumption that each coin-flip has an equal probability of producing the outcome heads as the outcome tails. That's for a single trial. When you look at two independent trials, you combine the two outcomes of two single trials to get a total of four possible outcomes: HH, HT, TH, TT. Each outcome has equal probability because the single-trial outcomes had equal probability and the trials are independent. Notice that half of the two-trial outcomes are balanced, even though zero of the single-trial outcomes are. Then imagine more trials. $\endgroup$
    – Keen
    Commented Jan 29, 2015 at 17:57
  • 3
    $\begingroup$ @user1891836 There's no effort involved. There is no such thing as harder or easier. The single basic fact is that each coin-flip is fair and independent, because you chose to ask about fair, independent coin-flips. If there are 1000 possible outcomes, and they're all equally probable, then every outcome has a 0.1% probability. If you group together 97 outcomes, the probability of getting an outcome in that group is 9.7%, the sum of the individual probabilities. Now suppose that 500 out of 1000 outcomes are balanced. What is the probability of a selected outcome being in this balanced group? $\endgroup$
    – Keen
    Commented Jan 30, 2015 at 19:52
  • 3
    $\begingroup$ @user1891836 The universe doesn't have to care. Let's try another way: Imagine a bag full of red and blue balls: A blue ball represents one way how you could get between 4 and 6 tails when tossing 10 coins, a red ball represents one way how you could get between 8 and 10 tails. In this rather large bag there will be 728 balls, but if you randomly grab one ball without looking your chances to get a blue ball are much bigger since there are 672 blue balls but only 56 red ones. $\endgroup$
    – Voo
    Commented Jan 31, 2015 at 15:31
20
$\begingroup$

Nice question! In the real world, we don't get to let $n \to \infty$, so the question of why the LLN should be of any comfort is important.

The short answer to your question is that we cannot empirically verify the LLN, since we can never perform an infinite number of experiments. It's a theoretical result that is very well founded, but, as with all applied mathematics, the question of whether or not a particular model or theory holds is a perennial concern.

A more useful law from a statistical standpoint is the Central Limit Theorem, together with the various probability inequalities (Chebyshev, Markov, Chernoff, etc.). These allow us to place bounds on, or approximate, the probability of our sample average being far from the true value for a finite sample.

As for an actual experiment to test the LLN, one can hardly do better than John Kerrich's 10,000-coin-flip experiment--he got 50.67% heads!
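For what it's worth, repeating Kerrich's experiment in software is now a one-liner. A minimal sketch, assuming numpy is available (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10_000)  # 0 = tails, 1 = heads, fair coin
print(flips.mean())                      # typically lands within about 0.01 of 0.5
```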

So, in general, I would say LLN is empirically well supported by the fact that scientists from all fields rely upon sample averages to estimate models, and this approach has been largely successful, so the sample averages appear to be converging nicely for finite, and feasible, sample sizes.

There are "pathological" cases that one can construct (I'll spare you the details) where one needs astronomical sample sizes to get a reasonable probability of being close to the true mean. This is apparent if you are using the Central Limit Theorem, but the LLN is simply not informative enough to give me much comfort in day-to-day practice.

The physical basis for probability

It seems you still have an issue with why long-run averages exist in the real world at all, quite apart from what probability theory says about the behavior of these averages once their existence is assumed. Let me state a fact that may help you:

Fact: Neither probability theory nor the existence of long-run averages requires randomness!

The determinism vs. indeterminism debate is for philosophers, not mathematicians. The notion of probability as a physical observable comes from ignorance or absence of the detailed dynamics of what you are observing. You could just as easily apply probability theory to a boring ol' pendulum as to the stock market or coin flips...it's just that with pendulums we have a nice, detailed theory that allows us to make precise estimates of future observations. I have no doubt that a full physical analysis of a coin flip would allow us to predict which face would come up...but in reality, we will never know this!

This isn't an issue though. We don't need to assume a guiding hand nor true indeterminism to apply probability theory. Let's say that coin flips are truly deterministic; then we can still apply probability theory meaningfully if we assume a couple of basic things:

  1. The underlying process is $ergodic$...okay, this is a bit technical, but it basically means that the process dynamics are stable over the long term (e.g., we are not flipping coins in a hurricane or where tornados pop in and out of the vicinity!). Note that I said nothing about randomness...this could be a totally deterministic, albeit very complex, process...all we need is that the dynamics are stable (i.e., we could write down a series of equations with specific parameters for the coin flips and they wouldn't change from flip to flip).
  2. The values the process can take on at any time are "well behaved". Basically, like I said earlier wrt the Cauchy...the system should not produce values that consistently exceed $\approx n$ times the sum of all previous observations. It may happen once in a while, but it should become very rare, very fast (precise definition is somewhat technical).

With these two assumptions, we now have the physical basis for the existence of a long-run average of a physical process. Now, if it's complicated, then instead of using physics to model it exactly, we can apply probability theory to describe the statistical properties of this process (i.e., aggregated over many observations).

Note that the above is independent of whether or not we have selected the correct probability model. Models are made to match reality...reality does not conform itself to our models. Therefore, it is the job of the modeler, not nature or divine providence, to ensure that the results of the model match the observed outcomes.

Hope this helps clarify when and how probability applies to the real world.

$\endgroup$
10
  • $\begingroup$ John Kerrich's experiment is a fascinating example. If the LLN is valid enough for 10,000+ coin tosses, there is an obvious link between $n$ and the mean value, which is in total conflict with the unpredictability of a single toss or a small number of tosses. $\endgroup$ Commented Jan 29, 2015 at 16:18
  • 2
    $\begingroup$ @user1891836 I'm sorry, I don't follow your reasoning. The LLN is always mathematically valid, and yes, the observed average is quite close to that of a fair coin (assuming the coin John Kerrich was using was indeed fair). There's a bit of a chicken-or-egg issue here...what are we assuming and what is being tested? $\endgroup$
    – user76844
    Commented Jan 29, 2015 at 16:21
  • 2
    $\begingroup$ @user1891836 you appear mystified that some physical processes exhibit stability over the long term. If you interpret a probability as a frequency of occurrence of an event, then any process that exhibits periodic stability can be assigned meaningful probability statements. There is a more technical notion of "Ergodic" processes that extends this to non-periodic processes. I won't get into it, but I think taking a look at Chaos theory will help show you why we can use probability and why it works. $\endgroup$
    – user76844
    Commented Jan 29, 2015 at 16:25
  • 2
    $\begingroup$ @user1891836 simple example: you are observing a pendulum, then the probability that it forms an angle $\theta$ wrt the vertical is equal to the probability that it forms an angle $-\theta$. A process does not have to be unpredictable to have a frequentist probability. $\endgroup$
    – user76844
    Commented Jan 29, 2015 at 16:27
  • 1
    $\begingroup$ I see what you are getting at. I guess I have a hard time swallowing that the mean is empirically close to expected value. :) $\endgroup$ Commented Jan 29, 2015 at 16:39
16
$\begingroup$

This isn't an answer, but I thought this group would appreciate it. Just to show that the behavior in the graph above is not universal, I plotted the sequence of sample averages for a standard Cauchy distribution for $n=1,\ldots,10^6$. Note how, even at extremely large sample sizes, the sample average jumps around.

If my computer weren't so darn slow, I could increase this by another order of magnitude and you'd not see any difference. The sample average for a Cauchy Distribution behaves nothing like that for coin flips, so one needs to be careful about invoking LLN. The expected value of your underlying process needs to exist first!

[Figure: running sample average of $10^6$ standard Cauchy variates, which keeps jumping around rather than converging]

Response to OP concerns

I did not bring this example up to further concern you, but merely to point out that "averaging" does not always reduce the variability of an estimate. The vast majority of the time, we are dealing with phenomena that possess an expected value (e.g., coin tosses of a fair coin). However, the Cauchy is pathological in this regard, since it does not possess an expected value...so there is no number for your sample averages to converge to.

Now, many moons ago when I first encountered this fact, it blew my mind...and shook my confidence in statistics for a short time! However, I've come to be comfortable with this fact. At the intuitive level (and as many of the posters here have pointed out) what the LLN relies upon is the fact that no single outcome can consistently dominate the sample average...sure, in the first few tosses the outcomes do have a large influence, but after you've accumulated $10^6$ tosses, you would not expect the next toss to change your sample average from, say, 0.1 to 0.9, right? It's just not mathematically possible.

Now enter the Cauchy distribution...it has the peculiar property that, no matter how many values you are currently averaging over, the absolute value of the next observation has a good (i.e., not vanishingly small - this part is somewhat technical, so maybe just accept this point) chance of being larger (much larger, in fact) than $n$ times the sum of all previous values observed...take a moment to think about this: it means that at any moment, your sample average can be converging to some number, then WHAM, it gets shot off in a different direction. This will happen infinitely often, so your sample average will never settle down like it does with processes that possess an expected value (e.g., coin tosses, normally distributed variables, Poisson, etc.). Thus, you will never have an observed sum and an $n$ large enough to swamp the next observation.

I've asked @sonystarmap if he/she would mind calculating the sequence of medians, as opposed to the sequence of averages, in their post (similar to my post above, but with 100x more samples!). What you should see is that the median of a sequence of Cauchy random variables does converge in LLN fashion. This is because the Cauchy, like all random variables, does possess a median. This is one of the many reasons I like using medians in my work, where normality is almost surely (sorry, couldn't help myself) false and there are extreme fluctuations. Not to mention that the sample median minimizes the average absolute deviation.

Second Addition: Cauchy DOES have a Median

To add another detail (read: wrinkle) to this story, the Cauchy does have a median, and so the sequence of sample medians does converge to the true median (i.e., $0$ for the standard Cauchy). To show this, I took the exact same sequence of standard Cauchy variates I used to make my first graph of the sample averages, took the first 20,000, and broke them up into four intervals of 5,000 observations each (you'll see why in a moment). I then plotted the sequence of sample medians as the sample size approaches 5,000 for each of the four independent sequences. Note the dramatic difference in convergence properties!

This is another application of the law of large numbers, but to the sample median. Details can be seen here.

[Figure: running sample medians of four independent sequences of 5,000 standard Cauchy variates, each settling near $0$]
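If you want to reproduce both pictures, here is a rough sketch assuming numpy is available (the seed, sample sizes and checkpoints are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(1_000_000)

# Running sample mean: typically still wandering even after a million observations
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)
print(running_mean[[999, 99_999, 999_999]])

# Sample median at a few checkpoints: settles down near the true median, 0
for k in (1_000, 10_000, 100_000, 1_000_000):
    print(k, np.median(x[:k]))
```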

$\endgroup$
19
  • 1
    $\begingroup$ Good point. You can't very well compare your average result with $\mu$ if $\mu$ is undefined. $\endgroup$
    – KSmarts
    Commented Jan 29, 2015 at 22:32
  • $\begingroup$ What did you use to produce this graph? $\endgroup$
    – detly
    Commented Jan 30, 2015 at 1:15
  • $\begingroup$ @detly I used R to produce the Cauchy variates, then did the averages and graph itself in Excel. $\endgroup$
    – user76844
    Commented Jan 30, 2015 at 1:35
  • 1
    $\begingroup$ @Eupraxis1981 Of what use is LLN with such an example?! $\endgroup$ Commented Jan 30, 2015 at 9:48
  • 3
    $\begingroup$ @user1891836 Are you questioning (1) The general usefulness of LLN given that such an example exists, (2) how one would use the LLN in this example...if at all, or (3) the relevance of my example to your concerns? $\endgroup$
    – user76844
    Commented Jan 30, 2015 at 13:23
10
$\begingroup$

Based on your remarks, I think you are actually asking

"Do we observe the physical world behaving in a mathematically predictable way?"

"Why should it do so?"

Leading to:

"Will it continue to do so?"

See, for example, this Philosophy Stack Exchange question.

My take on the answer is that, "Yes", for some reason the physical universe seems to be a machine obeying fixed laws, and this is what allows science to use mathematics to predict behaviour.

So, if the coin is unbiased and the world behaves consistently, then the number of heads will vary in a predictable way.

But please note that it is not expected to converge to exactly half. In fact, the excess or deficit will go as $\sqrt N$, which actually increases with $N$. It is the proportion of the excess relative to the total number of trials $N$ which goes to zero.

However, no-one can ever prove in principle whether, for example, the universe actually has a God who decides how the coin will fall. I recall that in Peter Bernstein's book about Risk the story is told that the Romans (who did not know probability as a concept) had rules for knucklebone based games that effectively assumed this.

Finally, if you ask which state of affairs is "well supported by evidence", the evidence available would include at least all of science and the finance industry. That's enough for most of us.

$\endgroup$
1
  • 1
  • $\begingroup$ I agree. I might have made a mistake posting the question here, but probability theory is not a descriptive theory, yet we have a hard time disproving it in the real world. It seems that, when it comes to chaotic, practically unpredictable events, the mean of all trials tends to go to the expected value, even though there is no way you can predict any one trial or short series of trials. I.e. we can reasonably predict the mean of large samples, but not of small ones. That's puzzling. $\endgroup$ Commented Jan 30, 2015 at 10:17
9
$\begingroup$

One has to distinguish between the mathematical model of coin tossing and factual coin tossing in the real world.

The mathematical model has been set up in such a way that it behaves provably according to the rules of probability theory. These rules do not come out of thin air: They encode and describe in the most economical way what we observe when we toss real coins.

The deep problem is: Why do real coins behave the way they do? I'd say this is a question for physicists. An important point is symmetry. If there is a clear-cut "probability" for heads, symmetry demands that it should be ${1\over2}$. Concerning independence: there are so many physical influences determining the outcome of the next toss that the face the coin showed when we picked it up from the table seems negligible. And so on. This is really a matter of philosophy of physics, and I'm sure there are dozens of books dealing with exactly this question.

$\endgroup$
3
  • $\begingroup$ Would you care to elaborate on why we observe such tendency towards even amounts of the outcomes of a fair coin toss in the long run? $\endgroup$ Commented Jan 29, 2015 at 16:13
  • $\begingroup$ @user1891836 Because as suggested by others, even with its inherent imperfections, the flipping of a physical coin is (typically) a very good approximation for the idealized coin-flip that mathematics defines and shows to have equal probabilities in heads vs. tails. $\endgroup$ Commented Jan 29, 2015 at 18:06
  • $\begingroup$ If you are interested in real coins, search YouTube for Persi Diaconis's talks about the tossing of real coins ... He is the real expert on that! $\endgroup$ Commented May 29, 2017 at 1:07
5
$\begingroup$

One has to distinguish between the mathematical model of coin tossing and the human intuition of it.

It is worthwhile to consider the following experiment.

A teacher divides his class into two groups. Then he gives a coin to each member of one group. Each member of this group will flip his coin, say, 100 times, and everybody will jot down the results. The members of the other group will not have coins; they will simulate the coin-flipping experiment by writing down imaginary results. Then everybody puts a secret mark on his paper. Finally the papers get shuffled and the children hand the stack over to the teacher. Surprisingly, the teacher will be able to tell, with quite high certainty, who flipped coins and who just imagined the experiments. How? The runs of consecutive heads (or tails) in the real experiments are typically much longer than those in the imaginary ones.
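One way to see why the trick works is to simulate it. A minimal sketch, assuming numpy is available; the run-length threshold of 6 is just a convenient illustration:

```python
import numpy as np
from itertools import groupby

def longest_run(flips):
    """Length of the longest block of identical consecutive outcomes."""
    return max(len(list(group)) for _, group in groupby(flips))

rng = np.random.default_rng(2)
runs = [longest_run(rng.integers(0, 2, 100).tolist()) for _ in range(1_000)]
# Roughly 0.8 of real 100-flip sequences contain a run of 6 or more identical
# outcomes -- the kind of streak people writing "random-looking" lists rarely include.
print(sum(r >= 6 for r in runs) / len(runs))
```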

This demonstration, among other interesting examples, illustrates that the instinctive human understanding of random phenomena is quite unreliable.

So not only does probability theory have nothing to do with reality, it does not have anything to do with human intuition either. However, falsifying the predictions of probability theory is often tiresome. (Validating them is impossible, of course. In this respect probability theory is not special.)

$\endgroup$
4
  • 1
    $\begingroup$ How would you define a "truly random" event and if it is objectively unpredictable, why should probability theory be able to predict the average of a 1,000 such events? $\endgroup$ Commented Jan 29, 2015 at 18:21
  • $\begingroup$ Please, explain why you added this comment to my "answer." $\endgroup$
    – zoli
    Commented Jan 29, 2015 at 22:07
  • 1
    $\begingroup$ You stated that humans are not very good at dealing with random phenomena, which makes it necessary to define what is really random. Second, even though probability theory is just a mathematical model, it is pervasively used to predict all sorts of events (from future car accidents to roulette spins) in the aggregate in order to make decisions. It's just weird that large samples can be predicted but small ones can't. $\endgroup$ Commented Jan 30, 2015 at 9:40
  • $\begingroup$ I agree. My axiomatic answer to the first claim is: If $A$ is the set of things that humans cannot handle well then $Pr\in A$. $\endgroup$
    – zoli
    Commented Jan 30, 2015 at 16:54
4
$\begingroup$

It looks like most of the answers are addressing the apparent (but maybe not actual) misunderstanding behind your question. I will try to give a more direct mathematical explanation. I know you said to "avoid abstract math," so I will try to explain what I'm doing.

Suppose we have a random variable $X$. Basically, this is an abstraction of a random or unpredictable event. It has multiple possible values, each with a probability that it is the result. We calculate the expected value of $X$, or $E(X)$, by multiplying each possible result by its probability and adding them together. This is also called the mean of $X$, written $\mu$.

We can also determine how "spread out" the possible values are, by calculating the variance. The variance, $\sigma^2$, is the expected value of the square of the deviation from the mean, which is how far the random variable is from its expected value. That is, the deviation is $X-\mu$, and the variance is $\sigma^2=E\left((X-\mu)^2\right)$. We also have standard deviation $\sigma$, which is the square root of the variance.

Intuitively, we can say that "most" of the time, the result of a random test will be "close" to the expected value. If we know the random variable's variance, we can define "close" in terms of the variance or standard deviation and make this a mathematical statement. In particular, $$P(|X-\mu|\ge k\sigma)\le\frac{1}{k^2}$$ This is Chebyshev's Inequality, and it says that the probability that a random variable is $k$ or more standard deviations from the mean is at most $1/k^2$. While this exact result might not be obvious, the idea should be clear: if there were more likely outcomes farther away, then the variance would be higher. From this, we can prove the (weak) Law of Large Numbers.

Let us take $n$ independent random variables $X_1,X_2,\ldots,X_n$ with the same distribution, with finite mean $\mu$ and finite variance $\sigma^2$, and define their average as $\overline{X}_n=\frac{1}{n}(X_1+\ldots+X_n)$. Then $E(\overline{X}_n)=\mu$, and $Var(\overline{X}_n)=Var(\frac1n(X_1+\ldots+X_n))=\frac{\sigma^2}{n}$

Obviously, for any positive real number $\epsilon$, $|\overline{X}_n-\mu|$ is either greater than, less than, or equal to $\epsilon$. There are no other possibilities, so \begin{equation} P(|\overline{X}_n-\mu|<\epsilon)+P(|\overline{X}_n-\mu|\ge\epsilon)=1\\ P(|\overline{X}_n-\mu|<\epsilon)=1-P(|\overline{X}_n-\mu|\ge\epsilon) \end{equation} Then, applying Chebyshev's Inequality to $\overline{X}_n$ (substituting $k=\frac{\epsilon\sqrt{n}}{\sigma}$, since the standard deviation of $\overline{X}_n$ is $\sigma/\sqrt{n}$) gives $$P(|\overline{X}_n-\mu|<\epsilon)\ge1-\frac{\sigma^2}{n\epsilon^2}$$ So as we take more trials, that is, as $n\to\infty$, this lower bound approaches $1$. And since probabilities cannot be greater than $1$, we have $$\lim_{n\to\infty}P(|\overline{X}_n-\mu|<\epsilon)=1$$ Or, equivalently, $$\lim_{n\to\infty}P(|\overline{X}_n-\mu|\ge\epsilon)=0$$ This is the Weak Law of Large Numbers.
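As a concrete illustration of this bound (a sketch, assuming numpy; the numbers are for fair coin flips, where $\mu=0.5$ and $\sigma^2=0.25$): with $n=10{,}000$ and $\epsilon=0.01$, Chebyshev guarantees a probability of at least $0.75$, while a simulation puts the actual figure near $0.95$.

```python
import numpy as np

n, eps = 10_000, 0.01
chebyshev_lower_bound = 1 - 0.25 / (n * eps**2)   # = 0.75

rng = np.random.default_rng(3)
sample_means = rng.binomial(n, 0.5, size=20_000) / n  # 20,000 repetitions of the experiment
empirical = np.mean(np.abs(sample_means - 0.5) < eps)
print(chebyshev_lower_bound, empirical)               # 0.75 vs roughly 0.95
```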

It is important for me to point out, for general understanding, that having a probability of $0$ is not quite the same thing as being literally impossible. What it means is that almost all tests (in a mathematical sense) will fail. In this case, there are an uncountably infinite number of infinite sets of random variables, but there are only countably many sets whose average differs from the expected value.

$\endgroup$
2
  • $\begingroup$ +1. I think you point out a very important aspect, namely that a probability of 0 does not mean an event can't happen. $\endgroup$
    – Thomas
    Commented Jan 31, 2015 at 12:03
  • 1
    $\begingroup$ @KSmarts Minor quibble: measure $0$ isn't the same as countable. In this case here the set of outcomes whose average deviates from the expectation is still uncountably infinite. $\endgroup$
    – Erick Wong
    Commented Jan 31, 2015 at 16:35
3
$\begingroup$

The physical assumptions are that in each trial of tossing the coin, the coin is identical, and the laws of physics are identical, and the coin in no way "remembers" what it did before. With those assumptions, you can then say that there is some number between $0$ and $1$ that represents the probability of any given toss coming up heads.

Warning: that probability need not be $\frac{1}{2}$. In fact, a standard US penny will land on tails about 51% of the time.

Once you have that number, which we could call $p$, then it is meaningful to talk about the expected value of the number of heads arising in $1$ toss, which is that same $p$, and the expected value of the proportion of heads arising in $N$ tosses (the average result of $N$ trials), which is also $p$ because the tosses are completely independent.

Then the practical effect of LLN is to know that the likelihood of the average number of heads in an actual set of $N$ trials being "far" from $p$ becomes vanishingly small, provided that by "far" you mean more than a few times $\sqrt{1/N}$. And since for very large $N$, $\sqrt{1/N}$ becomes very small, we can say that with probability almost 1 the average of $N$ trials will lie in a small range about its in-principle value of $p$.
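To see the $\sqrt{1/N}$ scale of the fluctuations concretely, one can simulate many experiments at several values of $N$ (a sketch assuming numpy; the constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
for n in (100, 10_000, 1_000_000):
    proportions = rng.binomial(n, 0.5, size=2_000) / n
    # empirical spread of the proportion of heads vs. the 0.5 * sqrt(1/n) prediction
    print(n, round(proportions.std(), 5), round(0.5 / np.sqrt(n), 5))
```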

$\endgroup$
3
  • $\begingroup$ But why would the average of N tend toward p? Why would the universe obey? :) $\endgroup$ Commented Jan 29, 2015 at 17:03
  • 1
    $\begingroup$ Because errors get washed out. This is fundamental. The definition of probability is not abstract nor arbitrary. It is chosen so that the law of large numbers holds. $\endgroup$
    – Joshua
    Commented Jan 30, 2015 at 0:18
  • 1
    $\begingroup$ I presume you are referring to the Diaconis, Holmes & Montgomery result. Even if you follow their line of argument (which is amusing) you need to be careful about their conclusion which is that 'Any coin that is tossed vigorously and high, and caught in midair has about a 51 % chance of landing with the same face up that it started with.' It does not conclude that tails are more probable, in fact their analysis doesn't distinguish either side (other than labeling). $\endgroup$
    – copper.hat
    Commented Jan 31, 2015 at 6:44
3
$\begingroup$

I think it is very helpful to redefine the Law of Large Numbers:

Wikipedia gives it as follows:

According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

However, it's important to note that the law isn't necessarily describing a physical law, but rather a mathematical one. It would be better stated as:

As more trials are performed, the probability that the running average will deviate from the mean (by any given amount) gets smaller and smaller.

In other words, within the mathematical framework of probabilities, a larger sample has a larger probability of being close to the mathematical, probabilistic mean.

What seems to be bothering the OP is why any probabilities have a bearing on the physical world. As commented above, frequentist probability is just a description of possible outcomes and their ratios - it never explains why or what physical law keeps the world in sync with such a law. The OP's question is more a physics/philosophy question, (one that has bothered me for ages). It reminds me of the Is-ought problem.

As an example, given an infinite number of universes, there will be one universe where all random events follow the most unlikely probability. The poor fellow living in such a universe would be best off always taking the worst odds. Why should we assume that we are in a universe that happens to be one which will follow the most probable outcome? (Of course one will argue that according to probabilities, we should find ourselves in the universe that is closer to the mean. I just mean this as an example to bring out the problem of there being no physical necessity for the Law of Large Numbers to be true in real life.)

This is not the same as "why does physics stay the same" - even if we accept that the speed of light is constant, and that mass creates gravity, it's a much bigger stretch to say that there is a general physical law that minds probabilities in the real world, always keeping them in sync with mathematics. The difference is that the other laws apply in any given physical situation - mass will always create gravity, etc. - whereas probability by definition allows for variation, merely claiming that the mean will eventually add up. (As I argued before, it really doesn't even claim this.) (From studying quantum physics and uncertainty, it really does seem as if the universe corrects itself over large samples of purely random events to match the mean.)

Edit: I've found that the problem described - the empirical/logical meaning of probabilities - has already been addressed by David Hume in An Enquiry Concerning Human Understanding, Section VI: of Probability, and at length by Henri Poincaré in Science and Hypothesis. (An additional resource, though in Hebrew, is Sha'arei Yosher 3.2.3)

$\endgroup$
6
  • $\begingroup$ This is the weak law. The Wikipedia describes the strong law. $\endgroup$
    – copper.hat
    Commented Jan 30, 2015 at 8:39
  • $\begingroup$ @afuna "...according to probabilities, we should find ourselves in the universe that is closer to the mean. I just mean this as an example to bring out the problem of there being no physical necessity for the Law of Large Numbers to be true in real life." This is in tune with Erick Wong's answer. If I understand correctly, it takes more effort to make things happen non-probabilistically, but since most events follow the path of least resistance, probability predictions win in the long run--as they are more easily achieved (due to combinatorics). $\endgroup$ Commented Jan 30, 2015 at 10:24
  • $\begingroup$ @copper.hat. I'm not sure why you say that. The article explains the law in general (and also has a subsection about the strong law - which applies over an infinite sample) $\endgroup$ Commented Jan 30, 2015 at 12:47
  • $\begingroup$ @user1891836: I personally don't understand why one should make any (probabilistic) assumptions about our universe - Erick Wong's answer hasn't convinced me about why the universe does what it does. If the universe is determinate, it is what it is and all the mathematical propositioning won't convince it otherwise. If it is truly random, there's no reason why it can't often turn out to favor the lesser probability - to argue that it probably will follow probability, is circular reasoning. $\endgroup$ Commented Jan 30, 2015 at 12:52
  • $\begingroup$ @afuna: You wrote that 'It would be better stated as...', I was just pointing out that this is a weaker statement than the Wiki statement that preceded it. $\endgroup$
    – copper.hat
    Commented Jan 30, 2015 at 16:39
2
$\begingroup$

There are plenty of correct answers here. Let me see if I can make the correct answer dead-simple.

The Gambler's Fallacy is the belief that a past trend in random events will tend to be balanced by an opposite trend in future random events: "If the last 10 coin flips have been heads, the next coin flip is more likely to be tails."

The Law of Large Numbers is the observation that, regardless of the nature or pattern of the variation, as your sample size gets larger, the significance of the variation (whether positive or negative) gets smaller: "If the last 10 coin flips have all been heads, that has a significant impact on the average of a sample of 50, but an insignificant impact on the average of a sample of 50,000."

$\endgroup$
2
  • $\begingroup$ Sure, this is the statistical explanation, but in order for the LLN to work over large samples, it requires that some objective law keeps randomness random. If you had a 70/30 distribution, you'd suspect the coin isn't fair, yet if the universe is truly indifferent towards any one or series of outcomes, there's nothing unlikely about this result. Still, when we speak of random phenomena like tosses, statisticians expect them to be close to 50/50 in the long run--in line with probability theory. This necessitates that something keeps the total average in check, however. Otherwise, you'd suspect a bias. $\endgroup$ Commented Jan 31, 2015 at 11:18
  • $\begingroup$ @vantage5353, did you find any satisfactory answer since then? I'm having the same question. $\endgroup$
    – Kashmiri
    Commented Jan 6, 2022 at 6:45
2
$\begingroup$

It seems to me that the core of your question has nothing to do with the Law of Large Numbers and everything to do with why the physical universe behaves in the ways that mathematics predicts.

You might as well ask this: Whenever I have two of something in my left hand and three of something in my right hand, I find that I have five of that something altogether. I understand that mathematics predicts this, but why should the Universe obey?

Or: Mathematics tells me that for any numbers x and y, if I have x piles of stones with y stones in each pile, and you have y piles of stones with x in each pile, then we'll each have the same number of stones. What's the empirical evidence for this law? Why should we expect the Universe to behave this way just because mathematics says it should?

I don't know what answers to these questions you'd consider satisfactory, but I think you'll gain some insight if you concentrate on these much simpler questions, where the fundamental issues are exactly the same as in the question you're asking.

$\endgroup$
2
$\begingroup$

Suppose you've tossed a fair coin ten times, and it has been heads nine times out of ten, for an observed $\frac{\mathrm{heads}}{\mathrm{flips}} = 0.9$. There is a 50% chance that the next toss will be heads, making 10/11 heads, and a 50% chance that the next toss will be tails, making 9/11 heads. The expected fraction of heads after the next toss is then $0.5 \frac{10}{11} + 0.5 \frac{9}{11} = \frac{19}{22} \approx 0.864$, which is closer to 0.5 than 0.9 is.

It's pure math. Given a fair coin with no memory, if the fraction of heads up until now is 0.5, then the expected fraction of heads after one more toss will remain 0.5. Otherwise, the expected fraction of heads after one more toss will be closer to 0.5. It doesn't take any physical effect, just the fact that every flip increases the denominator of your fraction, while only half of the flips (in expectation) will reinforce any "excess" number of heads or tails.
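The dilution is easy to check numerically (plain Python; the 9-heads-in-10 starting point is the example from the previous paragraph):

```python
from fractions import Fraction

h, t = 9, 10  # nine heads observed in ten tosses so far
next_toss = Fraction(1, 2) * Fraction(h + 1, t + 1) + Fraction(1, 2) * Fraction(h, t + 1)
print(next_toss)  # 19/22, about 0.864

# Expected fraction of heads after m further fair tosses: (h + m/2) / (t + m)
for m in (1, 10, 100, 1_000):
    print(m, (h + m / 2) / (t + m))  # drifts back toward 0.5 purely by dilution
```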

$\endgroup$
1
$\begingroup$

There is no physical law in play here, just probabilities.

Assume that either result (heads or tails) is equally likely. For any number of flips in a trial, N, it is easy to compute the probability of getting H heads.

For N = 2, H(0) = 0.25, H(1) = 0.5, H(2) = 0.25 (Four possible outcomes, two of which are HT and TH)

For N = 6, H(0) = 0.016, H(1) = 0.094, H(2) = 0.234, H(3) = 0.313, H(4) = 0.234, H(5) = 0.094, H(6) = 0.016 (64 possible outcomes, 50 of which are 2H4T, 3H3T, 4H2T).

Notice that for 6 flips, the chance you will see 2, 3, or 4 heads is 78%. As N gets bigger, the probability of getting a number of heads in the vicinity of the halfway mark becomes very great, and the likelihood of seeing very many or very few heads becomes very small.
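Here is a short sketch reproducing those numbers and pushing N higher (Python standard library only; the cut-offs are illustrative):

```python
from math import comb

def prob_heads_between(n, lo, hi):
    """Probability that a fair coin comes up heads between lo and hi times in n flips."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n

print(prob_heads_between(6, 2, 4))               # about 0.78
print(prob_heads_between(100, 40, 60))           # about 0.96
print(prob_heads_between(10_000, 4_900, 5_100))  # about 0.95 (within 1% of half)
```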

There is no force pushing toward the mean; it's just that the probability that you're seeing one of the very unlikely outcomes is very, very small. But then again, you might see it someday.

Note that this is just a restatement of Erick Wong's answer.

Imagine that there are 2^N tables in a vast room, each with N coins laid out on the table in a unique combination. Each table has a chair and you are dropped from the ceiling into the room and land in a chair at one of the tables. That is the "trial" you just ran. Chances are that that table will have approximately N/2 heads. Remember that out of 2^N tables (e.g. for 1000 coins, there will be over 10^301 tables), there is only one with no heads.

$\endgroup$
4
  • $\begingroup$ I appreciate the mathematical model. It's just hard for me to reconcile it with the laws of the physical world. After all, we use probability to make practical decisions, yet why should a 50% mathematical probability of tails correspond to a mean of 50% tails over 1,000,000 real-world trials. I see the reasoning behind the math, but not in the real world. Erick Wong hinted at it becoming physically harder to achieve a particular random result as options decrease, but I am still trying to wrap my head around this. $\endgroup$ Commented Jan 29, 2015 at 20:24
  • $\begingroup$ It's very unlikely you will get exactly 500,000 tails in 1,000,000 trials. But all the numbers around 500,000 are much more likely in aggregate than the extreme values. $\endgroup$
    – Ray Henry
    Commented Jan 29, 2015 at 20:34
  • 1
    $\begingroup$ Remember we only know what we see. We've never seen a million trials where all were heads, but it doesn't mean it can't happen. Your question about the laws of the physical world reminds me of those that ask why nature created a universe that is hospitable for humans, when the fact that we are here and this is the only universe we can observe turns the question on its head. $\endgroup$
    – Ray Henry
    Commented Jan 29, 2015 at 20:42
  • $\begingroup$ @user1891836 (3 comments up) in the frequentist interpretation, that is more or less the definition of probability. $\endgroup$
    – David Z
    Commented Jan 30, 2015 at 1:45
1
$\begingroup$

Consider coin tosses. The strong law of large numbers says that if the coin tosses are independent and identically distributed (iid.), then for almost any experiment, the averages converge to the probability of a head.

The degree to which the result is applicable in the 'real' world depends on the degree to which the assumptions are valid.

Both independence and identical distribution are impossible to verify for real systems; the best we can do is to convince ourselves empirically by many observations, symmetry in the underlying physics, etc. (As a slightly related aside, sometimes serious mistakes are made; for example, read the LTCM story.)

The iid. assumption ensures that no experiment is favoured. For example, in a sequence of $n$ coin tosses, there are $2^n$ experiments and each is 'equi-probable'. It is not hard to convince yourself that for large $n$ the percentage of experiments whose average is far from the mean becomes very small. There is no magic here.

I think a combination of the central limit theorem and the observed prevalence of normal distributions in the 'real' world provides stronger empirical 'evidence' that the iid. assumption is often a reasonable one.

$\endgroup$
2
  • 1
    $\begingroup$ It is not hard to convince yourself that for large n the percentage of experiments whose average is far from the mean becomes very small. There is no magic here. Sure, but why does the universe tend to fall in line with LLN? What is physically the reason for the total average to get closer to expected value? $\endgroup$ Commented Jan 30, 2015 at 9:52
  • $\begingroup$ When you make any binary-valued observation, if the underlying process is iid. (or a reasonable approximation thereof) then the law of large numbers & central limit theorem apply. So, your question is, why do many measurements of aspects of the universe seem to be iid? I don't know the answer, but would suppose that independence arises out of a lack of apparent 'communication' (for example, the coin has no state which it carries from one toss to the next) and identical arises from symmetry (why would a head be preferred over a tail?), or similar dynamics. $\endgroup$
    – copper.hat
    Commented Jan 30, 2015 at 16:36
1
$\begingroup$

Please also consider this: most human games are flawed. Heads or tails depends on the coin and the way it is thrown. One man throwing the same coin will probably get something far from 50-50, be it because he's a cheater or because he always puts the same force on the same side, making the coin flip the same number of times in the air.

But if you now consider different people with different hands, then you'll very likely get near 50-50 quite quickly.

When playing the lottery, some people think they should play numbers that don't come up as often as others, as the LLN will "have" to make them appear more often now to compensate. This is twice wrong.

  1. As others have already said, the law should not be understood as a magic hand that compensates for the first inequities. It just keeps a 50% chance on every try, and the early imbalances simply get "diluted" into the growing total. There is no statistical reason to look at the previous throws; they don't impact the future ones.

  2. The practical case is even worse: since the coin (or the lottery balls) is not perfect, this imperfection will likely play the same role every time, making the same result more probable. So the truth in the lottery is to play precisely the numbers that have already won!

Of course, knowing that, the lottery guys are changing balls now and then...

$\endgroup$
1
$\begingroup$

Perhaps, a better way to understand the concept is to compute the probability of many trials coming out balanced. For example, if we flip a coin 10 times then the probability that the number of heads/tails will be within 10% of each other is only 24.6%. However, as we flip the coins more times the probability that the number of heads/tails will be close to each other (within 10%) increases:

100 trials: 38.3%

1000 trials: 80.5%

10,000 trials: 99.99%

Thus, there is no need to stipulate a "law", we can simply compute the probability of balance occurring and see that it increases as we do more trials. Note that there is always a chance of imbalance occurring. For example, after 10,000 coin flips there is a 0.007% chance that the number of heads will not be within 10% of the count of tails.

$\endgroup$
1
$\begingroup$

Strong Mathematical explanation.

First I present another experiment which, I think, will be of interest to you.

Let $x_1,x_2, \cdots$ be an infinite sample obtained by observations of independent and normally distributed real-valued random variables with parameters $(\theta,1)$, where $\theta$ is an unknown mean and the variance is equal to $1$. Using this infinite sample we want to estimate the unknown mean. If we denote by $\mu_{\theta}$ the Gaussian measure on ${\bf R}$ with probability density $\frac{1}{\sqrt{2\pi}}e^{-\frac{(x-\theta)^2}{2}}$, then the triplet $$({\bf R}^N,\mathcal{B}({\bf R}^N),\mu_{\theta}^N)_{\theta \in {\bf R}}$$ will be a statistical structure describing our experiment, where ${\bf R}^N$ is the Polish topological vector space of all infinite samples equipped with the Tychonoff metric and $\mathcal{B}({\bf R}^N)$ is the $\sigma$-algebra of Borel subsets of ${\bf R}^N$. By virtue of the Strong Law of Large Numbers we have $$ \mu_{\theta}^N(\{(x_k)_{k \in N}: (x_k)_{k \in N}\in {\bf R}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\})=1 $$ for each $\theta \in {\bf R}$, where $\mu_{\theta}^N=\mu_\theta \times \mu_\theta \times \cdots$.

We would expect that, using our infinite sample $(x_k)_{k \in N}$ and the consistent estimator $\overline{X}_n= \frac{\sum_{k=1}^nx_k}{n}$ as $n$ tends to $\infty$, we get a "good" estimate of the unknown parameter $\theta$. But let us look at the set $$ S=\{ (x_k)_{k \in N}: (x_k)_{k \in N}\in {\bf R}^N~\&~\mbox{there exists a finite limit } \lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}\}. $$ It is a proper vector subspace of ${\bf R}^N$ and hence is "small" (more precisely, it is a Haar null set in the sense of Christensen (1973)). This means that our "good" statistic is not defined on the complement of $S$, which is a "big" set (more precisely, it is prevalent in the sense of Christensen (1973)).

This means that for "almost every" infinite sample (in the sense of Christensen), our "good" statistic, the sample average $\overline{X}_n$, has no limit.


Now let $x_1,x_2, \cdots$ be an infinite sample obtained by coin tosses. Then the statistical structure describing this experiment has the form $$ \{(\{0,1\}^N,\mathcal{B}(\{0,1\}^N),\mu_{\theta}^N): \theta \in (0,1)\} $$ where $\mu_{\theta}(\{1\})=\theta$ and $\mu_{\theta}(\{0\})=1-\theta$. By virtue of the Strong Law of Large Numbers we have $$ \mu_{\theta}^N(\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\})=1 $$ for each $\theta \in (0,1)$. Note that $G:=\{0,1\}^N$ can be considered as a compact group. Since the measure $\mu_{0.5}^N$ coincides with the probability Haar measure $\lambda$ on the group $G$, we deduce that the set $A(0.5)=\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=0.5\}$ is prevalent. Since each $A(\theta) \subset G \setminus A(0.5)$ for $\theta \in (0,1)\setminus \{1/2\}$, where $$A(\theta)=\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\},$$ we deduce that they are all Haar null sets.

My answer to the question "Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?" is the following: the set of infinite samples $(x_k)_{k \in N}\in G:=\{0,1\}^N$ for which the limit of the sample average $\overline{X}_n$ exists as $n$ tends to $\infty$ and is equal to $0.5$ is prevalent in the sense of Christensen (1973); equivalently, it has full Haar measure $\lambda$. Hence, the Strong Law of Large Numbers is not empirically proven.

$\endgroup$
