7
$\begingroup$

I've always been confused about this part of probability. My (naïve?) definition of probability seems to be $Pr(X=x)=p$, meaning that, on average, $X$ would equal $x$ in a proportion $p$ of the trials, as the number of trials goes to infinity.

However, this seems to be what the Law of Large Numbers says, and that Law is a theorem, not an axiom or definition of probability.

What actually is probability, if not a restatement of the Law of Large Numbers? This always bothers me: probability seems to be one huge circular argument. Where am I wrong?

$\endgroup$
2
  • 4
    $\begingroup$ The meaning of probability has been a subject of much discussion. There's been a (sometimes fierce) debate between representatives of the frequentist and Bayesian viewpoints. For example, the book Probability: The Logic of Science by Jaynes presents a provocative/enlightening viewpoint. $\endgroup$
    – littleO
    Commented Aug 15, 2013 at 1:42
  • 8
    $\begingroup$ Probability is introduced axiomatically, like formal number theory is introduced axiomatically, or the theory of real numbers. We have certain intuitions about how probability ought to behave, just like we have intuitions about the natural numbers, or about the reals. It is nice and reassuring when many of these intuitions, such as the Law of Large Numbers, can be proved from the axioms. $\endgroup$ Commented Aug 15, 2013 at 1:44

4 Answers

6
$\begingroup$

You're confusing probability theory with probability itself. Probability theory is a branch of mathematics with axioms that define notions of probability in terms of concepts primarily from measure theory. You have a state space $\Omega$, elements $\omega\in \Omega$, events $A\subset \Omega$, etc. You define an abstract concept of a probability measure. You define independence, etc. A priori, these definitions and rules have no embedding in the real world.

On the other hand, probability itself is a matter of interpretation, a collection of views about what probability really is. There are frequentists and Bayesians. More on this later.

Consider the example of tossing a coin, whose outcome is either heads or tails. This can be axiomatized as follows: there is a state space $\Omega=\{H,T\}$ which represents the outcomes of the toss, heads or tails. There is a random variable $X$ which is $1$ if the coin lands heads, $0$ if tails. $X$ is a map from $\Omega$ to $\mathbb{R}$, which is the definition of a random variable. To say that the coin has probability $p$ of falling on heads is to say that $P(X=1)=P(\{\omega\in \Omega : X(\omega)=1\})=p$.
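
To make the bookkeeping concrete, here is a minimal Python sketch of this setup (my own illustration; the bias $p=0.6$ and all the names are arbitrary choices, not part of the formalism):

```python
# A minimal sketch of the abstract setup for a coin with an arbitrarily
# chosen p = 0.6. Omega is the state space, X the random variable
# (a map from Omega to R), and P a measure defined on events.
Omega = {"H", "T"}

def X(omega):
    """Random variable: 1 if heads, 0 if tails."""
    return 1 if omega == "H" else 0

p = 0.6                        # just a number; no interpretation attached
weights = {"H": p, "T": 1 - p}

def P(event):
    """Probability measure: sum the weights of the outcomes in the event."""
    return sum(weights[omega] for omega in event)

# P(X = 1) is, by definition, P({omega in Omega : X(omega) = 1})
heads_event = {omega for omega in Omega if X(omega) == 1}
print(P(heads_event))  # 0.6
```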

$P$ is a probability measure: a function from events (subsets of $\Omega$) to $[0,1]$ that satisfies certain axioms. Notice that absolutely no use has been made of any interpretation of what $p$ really is, other than that it is just a number. Using this and further results, one can derive the Law of Large Numbers, which in your context says that if you toss the coin $n$ times independently, with $X_i$ signifying the $i$-th outcome, then $P(|\frac{1}{n}\sum_{i=1}^nX_i-p|>\epsilon)\rightarrow 0$ for every $\epsilon>0$. There are stronger versions of this, but the heuristic point is that the proportion of heads converges to $p$, where the notion of convergence is with respect to the function $P$, which I have tried to highlight.
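
Here is a rough Monte Carlo sketch of that convergence statement (my own illustration under an assumed $p=0.6$ and $\epsilon=0.05$, not part of the theorem itself):

```python
# For each n, repeat the n-toss experiment many times and estimate
# P(|(1/n) * sum(X_i) - p| > eps), which the weak LLN says tends to 0.
import random

random.seed(0)
p, eps, reps = 0.6, 0.05, 500

for n in (10, 100, 1000, 10000):
    exceed = sum(
        abs(sum(random.random() < p for _ in range(n)) / n - p) > eps
        for _ in range(reps)
    )
    print(n, exceed / reps)  # this fraction shrinks toward 0 as n grows
```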

Now let's suppose you actually take a coin and flip it in the real world. For example, let's say it's a fair coin. What does that mean? That depends on what school of probability you belong to. The frequentist way would be to throw the coin a bunch of times and observe the frequency of heads. If you get roughly $1/2$ after a lot of throws, you convince yourself the coin is fair, so $p=1/2$. This is now a definition of $p$ purely in terms of outcomes: $p=(\text{number of heads})/(\text{total number of throws})$. You now say that the chance you get heads on the next throw is $1/2$, meaning that if you were to throw the coin a few thousand times, you'd expect around half the outcomes to be heads.
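
As a small sketch of that recipe (my own, simulating a coin whose true bias is assumed to be $1/2$):

```python
# Estimate p from one batch of throws, then check the prediction
# on a fresh batch of throws of the same simulated coin.
import random

random.seed(1)
flip = lambda: random.random() < 0.5  # one throw of the "physical" coin

throws = [flip() for _ in range(5000)]
p_hat = sum(throws) / len(throws)     # p := (number of heads) / (total throws)
print(f"estimated p = {p_hat:.3f}")

# The prediction: roughly half of a few thousand fresh throws come up heads.
fresh = [flip() for _ in range(5000)]
print(f"fresh frequency = {sum(fresh) / len(fresh):.3f}")
```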

$\endgroup$
3
$\begingroup$

The answer to what probability is can only be a viewpoint; mine is: probability is an abstract mathematical concept. It has been axiomatized in various ways. Its prevailing axiomatization, and the one almost everybody uses today, is the one formulated by Kolmogorov (1930s) in the context of measure theory. As a well-defined mathematical concept, it may then be used to model and analyze any real-world phenomena whose structure accords with its properties.

The second part of the question can have a true answer, not just a viewpoint: the Law of Large Numbers says no such thing. It says that (given various regularity conditions) "the average value of a collection of random variables will tend to the average value of their average values", and this statement is expressed in terms of its probability of being true: the concept of probability does not enter the premises of the theorem, and the statement of the theorem does not assert something about probability; rather, the concept of probability qualifies the theorem's assertion. Nothing circular here.
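
To unpack the quoted phrase in symbols (the weak form, stated here as my own gloss): writing $\bar X_n=\frac{1}{n}\sum_{i=1}^n X_i$ and $\mu_n=\frac{1}{n}\sum_{i=1}^n E[X_i]$, the theorem asserts that for every $\epsilon>0$,

$$P\left(\left|\bar X_n-\mu_n\right|>\epsilon\right)\to 0 \quad \text{as } n\to\infty.$$

The event inside $P(\cdot)$ concerns only the averages of the $X_i$; the measure $P$ merely qualifies how surely the assertion holds.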

$\endgroup$
2
$\begingroup$

The law of large numbers is not a statement about probability in the intuitive sense; it's a statement about functions that satisfy the Kolmogorov axioms. Such functions don't have to have anything to do with frequencies or statistics.

For example, consider the interval $[0, 1]$. Define a function $L$ on measurable subsets of this interval by $L(A)=\text{length of $A$}$. Notice that I'm not imposing any kind of interpretation on $L$ as being a "probability", like the probability of hitting $A$ if you chucked a dart at the interval, or something like that. It's just the geometrical length, but nevertheless it's easy to see that it satisfies the Kolmogorov axioms. Now, under the function $L$, it turns out that $d_n(x) = \text{the $n$-th binary digit of $x$}$ defines a sequence of i.i.d. Bernoulli random variables, each with "probability" $0.5$, which in our case just means that the length of the set on which $d_n=1$ is $0.5$. The law of large numbers now tells us that the length of the set of numbers whose binary expansions have equal asymptotic proportions of ones and zeroes is $1$. Notice how statistics never enter the picture: we're just doing geometry.
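
A minimal simulation sketch of this (my own illustration; it relies on the fact just stated, that under $L$ the digits $d_n$ are i.i.d. fair bits, so sampling a "uniform $x$" digit by digit is just sampling independent bits):

```python
# Sample the binary digits d_1(x), d_2(x), ... of a "uniform x in [0, 1]"
# as independent fair bits and watch the running proportion of ones.
import random

random.seed(2)
digits = [random.randint(0, 1) for _ in range(100000)]  # d_1(x), d_2(x), ...
for k in (10, 100, 1000, 10000, 100000):
    print(k, sum(digits[:k]) / k)  # settles near 0.5
```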

We can try to connect Kolmogorov's axioms with frequencies by saying something like this:

Consider some repeatable experiment. To say that a given set of outcomes has probability $p$ means that if you repeat the experiment a very large number of times, you'll get an outcome in that set approximately $p$ of the time.

How we can justify such a prediction is a separate matter. The point is: if we do assume this prediction, what can be deduced from it?

If we assume the above, then the function $P(A) = \text{long term frequency of $A$}$ satisfies the Kolmogorov axioms (kind of... if we table the discussion of issues like countable additivity and what "approximately" and "very large" mean).

As you point out, this seems like it's equivalent to just assuming the law of large numbers. But that's not quite the case. The LLN actually allows us to relax the above prediction slightly.

Let's say we want to apply the LLN to a coin flip. In that case, the "repeatable experiment" in question is the experiment of "tossing a coin a very large number of times", let's call this a "flip series". The LLN is then a statement about Kolmogorov functions on the set of all possible flip series outcomes.

Now of course, if we just assume that if we flip the coin a large number of times, we'll get heads about half the time, this is equivalent to assuming the LLN. But thanks to the LLN, we can assume something slightly weaker than that, and get LLN as a logical consequence. Namely, we only need the following assumption:

If I perform a large number of flip series, and only look at the outcomes of the $n$-th coin each time, then that coin will turn up heads about half the time.

Essentially, if you perform many, many flip series, and represent the results in a table where each row is one series, like this:

$$ HHHHTTHTHTTTHTHHHTH... \\ TTHTHHTHHTHTHHTHTHT... \\ THHHTHTHTHHTHTHHHTT... \\ THTHTHHTHTHTHTHTHHT... \\ THTHTHTHTHHTHTHHTHT... \\ \vdots $$

Then the assumption you're making is that there are approximately $50\%$ heads in each column, and the LLN (if you also assume independence) allows you to conclude that there are also approximately $50\%$ heads in each row.
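
A quick sketch of this table in code (my own illustration, where both the column-wise assumption and independence hold by construction):

```python
# Rows are flip series; columns are "the n-th toss across series".
# The table is built from fair, independent bits.
import random

random.seed(3)
rows, cols = 500, 500
table = [[random.randint(0, 1) for _ in range(cols)] for _ in range(rows)]

col_freqs = [sum(row[c] for row in table) / rows for c in range(cols)]
row_freqs = [sum(row) / cols for row in table]

print(min(col_freqs), max(col_freqs))  # each column: near 0.5
print(min(row_freqs), max(row_freqs))  # each row: near 0.5, as the LLN predicts
```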

Admittedly, if you're going to make an assumption about the $n$-th coin flip in the series, it seems like you may as well make the same assumption about the flip series themselves. But remember that just because random variables are i.i.d. doesn't mean they have to represent the same physical experiment. For example, imagine you had a big box of lots of different coins, all numbered so they were distinguishable. In that case, the LLN allows you to convert a set of assumptions about the individual coins into a conclusion about all of the coins in aggregate.

$\endgroup$
0
$\begingroup$

In my opinion the other answers miss the point. The axioms of Kolmogorov are so weak, in the mathematical sense, that while you can derive the Law of Large Numbers from them, nothing in the axioms permits interpreting it in the usual way (in fact, the axioms do not say what probability is, so you cannot give the theorem any interpretation at all), and you end up, strictly speaking, with an almost meaningless statement about "pure" numbers that simply satisfy those weak axioms. The variables in the theorem carry the meanings usually ascribed to them ("mean", "variance", "probability") only if additional assumptions are made, and it seems that the assumption needed is precisely the Law of Large Numbers itself. This is pretty much said by Kolmogorov himself:

We apply the theory of probability to the actual world of experiment in the following manner:

...

4) Under certain conditions, which we shall not discuss here, we may assume that the event A which may or may not occur under conditions S, is assigned a real number P(A) which has the following characteristics:

a) One can be practically certain that if the complex of conditions S is repeated a large number of times, n, then if m be the number of occurrences of event A, the ratio m/n will differ very slightly from P(A).

The only thing learned from deriving the LLN purely mathematically from Kolmogorov's axioms is that this additional assumption is mathematically consistent with the other axioms: no logical contradiction arises. For all practical purposes, the LLN is an assumption about what $P$ is, not a theorem.

I have been trying to verify this understanding here:

Do Kolmogorov's axioms permit speaking of frequencies of occurrence in any meaningful sense?

$\endgroup$
1
  • $\begingroup$ Kolmogorov did a lot more than establish the basic axioms of probability. It seems that what is missing from your understanding is the measure theoretic definition of a random variable, which also happens to be attributed to Kolmogorov! $\endgroup$
    – jsk
    Commented May 3, 2014 at 5:52
