$\begingroup$

I want to teach a short course in probability, and I am looking for counter-intuitive examples. I am mainly interested in problems whose results seem obviously false even though they are true.

I have already found a few examples, such as these two videos:

In addition, I have found some weird examples of random walks. For example this amazing theorem:

For a simple random walk, the mean number of visits to point $b$ before returning to the origin is equal to $1$ for every $b \neq 0$.

I have also found some advanced examples such as Do longer games favor the stronger player?

Could you please do me a favor and share some other examples of such problems? It would be exciting to read them.

$\endgroup$
  • 2
    $\begingroup$ A variation on Penny's game: A new family moves to your block. Since you have children you are interested in what children they have: (1) You learn that they have two children, one of which is a girl. What is the probability the other child is also a girl? (2) You learn that they have two children, the older child a girl. What is the probability the other child is also a girl? Tell why the answers are NOT the same. $\endgroup$
    – user247327
    Commented Feb 16, 2017 at 16:45
  • 1
    $\begingroup$ @user247327 "Tell why the answers are the SAME" is also valid here. As it all depends on the assumptions you make for this question. The way it is phrased in this comment I would say 1/2 in both is the correct answer. The problem is that you assume a distribution with 4 states. Which is just an assumption for the stated problem. en.wikipedia.org/wiki/Boy_or_Girl_paradox $\endgroup$
    – Kami Kaze
    Commented Feb 17, 2017 at 10:31
  • 3
    $\begingroup$ Mildly interestingly, the random walk problem you mentioned in higher dimensions is more complicated. For $n = 2$, the same result applies: one returns to the origin a.s. In higher dimensions, the probability of returning to the origin is strictly between $0$ and $1$. $\endgroup$
    – anomaly
    Commented Feb 18, 2017 at 4:38
  • 4
    $\begingroup$ A complementary request for intuitive examples in probability would lead to far fewer answers ... $\endgroup$ Commented Nov 2, 2019 at 6:59
  • $\begingroup$ @user247327 "You learn that they have two children, one of which is a girl." Ok, let's try to imagine how you could learn this. Maybe you see from a distance that there are two children, and then one runs close enough that you see she is a girl. Or say you learn this by overhearing the parent say, "She's one of my two kids." When you think about it, most ways that you would normally learn this information do not work for the "trick" interpretation of the question. So this is not a case of a counter-intuitive answer, but a counter-intuitive question interpretation. $\endgroup$
    – Matt
    Commented Dec 2, 2020 at 18:58

32 Answers

$\begingroup$

The most famous counter-intuitive probability theory example is the Monty Hall Problem

  • In a game show, there are three doors behind which there are a car and two goats. However, which door conceals which is unknown to you, the player.
  • Your aim is to select the door behind which the car is. So, you go and stand in front of a door of your choice.
  • At this point, regardless of which door you selected, the game show host chooses and opens one of the remaining two doors. If you chose the door with the car, the host selects one of the two remaining doors at random (with equal probability) and opens that door. If you chose a door with a goat, the host selects and opens the other door with a goat.
  • You are given the option of standing where you are and switching to the other closed door.

Does switching to the other door increase your chances of winning? Or does it not matter?

The answer is that it does matter: switching raises your chance of winning the car from $\frac{1}{3}$ to $\frac{2}{3}$. This is counter-intuitive for almost everyone seeing the problem for the first time.
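A quick simulation (function name mine, assuming the host rules above) makes the split between switching and staying visible:

```python
import random

def monty_hall(switch, trials=100_000, rng=random.Random(0)):
    """Play the standard game many times; return the observed win frequency."""
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)                       # door hiding the car
        pick = rng.randrange(3)                      # player's first choice
        # The host opens a goat door: never the player's door, never the car.
        options = [d for d in range(3) if d != pick and d != car]
        opened = rng.choice(options)                 # random choice when pick == car
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials
```

With these rules, `monty_hall(True)` comes out near $2/3$ and `monty_hall(False)` near $1/3$, matching the classical analysis.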


  • If a family has two children, at least one of which is a daughter, what is the probability that both of them are daughters?
  • If a family has two children, the elder of which is a daughter, what is the probability that both of them are daughters?

A beginner in probability would expect the answers to both these questions to be the same, which they are not.

Math with Bad Drawings explains this paradox with a great story as a part of a seven-post series in Probability Theory
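Under the usual assumption that the four (older, younger) combinations are equally likely, the two answers can be checked by direct enumeration (helper names mine):

```python
from fractions import Fraction
from itertools import product

# (older, younger): each child is independently a boy or a girl,
# so the four ordered pairs are equally likely.
space = list(product("BG", repeat=2))

def cond_prob(event, given):
    """P(event | given) over the uniform sample space."""
    sub = [s for s in space if given(s)]
    return Fraction(sum(1 for s in sub if event(s)), len(sub))

def both_girls(s):
    return s == ("G", "G")

p_at_least_one = cond_prob(both_girls, lambda s: "G" in s)     # 1/3
p_elder_girl = cond_prob(both_girls, lambda s: s[0] == "G")    # 1/2
```

Conditioning on "at least one girl" leaves three equally likely cases; conditioning on "the elder is a girl" leaves only two, which is why the answers differ.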


Nontransitive Dice

Let persons P, Q, R have three distinct dice.

If it is the case that P is more likely to win over Q, and Q is more likely to win over R, is it necessarily the case that P is more likely to win over R?

The answer, strangely, is no. One such dice configuration is $(\left \{2,2,4,4,9,9 \right\},\left \{ 1,1,6,6,8,8\right \},\left \{ 3,3,5,5,7,7 \right \})$
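The claimed cycle can be verified by brute force over all $36$ ordered face pairs; each die beats the next with probability $5/9$ (a quick sketch, function name mine):

```python
from itertools import product

P = [2, 2, 4, 4, 9, 9]
Q = [1, 1, 6, 6, 8, 8]
R = [3, 3, 5, 5, 7, 7]

def beats(a, b):
    """Probability that die a shows a strictly higher face than die b."""
    wins = sum(x > y for x, y in product(a, b))   # 36 equally likely pairs
    return wins / 36

# P beats Q, Q beats R, and yet R beats P: the relation is not transitive.
```

Each of `beats(P, Q)`, `beats(Q, R)`, and `beats(R, P)` equals $20/36 = 5/9$, closing the cycle.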


Sleeping Beauty Paradox

(This is related to philosophy/epistemology and is more related to subjective probability/beliefs than objective interpretations of it.)

Today is Sunday. Sleeping Beauty drinks a powerful sleeping potion and falls asleep.

Her attendant tosses a fair coin and records the result.

  • The coin lands heads: Beauty is awakened on Monday only and interviewed. Her memory is then erased and she is put back to sleep.
  • The coin lands tails: Beauty is awakened and interviewed on Monday. Her memory is erased and she is put back to sleep. On Tuesday she is once again awakened, interviewed, and finally put back to sleep.

In essence, the awakenings on Mondays and Tuesdays are indistinguishable to her.

The most important question she's asked in the interviews is

What is your credence (degree of belief) that the coin landed heads?

Given that Sleeping Beauty is epistemologically rational and is aware of all the rules of the experiment on Sunday, what should be her answer?

The problem seems simple on the surface, but there are serious arguments for both the answer $\frac{1}{2}$ and the answer $\frac{1}{3}$, and there is no consensus among modern epistemologists on this one.


Ellsberg Paradox

Consider the following situation:

In an urn, you have 90 balls of 3 colors: red, blue and yellow. 30 balls are known to be red. All the other balls are either blue or yellow.

There are two lotteries:

  • Lottery A: A random ball is chosen. You win a prize if the ball is red.
  • Lottery B: A random ball is chosen. You win a prize if the ball is blue.

Question: In which lottery would you want to participate?

  • Lottery X: A random ball is chosen. You win a prize if the ball is either red or yellow.
  • Lottery Y: A random ball is chosen. You win a prize if the ball is either blue or yellow.

Question: In which lottery would you want to participate?

If you are an average person, you'd choose Lottery A over Lottery B and Lottery Y over Lottery X.

However, it can be shown that there is no way to assign probabilities that makes both preferences rational. One way to deal with this is to extend the concept of probability to that of imprecise probabilities.
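A one-line sketch of the standard argument, writing $p_{\text{red}}, p_{\text{blue}}, p_{\text{yellow}}$ for the subjective probabilities of drawing each color (and assuming you prefer the lottery with the higher winning probability):

```latex
\text{A} \succ \text{B} \;\Longrightarrow\; p_{\text{red}} > p_{\text{blue}},
\qquad
\text{Y} \succ \text{X} \;\Longrightarrow\; p_{\text{blue}} + p_{\text{yellow}} > p_{\text{red}} + p_{\text{yellow}}
\;\Longrightarrow\; p_{\text{blue}} > p_{\text{red}},
```

a contradiction: no single probability assignment supports both common choices.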

$\endgroup$
  • 33
    $\begingroup$ A classic example: For this, I think it's extremely important to eventually highlight that the reason switching is better is because Monty knows which door actually has the car. If a door was opened at random after the initial choice, and happened not to contain the car, then your odds are actually no better if you switch than if you don't. The point being: the seemingly benign and/or irrelevant detail of whether Monty opens a door without the car or at random is actually vitally important to the answer of the question. $\endgroup$ Commented Feb 12, 2017 at 7:54
  • 26
    $\begingroup$ Concerning the last so-called 'paradox', it's only paradoxical if you make a false assumption that rational preference of one choice to another must imply a higher expectation. Indeed, given a uniform prior, each pair has the same expectation, but A and Y have zero variance and so are obviously preferable to B and X for a risk-averse person. It is therefore totally unfair to use this 'paradox' to claim a deficiency in the concept of probability. $\endgroup$
    – user21820
    Commented Feb 12, 2017 at 10:38
  • 6
    $\begingroup$ I think the Ellsberg Paradox is not a good example of a counterintuitive probability example, and certainly not for a short course in probability (which the OP will teach). The Ellsberg Paradox is meant to illustrate the difference between risk and Knightian uncertainty. $\endgroup$
    – wythagoras
    Commented Feb 12, 2017 at 11:43
  • 17
    $\begingroup$ The Monty Hall problem should be edited to include that the host always reveals a goat. If they don't always reveal a goat, the distribution can change arbitrarily. $\endgroup$ Commented Feb 13, 2017 at 5:01
  • 6
    $\begingroup$ Regarding the Monty Hall problem, as Chai T. Rex points out, it's critical that Monty will always reveal a goat. Which itself provides a nice way to catch out those who already know of the classic problem, while also providing a further lesson in probability - what happens if Monty randomly chooses one of the other doors to open, and if he reveals the car, you lose? If the sequence of actual events stays the same (you choose a door, Monty opens a door, there's a goat there, choose whether to switch), it'll seem like the situation is the same, but now there's no benefit to switching. $\endgroup$
    – Glen O
    Commented Feb 13, 2017 at 7:03
$\begingroup$

Birthday Problem

For me this was the first example of how counter-intuitive real-world probability problems can be, owing to the inherent under- and overestimation built into our mental shortcuts for permutations and combinations (an inverse multiplication problem in general), which form the basis for probability calculation. The question is:

How many people should be in a room so that the probability of at least two of them sharing a birthday is at least as high as the probability of getting heads in a toss of an unbiased coin (i.e., $0.5$)?

This is a good problem for students to hone their skills in estimating the permutations and combinations, the base for computation of a priori probability.

I still find the number of people in the answer surreal and hard to believe! (The real answer is $23$.)
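The exact probability is easy to compute by multiplying the conditional probabilities that each new birthday is distinct (function name mine):

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), via the complement."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days   # i-th person avoids the first i birthdays
    return 1 - p_all_distinct
```

`p_shared_birthday(22)` is still below one half, while `p_shared_birthday(23)` is already about $0.507$, which is where the famous $23$ comes from.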

Pupils should at this juncture be told about quick and dirty mental maps for permutations and combinations calculations and should be encouraged to inculcate a habit of mental computations, which will help them in forming intuition about probability. It will also serve them well in taking to the other higher level problems such as the Monty Hall problem or conditional probability problems mentioned above, such as:

$0.5\%$ of a population of $10$ million is affected by a strange disease. A test has been developed for the disease, and it is correct $99\%$ of the time. A randomly selected person tests positive. What is the probability that this person actually suffers from the strange disease? The real answer here is approximately $33\%$.
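The $33\%$ figure can be verified directly with Bayes' rule, interpreting "$99\%$ true" as both the sensitivity and the specificity of the test (variable names mine):

```python
prevalence = 0.005        # 0.5% of the population has the disease
sensitivity = 0.99        # P(test positive | sick)
specificity = 0.99        # P(test negative | healthy)

# Total probability of testing positive: true positives plus false positives.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(sick | positive).
p_sick_given_positive = sensitivity * prevalence / p_positive
```

`p_sick_given_positive` comes out near $0.33$: the false positives from the $99.5\%$ healthy majority swamp the true positives from the small sick minority.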

Here the strange disease can be replaced by any real-world condition (HIV status, a successful trading or betting strategy, the number of terrorists in a country). The example gives students a feel for why such screenings are bound to produce many false positives (no real-world test of this kind, I believe, is $99\%$ accurate) and why popular opinion is wrong in such cases most of the time.

This should be the starting point for introducing some of the work of Daniel Kahneman and Amos Tversky as no probability course in modern times can be complete without giving pupils a sense of how fragile one's intuitions and estimates are in estimating probabilities and uncertainties and how to deal with them. $20\%$ of the course should be devoted to this aspect and it can be one of the final real world projects of students.

$\endgroup$
  • 4
    $\begingroup$ A quick intuitive explanation of the rough order of the answer to the birthday problem is to notice that for birthdays to be all distinct each pair has to be distinct, and there are $Θ(n^2)$ pairs for $n$ people. Thus roughly you expect that, if $n \in Θ(\sqrt{k})$ as $k \to \infty$, where $k$ is the number of possible birthdays, then there are $Θ(k)$ pairs each of which has probability $\frac1k$ of being equal, and so the total probability of distinct birthdays is roughly $(1-\frac1k)^{Θ(k)}$ which tends to $\exp(-c)$ for some positive constant $c$. $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 4:17
  • $\begingroup$ In fact, this analysis can be made rigorous by using smoothing inequalities. Specifically $\frac{k-2r}{k} \le \frac{k-i}{k} \cdot \frac{k-(n-1)+i}{k} \le (\frac{k-r}{k})^2$ for any $i \in [0..(n-1)]$, where $r = \frac12(n-1)$. $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 4:22
  • 4
    $\begingroup$ One way to interpret the "sickness" example is in terms of a signal to noise ratio. Namely, Bayes' rule gives us that for $S$ denoting sickness and $T$ denoting a positive test, $P(S \mid T)=\frac{P(T \mid S) P(S)}{P(T \mid S) P(S) + P(T \mid S^c) P(S^c)}$. Now as $P(S) \to 0$ with the conditional probabilities held constant, the ratio between this and $P(S)$ tends to $\frac{P(T \mid S)}{P(T \mid S^c)}$. $\endgroup$
    – Ian
    Commented Feb 15, 2017 at 16:35
  • 4
    $\begingroup$ (Cont.) This means that you approximately multiply your prior probability that the person is sick by $\frac{P(T \mid S)}{P(T \mid S^c)}$, which is usually rather large. This tells us that multiplying our prior probability by a fixed factor is essentially as much as we can do...which means that testing for rare conditions is simply hard. $\endgroup$
    – Ian
    Commented Feb 15, 2017 at 16:35
    $\begingroup$ Yes, this is exactly one of the points of the "strange disease" problem. As the information-to-noise ratio (false positives) decreases, it is very difficult to make the right inference, which is supposedly still one of the least understood concepts. But this is not surprising given how fragile people's intuition about grouped multiplication and division (inverse multiplication) is in general, while at the same time they have so much confidence in their estimates in this regard. It always felt so strange. $\endgroup$ Commented Feb 17, 2017 at 3:42
$\begingroup$

Airplane Seating

$100$ people are boarding a plane in a line and each of them is assigned to one of the $100$ seats on a plane. However, the first person in line forgot his boarding pass and as a result decided to sit down in a random seat. The second person will do the following:

  1. Sit in her own seat if it is still available.
  2. If her seat is not available, choose a random seat among those remaining and sit there.

Each following person sits according to the same rules as the second person. What is the probability the $100^{th}$ person will be able to sit in her assigned seat?

Most people think the probability is very small, reasoning that there is only a tiny chance the $100$th person's seat survives all the reshuffling. But the actual probability is exactly $\frac{1}{2}$.
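A direct simulation (function name mine) confirms the $\frac{1}{2}$:

```python
import random

def last_passenger_gets_own_seat(n=100, rng=random.Random(42)):
    """One boarding process; True iff seat n-1 is left for the last passenger."""
    free = set(range(n))                         # seat i is assigned to passenger i
    free.remove(rng.choice(range(n)))            # passenger 0 sits somewhere at random
    for p in range(1, n - 1):                    # passengers 1 .. n-2 follow the rules
        if p in free:
            free.remove(p)                       # own seat still available: take it
        else:
            free.remove(rng.choice(sorted(free)))  # otherwise pick a random free seat
    return free == {n - 1}                       # did the last passenger's seat survive?

freq = sum(last_passenger_gets_own_seat() for _ in range(10_000)) / 10_000
```

The estimated frequency settles near $0.5$, and the simulation works for any $n > 1$, in line with the comment below that the answer does not depend on the number of passengers.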

$\endgroup$
  • $\begingroup$ Not only that; it seems any number of passengers >1 gives 1/2. I see how but not why if that makes any sense. $\endgroup$
    – JollyJoker
    Commented Feb 13, 2017 at 11:40
  • $\begingroup$ Is there any quick way to justify that? $\endgroup$
    – MR_BD
    Commented Feb 13, 2017 at 15:20
  • 21
    $\begingroup$ Yes. The 100th person either gets her own seat, or the first person's (no other seat can be unoccupied when she gets there, since it would have been taken by its owner). The other one of those two seats has been taken by someone else. They must have picked it at random, so it's equally likely to be either of the two. $\endgroup$ Commented Feb 13, 2017 at 15:47
  • 3
    $\begingroup$ To elaborate on the previous, for anyone who has a chance of sitting in either the first or the last seat, both are available and equally likely, since picking either means everyone else (before the 100th) can take their assigned seat. $\endgroup$
    – JollyJoker
    Commented Feb 13, 2017 at 20:06
  • 6
    $\begingroup$ @LeilaHatami the probability of winning is obviously 1/2 in the 2-seat problem (either the first passenger chose their own seat, or your seat). In the n>2 seat problem there are three possibilities. 1/n: they choose their own seat (win, without any choices being made by anyone later). 1/n: they choose your seat (lose, regardless of any choices made later). (n-2)/n: they choose the mth passenger's seat. All of the passengers up to the mth will sit in their correct seats, so we can remove them and their seats, re-label the remaining seats, and reduce to the n-m seat problem. $\endgroup$
    – hobbs
    Commented Feb 16, 2017 at 7:37
$\begingroup$

I find that almost anything about probability is counter-intuitive to my college students on first encounter. Possibly this may depend on your audience. Here are a few examples:

$1.$ Question: "If a certain event has a $40\%$ chance of success, and we run $50$ experiments, then how many would you expect to succeed?" The most common responses I usually get are "all of them" and "none of them". This is after an hour-long lecture on the subject.

$2.$ Question: "Interpret this probability statement: There is a $30\%$ chance of rain today in the New York area." I usually only get about a $65\%$ successful response rate on this on a multiple-choice quiz, even after the hour-long lecture on the subject. Once I had a student so bamboozled by it that she called up the national meteorology service for a consultation.

$3.$ Question: "We have a hand of four cards $\{A, 2, 3, 4\}$, and pick out two at random; what is the probability we get the $A$ or the $2$?" Common responses are $25\%$, $50\%$, and $75\%$. I've never had anyone in a class intuit the correct answer on first presentation.

$4.$ Question: "If you drive to school on a given day, you either get in an accident or you don't. Are these equally likely outcomes?" At least half of any class answers "yes" on the first presentation. This can be repeated with the same result with similar follow-up questions.
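For question 3, enumerating the six equally likely two-card hands gives the answer $\frac{5}{6}$ (a small sketch, names mine):

```python
from fractions import Fraction
from itertools import combinations

hands = list(combinations(["A", "2", "3", "4"], 2))       # 6 equally likely hands
favourable = [h for h in hands if "A" in h or "2" in h]   # every hand except {3, 4}
p = Fraction(len(favourable), len(hands))                 # 5/6
```

Only the hand $\{3, 4\}$ misses both target cards, so the probability is $1 - \frac{1}{6} = \frac{5}{6}$, which none of the common guesses hit.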

$\endgroup$
  • 35
    $\begingroup$ I can't believe anyone gets the first one wrong. $\endgroup$
    – minseong
    Commented Feb 13, 2017 at 1:24
  • 25
    $\begingroup$ @theonlygusti "How many of them would you expect to succeed?" I wouldn't expect any of them to succeed, but overall I'd expect to get 20 successes :) $\endgroup$
    – Rawling
    Commented Feb 13, 2017 at 7:59
  • 9
    $\begingroup$ @theonlygusti: #1 does sound surprising but not completely far-fetched. The description said nothing about the events being independent. If the events are assumed to be fully correlated, then the mode is having no successes at all. So I could get where that answer came from. $\endgroup$ Commented Feb 13, 2017 at 12:35
  • 9
    $\begingroup$ @DanielR.Collins I'm sure about half of all people surveyed would think it's true. $\endgroup$
    – Kyle
    Commented Feb 14, 2017 at 19:55
  • 9
    $\begingroup$ I find this really appalling. 1 and 4 are so basic in understanding the world, let alone mathematics or statistics, that everyone needs to have the correct understanding. $\endgroup$
    – Chappers
    Commented Feb 14, 2017 at 23:57
$\begingroup$

A while back, the xkcd blog posted this problem, which I found fascinating. Usually when I re-tell it, I do so slightly differently from the original author:

I have selected two numbers from $\mathbb{R}$, following some unknown and not necessarily independent distribution. I have written each number in a separate envelope. By fair coin toss, I select one of these two envelopes to open, revealing that number. I then ask the question "Is the number in the other envelope larger than this one?". You win if you guess correctly.

Can you win this game with probability $>\frac{1}{2}$? Note, that is a strict inequality. Winning with probability $=\frac{1}{2}$ is obviously easy.

Now, the solution to this starts out with a double-integral, so depending on the level of the class you're teaching it may not be appropriate.
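One winning strategy can be sketched without the integrals: compare the revealed number against an independent Gaussian "pivot" and guess that the hidden number is larger iff the pivot exceeds the revealed one. Whenever the pivot falls strictly between the two numbers the guess is surely correct; otherwise it is a fair coin, so the overall win probability exceeds $\frac{1}{2}$. A simulation of this idea (names mine):

```python
import random

def win_rate(pair_sampler, trials=100_000, rng=random.Random(2)):
    """Estimate the win probability of the Gaussian-pivot strategy."""
    wins = 0
    for _ in range(trials):
        x, y = pair_sampler(rng)                         # the two hidden numbers
        shown, hidden = (x, y) if rng.random() < 0.5 else (y, x)
        guess_hidden_larger = rng.gauss(0, 1) > shown    # pivot comparison
        wins += guess_hidden_larger == (hidden > shown)
    return wins / trials

# Even for a fixed, adversarial-looking pair the strategy beats 1/2:
freq = win_rate(lambda rng: (0.1, -0.1))
```

For the pair $(0.1, -0.1)$ the win rate is about $\Phi(0.1) \approx 0.54$; the edge shrinks but never vanishes, no matter how the pair is chosen.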

$\endgroup$
  • 1
    $\begingroup$ Yeah, I saw this amazing example. I've mentioned it in the question "How to Win a Guessing Game" $\endgroup$
    – MR_BD
    Commented Feb 12, 2017 at 9:42
  • 1
    $\begingroup$ Oh, sorry @LeilaHatami! I should have clicked on your links first. I'll take a look at that video and see if the proof I'm writing is more or less neat than that, but given that it's a Numberphile video, they've probably presented a very neat and approachable answer. $\endgroup$
    – ymbirtt
    Commented Feb 12, 2017 at 9:58
  • 1
    $\begingroup$ You must guarantee your joint distribution is continuous with support $\mathbb{R}^2$, if not it won't work. @LeilaHatami: In fact, you don't need a double integral to solve it. You just pick $r$ from some fixed continuous distribution $R$ with support $\mathbb{R}$ (say $N(0,1)$) and compare $r$ against the revealed amount $x$, and answer "yes" iff $r > x$. Let the unknown amount be $y$. Then $P(x<y \mid r<x,y) = P(x>y \mid r<x,y) = \frac12$ because $P(x=y) = 0$. And $P(x<y \mid r>x,y) = P(x>y \mid r>x,y) = \frac12$. But $P(x<y \mid x<r<y) = P(x>y \mid x>r>y) = 1$ and $P(x<r<y \lor x>r>y) > 0$. $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 3:24
  • 8
    $\begingroup$ I don't think the solution needs calculus at all. It needs that you can find a cumulative distribution function on $\Bbb R$ that is strictly monotonically increasing, like the $\arctan$. This means you accept the higher with greater probability than you accept the lower, which is all you need. $\endgroup$ Commented Feb 13, 2017 at 4:56
  • 1
    $\begingroup$ I could edit my comment in, but you can edit your own answer too. =) @LeilaHatami: If it helps, the point where I use the continuity of the joint distribution of the envelop amounts is to guarantee that $P(x = y \lor r = x \lor r = y) = 0$. This implies that we are justified in considering only the cases I did in my earlier comment. To express the reasoning in more intuitive terms, the game chooses $x,y$ jointly and then swaps them with probability $\frac12$. So if $r < x,y$ or $r > x,y$, then we answer correctly with probability $\frac12$. But if $r$ is in-between, we answer correctly surely. $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 12:15
$\begingroup$

I particularly like the triple-or-nothing game:

You start with $1$ sweet $^{[1]}$ in the pot. At each step, you can either choose to leave the game with all the sweets in the pot, or you can continue the game. If you continue, a fair coin is flipped, and if it comes up heads then the sweets in the pot are tripled, but if it comes up tails then the pot is emptied.

If you can play this game only once, how many sweets would you be willing to pay to play? And how should you play? (Assume that you want to get the most sweets possible.)

$^{[1]}$ Let's not be money-minded here...

The naive (and incorrect) analysis is to compute that at any step if there are $x$ sweets in the pot and you continue the game then the expected number of sweets in the pot will become $1.5x$. Thus you should not stop. But that is stupid; if you never stop you will never get any sweets! So when to stop?
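To make the naive analysis concrete (function names mine): after $n$ continuations the expectation has grown like $1.5^n$, yet the chance of holding any sweets at all has shrunk to $2^{-n}$:

```python
def survival_prob(n):
    """Probability the pot is non-empty after n continuations."""
    return 0.5 ** n

def expected_pot(n):
    """Expected pot after n continuations: (1/2)^n chance of 3^n sweets."""
    return 0.5 ** n * 3 ** n    # equals 1.5**n, growing without bound
```

For any $n \ge 1$ you leave empty-handed with probability at least $\frac{1}{2}$, which is precisely why the ever-growing expectation is such a misleading guide.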

Worse still, a correct analysis will tell you that no matter how many sweets you pay, you can play in such a way that the expected number of sweets you leave with is more than what you paid! The (silly) conclusion is that you should be willing to pay any number of sweets to play!

If you think about it really carefully, you will realize that expectation is a very poor indicator of the rationality of a choice. Instead, everyone has some degree of risk aversion; more generally, a mapping from probability distributions to favourability. One possibility is that a probability distribution is unfavourable iff its median is not positive (representing no net gain). Clearly this game will never be favourable to anyone with this kind of risk aversion unless you commit to playing for exactly one step. In real life, people evaluate distributions in a much more complicated way than just checking the median.

That said, a reasonable rule of thumb is that it is not worth making a decision whose estimated benefit does not have both positive mean and positive median. Positive mean is necessary for rules of thumb; otherwise you will not benefit in the long run. Positive median prevents other foolish decisions, such as playing the triple-or-nothing game for more than one step or paying more than $1.5$ sweets to play it. More risk-averse people will play for zero steps and just take the initial sweet and leave!

This rule shows (reasonably) not only that it is not worth paying even $2$ sweets to play the triple-or-nothing game once, but also that it is not worth offering the game for others to play! Any application of probability to real-life decisions should be able to deal with such situations.


[Further remarks...]

My claim about the rule of thumb being reasonable is that it should work quite well in real life. Whether it agrees with various mathematical models of human rationality is irrelevant. Secondly, my rule of thumb is merely for determining whether a single option is worth taking or not. To compare between multiple choices of which you must pick one, you would have to extend the rule of thumb. One possible way is to define the value of each choice to be the minimum of the mean and median benefit. Then you of course pick the choice with the maximum value. Thirdly, different people will of course have different ways to evaluate a choice based on its benefit's probability distribution (assuming it can even be translated to some real number). A very risk averse person might take the minimum of the 1st percentile (roughly speaking the minimum benefit you believe you will gain in 99% of the cases) and the mean (average benefit). Someone else may combine the percentiles and mean in a different fashion, such as taking $-\infty$ as the value if the 5th percentile is below some threshold (such as representing serious hurt), but taking the mean otherwise.

$\endgroup$
  • 12
    $\begingroup$ I argue the problem is not in taking expectations, but the fact that "how much I value a pile of sweets" does not have a linear relationship with "how many sweets are in the pile". In fact, I assert the rational choice is to maximize the expectation of "how much I value the pile of sweets" $\endgroup$
    – user14972
    Commented Feb 12, 2017 at 19:25
  • 1
    $\begingroup$ @Hurkyl: By "moments" I mean the moments of the distribution, including the mean, variance and so on. And my analogy was supposed to be so that the number of sweets is precisely how much you value the pile. The problem therefore arises independent of you valuation as long as it is unbounded. If you believe that people have bounded valuation functions then yes that's another way to avoid the 'paradox'. But in my experience people do look at percentiles to make important decisions, whether conscious of it or not. $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 3:00
  • 2
    $\begingroup$ @user21820: I'm not sure whether the valuation is bounded, but I'm pretty sure it becomes negative for sufficiently large $n$, at which point I'd keep playing in hopes of losing everything so that I'm not responsible for dealing with a cache of billions sweets. I would say the utility goes to $-\infty$ as $n \to \infty$ (as that many sweets crush me and the whole galaxy into nothingness), but the game definitely can't go on that long. $\endgroup$
    – user14972
    Commented Feb 13, 2017 at 3:37
  • 1
    $\begingroup$ @user21820: I'm afraid your analysis is way off. It follows from some very reasonable axioms of preference that you can find a von Neumann-Morgenstern utility function, one that the agent wants to maximize the expected value of. For humans, it will usually be sublinear in things like "money" or "sweets". This sublinearity is what is often called "risk aversity". But you can't have it both ways - you can't say both "the utility grows exponentially" and "but you don't really want to maximize your expected utility". By definition you want to maximize your expected utility. $\endgroup$ Commented Feb 13, 2017 at 9:38
  • 2
    $\begingroup$ @LeilaHatami: You can see the reason that the 5 people (not me) gave; namely they think your question is too broad. Well it is broad, but you tagged it as big-list and it is an interesting list so I wouldn't close it myself. Anyway, there are already 3 people voting to re-open it; 2 more and it will be opened again! $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 16:00
$\begingroup$

Strongly related to the OP's example is this consequence of the arc sine law for last visits. Assume the game is played with a fair coin.

Theorem (false) In a long coin-tossing game each player will be on the winning side for about half the time, and the lead will pass not infrequently from one player to the other.

The following text is from the classic An Introduction to Probability Theory and Its Applications, volume 1, by William Feller.

  • According to widespread belief, a so-called law of averages should ensure the theorem above. But in fact the theorem is wrong, and contrary to the usual belief the following holds:

    With probability $\frac{1}{2}$ no equalization occurred in the second half of the game regardless of the length of the game. Furthermore, the probabilities near the end point are greatest.
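Feller's claim is easy to check by simulation. A sketch (function name mine) that estimates the probability that the last equalization of a $200$-step game falls in the first half:

```python
import random

def no_tie_in_second_half(steps=200, rng=random.Random(7)):
    """One game; True iff the last return to 0 happens in the first half."""
    pos, last_tie = 0, 0
    for t in range(1, steps + 1):
        pos += 1 if rng.random() < 0.5 else -1
        if pos == 0:
            last_tie = t
    return last_tie <= steps // 2

freq = sum(no_tie_in_second_half() for _ in range(20_000)) / 20_000
```

The estimate lands close to $\frac{1}{2}$, as the arc sine law predicts, and the same code with a longer game gives essentially the same answer, matching the "regardless of the length of the game" part of the statement.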

$\endgroup$
  • 1
    $\begingroup$ @user21820: No worries! Everything's fine. $\endgroup$ Commented Feb 13, 2017 at 7:57
  • 3
    $\begingroup$ The "law of averages" is definitely a good fallacy to destroy early. $\endgroup$
    – mbomb007
    Commented Feb 13, 2017 at 20:09
  • 3
    $\begingroup$ What a delightful read (first link): One may summarize these results by stating that one should not get drunk in more than two dimensions. $\endgroup$
    – ojdo
    Commented Feb 13, 2017 at 23:34
  • 1
    $\begingroup$ @ojdo: Many thanks for your nice comment. If you like this paper, I'm pretty sure you will find the chapter III, Fluctuations of coin tossing and random walks of Feller's book inspiring and valuable to read. $\endgroup$ Commented Feb 14, 2017 at 7:54
  • 1
    $\begingroup$ @BenAaronson: Why? If there is unknown bias but each coin toss is independent, and at some point heads and tails equalize, then you have no information whatsoever about the bias because the data is symmetric due to the independence. $\endgroup$
    – user21820
    Commented Feb 15, 2017 at 4:35
$\begingroup$

A famous example is the St. Petersburg paradox:

Consider a game in which you earn $2^n\ \$$ if you get $n$ consecutive heads in a sequence of fair coin tosses. The fair entrance fee of this game is $\infty$.
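A quick way to see the divergence, reading the game as "toss until the first tail and win $2^n$, where $n$ is the number of heads seen": each additional level contributes exactly $\frac{1}{2}$ to the expectation. A sketch (function name mine):

```python
from fractions import Fraction

def truncated_expectation(max_heads):
    """Expected payoff counting only runs of at most max_heads heads."""
    total = Fraction(0)
    for n in range(max_heads + 1):
        p_exactly_n = Fraction(1, 2 ** (n + 1))   # exactly n heads, then a tail
        total += p_exactly_n * 2 ** n             # payoff 2^n
    return total
```

`truncated_expectation(9)` is already $5$, `truncated_expectation(99)` is $50$, and so on: the partial sums grow without bound, so no finite entrance fee is "fair".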

$\endgroup$
  • 2
    $\begingroup$ I'd just want to point out that this idea can be extended to a 'more strange' triple-or-nothing game as described in my answer, and we do not need any notion of $\infty$ to get apparently counter-intuitive results. $\endgroup$
    – user21820
    Commented Feb 12, 2017 at 14:28
  • 1
    $\begingroup$ I did not get this one. What is the fair entrance fee? $\endgroup$
    – user312097
    Commented Feb 12, 2017 at 22:32
  • 1
    $\begingroup$ @A---B: It's how much you have to pay to play the game such that the net expected outcome is zero. But if defined this way it would be wrong to say "fair entrance fee is $\infty$" since $\infty-\infty$ is not defined. See my answer for explanation of a related game where there is no finite fair entrance fee. If however you define "fair entrance fee" as "expected payoff" (using say measure theory for expectation) then it is indeed $\infty$ in these two games, though it does not have a clear real-world meaning this way. $\endgroup$
    – user21820
    Commented Feb 13, 2017 at 3:37
  • 1
    $\begingroup$ You didn't say when the game ends. So I'll just keep on playing. $\endgroup$
    – Wildcard
    Commented Feb 14, 2017 at 3:32
  • 1
    $\begingroup$ @Wildcard If you have an infinite source of money, DO IT. What's better than playing dummy games when you have enough(!) money $\endgroup$
    – user415542
    Commented Feb 15, 2017 at 19:16
16
$\begingroup$

Bertrand's Paradox

Given two concentric circles ($S_1$, $S_2$) with radii $R_1=r$ and $R_2=\frac{r}2$, what is the probability, upon choosing a chord $c$ of the circle $S_1$ at random, that $c\:\cap\: S_2 \neq \emptyset$ ?

Simply speaking, your task is to

choose a chord of the larger circle at random and find the probability that it will intersect the smaller circle.

Surprisingly, Bertrand's Paradox offers three distinct yet valid solutions.

The same problem can also be stated as:

Given an equilateral triangle inscribed in a circle, find the probability of randomly choosing a chord of the circle greater than the length of a side of the triangle.

The counter-intuition steps in when you understand that the answer to the stated problem is $\frac12,\:\frac13,$ and even $\frac14$ at the same time, and all three answers are perfectly valid.

The crucial reason there are three different solutions is that the method of selecting the random chord is different in each case.

Here's the Wikipedia page for details on how each value is obtained and through what steps.

I remember a professor beginning my high-school probability class with Bertrand's Paradox as an introductory example.
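The three answers can be reproduced numerically by making the three selection methods explicit. A Python sketch on the unit circle (the sampler names are my own):

```python
import math
import random

R = 1.0  # radius of the large circle S1; S2 has radius R / 2

def by_endpoints(rng):
    """Random endpoints: two uniform points on the circumference."""
    theta = rng.uniform(0.0, 2.0 * math.pi)   # angle between the endpoints
    return abs(math.cos(theta / 2.0))         # distance of the chord from the centre

def by_radius(rng):
    """Random radius: the chord's midpoint is uniform along a radius."""
    return rng.uniform(0.0, R)

def by_midpoint(rng):
    """Random midpoint: the chord's midpoint is uniform in the disk."""
    while True:  # rejection-sample a uniform point in the disk
        x, y = rng.uniform(-R, R), rng.uniform(-R, R)
        if x * x + y * y <= R * R:
            return math.hypot(x, y)

def hit_probability(sampler, trials=200_000, seed=0):
    """Fraction of chords whose distance from the centre is below R/2,
    i.e. chords that intersect the small circle."""
    rng = random.Random(seed)
    return sum(sampler(rng) < R / 2.0 for _ in range(trials)) / trials

# hit_probability(by_endpoints) ~ 1/3, by_radius ~ 1/2, by_midpoint ~ 1/4
```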

$\endgroup$
5
  • 4
    $\begingroup$ This is less "counterintuitive" and more a cautionary tale about the use of the term "random". Still a good answer, as it reveals a critical detail of probability in an effective and interesting way. $\endgroup$
    – Glen O
    Commented Feb 13, 2017 at 15:13
  • 1
    $\begingroup$ The reason I used the word counter-intuitive is because one would not generally expect to have multiple valid solutions for a seemingly ordinary problem. And yes, what you say makes sense. $\endgroup$
    – axolotl
    Commented Feb 13, 2017 at 15:16
  • 5
    $\begingroup$ The seeming validity of all solutions comes from the fact that there is a vagueness in the question, though. Not unlike a question saying "Roll a dice. What is the chance of rolling a 6?" - if it's a standard dice and unweighted, then it's about 1/6, but it could be weighted, or it could fail to have a 6 on it, or have only 6s, or have more/less than 6 sides, etc. $\endgroup$
    – Glen O
    Commented Feb 13, 2017 at 15:54
    $\begingroup$ Interesting. How would one make the said question more specific? What condition(s) or additional premises need one include so that an ordinary mathematician would arrive at exactly one solution? $\endgroup$
    – axolotl
    Commented Feb 13, 2017 at 16:11
  • 4
    $\begingroup$ If you mean the Bertrand's Paradox question, then one needs to define how the chord is randomly selected. The word "random" usually implies "uniformly random" in a situation like this, but you need to specify the domain. So you might say "choose a chord of the larger circle by randomly selecting two points on its circumference" (implying the two points are chosen uniformly), which gives 1/3 as the only valid answer. In short, the missing information is the "selection method" - with it defined, the solution is unique. $\endgroup$
    – Glen O
    Commented Feb 13, 2017 at 16:32
16
$\begingroup$

The Shooting Room Paradox

A single person enters a room and two dice are rolled. If the result is double sixes, he is shot. Otherwise he leaves the room and nine new players enter. Again the dice are rolled, and if the result is double sixes, all nine are shot. If not, they leave and 90 new players enter, and so on (the number of players increasing tenfold with each round). The game continues until double sixes are rolled and a group is executed, which is certain to happen eventually (the room is infinitely large, and there's an infinite supply of players).

If you're selected to enter the room, how worried should you be? Not particularly: Your chance of dying is only 1 in 36. Later your mother learns that you entered the room. How worried should she be? Extremely: About 90 percent of the people who played this game were shot. What does your mother know that you don't? Or vice versa?
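Both numbers can be checked by simulation. A Python sketch (I average the per-game fraction of players shot over many plays of the whole game):

```python
import random

def run_game(rng):
    """Play one complete game; return (size of the shot group, total players)."""
    size, total = 1, 0
    while True:
        total += size
        if rng.randrange(36) == 0:            # double sixes: probability 1/36
            return size, total
        size = 9 if size == 1 else 10 * size  # group sizes 1, 9, 90, 900, ...

def average_fraction_shot(games=20_000, seed=0):
    rng = random.Random(seed)
    return sum(s / t for s, t in (run_game(rng) for _ in range(games))) / games

# Any individual entrant dies with probability exactly 1/36, yet in every
# game lasting past the first round exactly 90% of all entrants are shot.
```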

$\endgroup$
15
$\begingroup$

Base rate fallacy

If presented with related base rate information (or generic information) and specific information (information only pertaining to a certain case), the mind tends to ignore the former and focus on the latter.

Example:
A group of police officers have breathalyzers displaying false drunkenness in 5% of the cases in which the driver is sober. However, the breathalyzers never fail to detect a truly drunk person. One in a thousand drivers is driving drunk. Suppose the police officers then stop a driver at random, and force the driver to take a breathalyzer test. It indicates that the driver is drunk. We assume you don't know anything else about him or her. How high is the probability he or she really is drunk?

Intuitive first answer might be as high as 0.95, but the correct probability is about 0.02.

Solution: use Bayes' theorem.

The goal is to find the probability that the driver is drunk given that the breathalyzer indicated he/she is drunk, which can be represented as $${\displaystyle p(\mathrm {drunk} |D)}$$
where "D" means that the breathalyzer indicates that the driver is drunk.

Bayes' theorem tells us that

$$ {\displaystyle p(\mathrm {drunk} |D) = {\frac {p(D|\mathrm {drunk} )\, p(\mathrm {drunk} )}{p(D)}}} $$

We were told the following in the first paragraph:

$${\displaystyle p(\mathrm {drunk} )=0.001} $$ $${\displaystyle p(\mathrm {sober} )=0.999} $$ $${\displaystyle p(D|\mathrm {drunk} )=1.00} $$ $${\displaystyle p(D|\mathrm {sober} )=0.05} $$

As you can see from the formula, one needs p(D) for Bayes' theorem, which one can compute from the preceding values using
$${\displaystyle p(D)=p(D|\mathrm {drunk} )\,p(\mathrm {drunk} )+p(D|\mathrm {sober} )\,p(\mathrm {sober} )} $$

which gives $$ {\displaystyle p(D)=(1.00\times 0.001)+(0.05\times 0.999)=0.05095} $$

Plugging these numbers into Bayes' theorem, one finds that $$ {\displaystyle p(\mathrm {drunk} |D)={\frac {1.00\times 0.001}{0.05095}}=0.019627 \approx 0.02 } $$


A more intuitive explanation: on average, for every 1,000 drivers tested:

  • 1 driver is drunk, and it is 100% certain that for that driver there is a true positive test result, so there is 1 true positive test result;
  • 999 drivers are not drunk, and among those drivers there are 5% false positive test results, so there are 49.95 false positive test results.

Therefore, the probability that one of the drivers among the $$1 + 49.95 = 50.95$$ positive test results really is drunk is $$ {\displaystyle p(\mathrm {drunk} |D)=1/50.95\approx 0.019627} $$
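The whole computation fits in a few lines of Python, using the numbers from the statement:

```python
# numbers from the statement above
p_drunk = 0.001
p_sober = 1.0 - p_drunk
p_pos_given_drunk = 1.00   # the breathalyzer never misses a drunk driver
p_pos_given_sober = 0.05   # false-positive rate on sober drivers

# law of total probability: p(D) = p(D|drunk) p(drunk) + p(D|sober) p(sober)
p_pos = p_pos_given_drunk * p_drunk + p_pos_given_sober * p_sober  # = 0.05095

# Bayes' theorem: p(drunk|D) = p(D|drunk) p(drunk) / p(D)
p_drunk_given_pos = p_pos_given_drunk * p_drunk / p_pos            # ~ 0.0196
```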

$\endgroup$
2
  • 5
    $\begingroup$ I find this fallacy easier to explain with whole numbers. For every 20,000 drivers tested, 20 are drunk (and will show drunk on a test), and 999 are not drunk but will falsely show as drunk on a breathalyzer test. So the odds that someone showing drunk is actually drunk is 20/1019...unless, of course, he or she was pulled over for swerving and running a red light. ;) $\endgroup$
    – Wildcard
    Commented Feb 14, 2017 at 4:13
  • $\begingroup$ That's actually a more simple and better example. And yes, the randomness of choosing drivers is inherently important, otherwise another parameter which defines driver's competency while being drunk or sober comes into play :D $\endgroup$
    – ABcDexter
    Commented Feb 14, 2017 at 5:29
13
$\begingroup$

I have also had a lecture on this excellent topic.

Unfortunately, my lecture notes are in Czech, but I can translate some paradoxes from there:

Monty Hall

Monty Hall is in my opinion the most famous probabilistic paradox. It is well described here and on the Wiki, so I will just give you a way to make it feel intuitive and to persuade other people that the probability really is computed correctly. It is quite impressive, almost like a magic trick :-)

Take a pack of cards and let someone draw a random card. Tell him that he wants to draw the ace of hearts and that he should not look at the chosen card. Then show the audience all the remaining cards except one, none of the shown cards being the ace of hearts. So there are now two hidden cards: one in your hand and one in his hand. Finally, he may change his first guess. Most people do, and they are most likely correct :-).

Tennis-like game

There are two players, Alice and Bob, playing a tennis-like game. On each point, one player serves the ball, and the probability of winning the point depends on who serves. The player who first reaches, say, 11 points wins the match. Alice serves the first ball. There are three possible serving schemes:

  1. The winner of the previous ball serves the next.
  2. The loser of the previous ball serves the next.
  3. Service is regularly alternating.

One would expect that scheme 1 helps the stronger player and scheme 2 helps the weaker player. The paradox is that the probability of winning the match does not depend on the chosen scheme.

Proof sketch: Pregenerate 11 cards recording the winners of Alice's serves (pack A) and 10 cards recording the winners of Bob's serves (pack B). Then each of Alice's (or Bob's) serves can be modeled by drawing the next card from pack A (or B). It can be shown that these 21 cards suffice for any of the three schemes, and the winner is determined by the cards: exactly one player is written on at least 11 of them.
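A quick simulation supports the claim. A Python sketch (the point-win probabilities 0.7 and 0.4 are arbitrary choices of mine):

```python
import random

def play_match(p_serve, p_receive, scheme, target=11, rng=random):
    """One match to `target` points. `p_serve` / `p_receive` are Alice's
    chances of winning a point on her own serve / on Bob's serve.
    Scheme 1: winner serves next; 2: loser serves next; 3: alternate."""
    a = b = 0
    alice_serving = True  # Alice serves the first ball
    while a < target and b < target:
        alice_won = rng.random() < (p_serve if alice_serving else p_receive)
        if alice_won:
            a += 1
        else:
            b += 1
        if scheme == 1:
            alice_serving = alice_won
        elif scheme == 2:
            alice_serving = not alice_won
        else:
            alice_serving = not alice_serving
    return a > b  # True if Alice wins the match

def win_rate(scheme, trials=50_000, seed=0):
    rng = random.Random(seed)
    wins = sum(play_match(0.7, 0.4, scheme, rng=rng) for _ in range(trials))
    return wins / trials

# the three win rates agree up to simulation noise
```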

Candies

I have a bag of candies containing 123 caramel candies and 321 mint candies. Every morning I draw candies from the bag at random and eat them as long as they are all of the same kind; when I draw a candy of a different kind, I put it back and stop for the day. What is the probability that the last candy I eat is a caramel one?

Answer: 1/2. (One would expect it to be less than 1/2, since there are fewer caramel candies.)

Proof: It suffices to show that on each morning the probability that all the caramel candies are eaten equals the probability that all the mint candies are eaten. Imagine that the candies are randomly ordered each morning and that I draw them from left to right. I eat all the caramel candies exactly when the order is "first all the caramel candies, then the mint ones"; I eat all the mint candies when the order is the opposite, and these two orderings are equally likely.
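The symmetry argument can be sanity-checked by simulation, even with deliberately lopsided counts. A Python sketch:

```python
import random

def last_is_caramel(caramel, mint, rng):
    """Simulate the whole process; True if the last eaten candy is caramel."""
    bag = ["c"] * caramel + ["m"] * mint
    while bag:
        rng.shuffle(bag)                       # a fresh random order each morning
        first, i = bag[0], 0
        while i < len(bag) and bag[i] == first:
            i += 1                             # eat the initial run of equal candies
        last_eaten, bag = bag[i - 1], bag[i:]  # the rest stay in the bag
    return last_eaten == "c"

def estimate(caramel, mint, trials=20_000, seed=0):
    rng = random.Random(seed)
    return sum(last_is_caramel(caramel, mint, rng) for _ in range(trials)) / trials

# hovers around 1/2 even though mint candies heavily outnumber caramel ones
```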

Wolf on a circle

There is a wolf at one vertex of a regular n-gon, and a sheep at every other vertex. At each step, the wolf moves to a randomly chosen adjacent vertex and, if there is a sheep there, eats it. The process ends when the wolf has eaten n-2 sheep, so that exactly one sheep remains.

Intuitively, the sheep at the opposite vertex from the wolf is in the best position. The paradox is that all sheep have the same probability of survival.

Proof: Take one sheep S. Almost surely the wolf eventually reaches a vertex adjacent to S for the first time; at that moment the sheep at the other vertex adjacent to S is still alive. So for S to survive, the wolf must travel all the way around the circle, from one neighbour of S to the other, without visiting S. The probability of going around the whole circle in this way does not depend on the position of S.
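Here is a simulation confirming the equal survival probabilities, written as a Python sketch for a hexagon ($n=6$):

```python
import random

def surviving_sheep(n, rng):
    """Wolf starts at vertex 0; sheep occupy vertices 1..n-1.
    Returns the vertex of the last surviving sheep."""
    alive = set(range(1, n))
    pos = 0
    while len(alive) > 1:
        pos = (pos + rng.choice((-1, 1))) % n  # move to a random neighbour
        alive.discard(pos)                     # eat the sheep there, if any
    return alive.pop()

def survival_counts(n=6, trials=30_000, seed=0):
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        counts[surviving_sheep(n, rng)] += 1
    return counts[1:]  # each entry should be close to trials / (n - 1)
```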

Simpson's Paradox

There is a research on two medical cures A, B.

200 people tried cure A: it helped 110 people (50 men, 60 women) and did not help 90 people (60 men, 30 women).

210 people tried cure B: it helped 120 people (30 men, 90 women) and did not help 90 people (40 men, 50 women).

So in general, cure B is better, since 120:90 > 110:90.

But if you are a man, you can consider just the men's statistics: 50:60 > 30:40, so cure A is more appropriate.

And if you are a woman, you can consider just the women's statistics: 60:30 > 90:50, so cure A is again more appropriate.

Shocking, isn't it? :-)
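The arithmetic can be verified directly in Python, with the numbers from the statement above:

```python
from fractions import Fraction

def success_rate(helped, failed):
    """Exact success rate as a fraction."""
    return Fraction(helped, helped + failed)

# cure A: 110 helped (50 men, 60 women), 90 not helped (60 men, 30 women)
# cure B: 120 helped (30 men, 90 women), 90 not helped (40 men, 50 women)
overall_a, overall_b = success_rate(110, 90), success_rate(120, 90)
men_a, men_b = success_rate(50, 60), success_rate(30, 40)
women_a, women_b = success_rate(60, 30), success_rate(90, 50)

# B wins overall, yet A wins inside both subgroups
```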

$\endgroup$
4
  • $\begingroup$ These are all very nice, and I hadn't seen #2 or #3 before, thanks! $\endgroup$ Commented Feb 16, 2017 at 10:21
  • $\begingroup$ I like the Candy-chain one a lot! But your 'proof' for Wolf-on-the-circle does not seem to yield a proof of the claim of equal probability. Can you add that? $\endgroup$
    – user21820
    Commented Feb 18, 2017 at 9:17
  • $\begingroup$ Sufficient now? $\endgroup$ Commented Feb 18, 2017 at 11:48
  • $\begingroup$ Yep. I think I didn't get notified of your response. What I missed earlier was simply that the wolf will almost surely get adjacent to any specified sheep at some point, and after the first time your added paragraph shows that it has a certain constant probability of getting to the other side of that sheep without eating it. Nice! I already upvoted so I can't do it again. =) $\endgroup$
    – user21820
    Commented Mar 5, 2017 at 11:56
12
$\begingroup$

I think the most stunning example is that of non-transitive dice.

Take three cubic dice with the following numbers on their sides:

  • Die $A:$ $3 \: 3 \: 3 \: 3 \: 3 \: 6$
  • Die $B:$ $2 \: 2 \: 2 \: 5 \: 5 \: 5$
  • Die $C:$ $1 \: 4 \: 4 \: 4 \: 4 \: 4$

Now I offer you the following game: You choose a die as you like, then I choose another die, and then we both roll and the highest number wins.

No matter which die you choose, I can choose another one that wins more often than loses against your choice.
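The cycle is easy to verify exhaustively in Python:

```python
from fractions import Fraction
from itertools import product

A = (3, 3, 3, 3, 3, 6)
B = (2, 2, 2, 5, 5, 5)
C = (1, 4, 4, 4, 4, 4)

def p_beats(x, y):
    """Probability that die x rolls strictly higher than die y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

# A beats B, B beats C, and yet C beats A
```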

$\endgroup$
6
  • $\begingroup$ Thank you. It's another version of "Penney's Game" which I've mentioned in the question $\endgroup$
    – MR_BD
    Commented Feb 13, 2017 at 7:10
    $\begingroup$ Oops. The article you've linked is written by James Grime, who is the speaker in the video I've mentioned... $\endgroup$
    – MR_BD
    Commented Feb 13, 2017 at 7:15
  • $\begingroup$ @LeilaHatami: Well, I didn't take the time to look through the videos, so I still don't know Penney's game. $\endgroup$
    – celtschk
    Commented Feb 13, 2017 at 8:47
  • $\begingroup$ It has a Wiki page: en.wikipedia.org/wiki/Penney's_game $\endgroup$
    – MR_BD
    Commented Feb 13, 2017 at 9:51
  • 1
    $\begingroup$ We started with Brad Efron's non-transitive dice and made them even more paradoxical: Lake Wobegon Dice (see below). $\endgroup$ Commented Feb 14, 2017 at 1:05
12
$\begingroup$

This is one of my favorites:

100 prisoners problem

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner's number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards. If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers.

If every prisoner selects $50$ drawers at random, the probability that a single prisoner finds his number is $50\%$. Therefore, the probability that all prisoners find their numbers is the product of the single probabilities, which is $1/2^{100} \approx 0.0000000000000000000000000000008$, a vanishingly small number. The situation appears hopeless.

Up to which value can prisoners improve the probability of being pardoned using a good strategy?
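For readers who want to check the (spoiler) answer numerically: with the well-known cycle-following strategy, where each prisoner first opens the drawer with his own number and then keeps following the numbers he finds, the success probability is about 31%. A Python sketch:

```python
import random

def all_succeed(n, rng):
    """Cycle-following strategy: prisoner k opens drawer k, then the drawer
    labelled by the number just found, for at most n // 2 openings."""
    drawers = list(range(n))
    rng.shuffle(drawers)
    for prisoner in range(n):
        drawer = prisoner
        for _ in range(n // 2):
            if drawers[drawer] == prisoner:
                break
            drawer = drawers[drawer]
        else:
            return False  # this prisoner failed; everyone dies
    return True

def success_rate(n=100, trials=5_000, seed=0):
    rng = random.Random(seed)
    return sum(all_succeed(n, rng) for _ in range(trials)) / trials

# about 0.31, astonishingly far from 1 / 2**100
```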

$\endgroup$
11
$\begingroup$

The boy or girl paradox already mentioned by Agnishom has an interesting variation:

''Suppose we were told not only that Mr. Smith has two children, and one of them is a boy, but also that the boy was born on a Tuesday: does this change the previous analyses?'' (for the question ''what is the probability that both children are boys'')?

Using some elementary computations with Bayes' formula, the seemingly useless information that the boy was born on a Tuesday changes the result.

To understand the intuition behind, consider an extreme case where you knew that one boy was born on December $30$. Then it is very unlikely that the other child is born on that date too, so one child is ''specified'' by the date-information. This reduces the question to ''is the other child a boy''? and changes the probability from $\frac13$ to approximately $\frac12$.
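Under the usual uniform assumptions the computation can also be done by brute-force enumeration; the conditional probability comes out to $13/27$ rather than $\frac13$. A Python sketch:

```python
from fractions import Fraction
from itertools import product

# each child is a (sex, weekday) pair; say day 0 is Tuesday
children = list(product(("boy", "girl"), range(7)))
families = list(product(children, repeat=2))  # 196 equally likely families

tuesday_boy = [f for f in families
               if any(sex == "boy" and day == 0 for sex, day in f)]
both_boys = [f for f in tuesday_boy if all(sex == "boy" for sex, _ in f)]

answer = Fraction(len(both_boys), len(tuesday_boy))  # 13/27, not 1/3
```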

However, I do not recommend using this example for teaching, as there are many interpretations of this paradox (partially depending on language nuances of the formulation) and it can add more confusion than clarity.

$\endgroup$
7
  • 2
    $\begingroup$ The last couple paragraphs make it perfectly clear, actually, culminating in this sentence: "The moral of the story is that these probabilities do not just depend on the known information, but on how that information was obtained." As an IT professional dealing with such questions as, "How many servers out of 10000 are likely to have this issue given that these 12 servers have the issue?" and the fact that data is often gained by sometimes badly written SQL queries (that filter out too much or too little), this is a crucial thing to learn. Thanks for this. $\endgroup$
    – Wildcard
    Commented Feb 14, 2017 at 4:05
  • $\begingroup$ Actually, this seems to be extremely related to "sampling bias"; is it not? $\endgroup$
    – Wildcard
    Commented Feb 14, 2017 at 4:09
  • $\begingroup$ @Wildcard Might be. It may also be related to the philosophical question how good the concept of "probability" is well-defined at all (see the Sleeping beauty paradox mentioned somewhere above)... but I'm not an expert on this. $\endgroup$ Commented Feb 14, 2017 at 9:55
  • $\begingroup$ +1 for the last paragraph. If the question is phrased as "A mother with two children tells you that at least one is a girl", you actually don't have enough information to work out the probabilities, and I'd argue that based on common-sense assumptions, 1/2 is a better answer than 1/3. I think the most clear way to phrase this is always to state it as a question and answer: "You ask a mother if either of her children is a girl, she says yes." $\endgroup$ Commented Feb 14, 2017 at 13:55
  • 2
    $\begingroup$ What this calls into question in my mind is not the concept of probability, but the human tendency to neglect the necessary information on which to compute probabilities--as a result of which we unwittingly sabotage our own questions in probability. (The same thing happened with the Monty Hall puzzle in another question.) See math.stackexchange.com/a/4401 for further discussion of the flaws in the above presentation of the boy-girl question. $\endgroup$
    – David K
    Commented Feb 14, 2017 at 14:41
10
$\begingroup$

One that I found surprising as a beginner was that three events (or random variables) can be independent pairwise, but not jointly independent. Or to put it somewhat more strikingly, we can have $C$ independent of $A$ and independent of $B$, yet not independent of $A,B$. Wording it this way shows that it does take care to state independence assumptions carefully, and it also illustrates some non-obvious subtleties in the definition of independence (one of them being that independence of two events does not mean that they don't interact and can't be influenced by a third factor).

One example of pairwise but non-mutual independence is given on the Wikipedia page.

The example that I typically use is to take $T$ to be a uniformly random angle in $[0,2\pi)$ and then consider the events $A = \{\sin T > 0\}$, $B = \{ \cos T > 0\}$ and $C = \{ \tan T > 0 \}$ (effectively, this is just two independent $\pm 1$ variables and their product, but the trig formulation helps to visualize the events in terms of quadrants).

It's easy to see that $P(A)=P(B)=P(C) = \tfrac12$ and that $P(A\cap B) = P(A\cap C) = P(B\cap C) = \tfrac14$, but clearly $P(A\cap B\cap \overline C) = 0$.
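Since everything here depends only on which quadrant $T$ lands in, the probabilities can be checked exactly by enumerating the four equally likely quadrants. A Python sketch:

```python
from fractions import Fraction

# encode each quadrant of T by the signs of (sin T, cos T, tan T)
quadrants = [
    (True,  True,  True),   # I:   sin > 0, cos > 0, tan > 0
    (True,  False, False),  # II:  sin > 0, cos < 0, tan < 0
    (False, False, True),   # III: sin < 0, cos < 0, tan > 0
    (False, True,  False),  # IV:  sin < 0, cos > 0, tan < 0
]

def prob(pred):
    """Probability of an event over the four equally likely quadrants."""
    return Fraction(sum(1 for q in quadrants if pred(q)), len(quadrants))

pA = prob(lambda q: q[0])
pB = prob(lambda q: q[1])
pC = prob(lambda q: q[2])
pAB = prob(lambda q: q[0] and q[1])
pABC = prob(lambda q: q[0] and q[1] and q[2])
```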

$\endgroup$
5
  • $\begingroup$ Could you add some reference to this for further reading? $\endgroup$ Commented Feb 17, 2017 at 9:04
  • $\begingroup$ @MartinEnder Thanks, I added a wiki link and an explicit example. $\endgroup$
    – Erick Wong
    Commented Feb 17, 2017 at 12:26
  • $\begingroup$ Oh I see now. I think I misunderstood your wording at first. I read it as "three events can't be jointly independent" whereas you're saying "it's possible for three events to be mutually independent while at the same time not being jointly independent". $\endgroup$ Commented Feb 17, 2017 at 12:33
  • 6
    $\begingroup$ Simpler example, write $1$ on the heads side of a coin and $0$ on the tails, then let $A$ and $B$ be two independent tosses of this coin and $C$ be $1$ if $A$ is equal to $B$ and $0$ otherwise. Then $C$ is independent from $A$ and from $B$, but clearly not from both. $\endgroup$ Commented Feb 18, 2017 at 9:02
  • 2
    $\begingroup$ I think it's more instructive to change to a simpler example like the one @BenMillwood gave. Equivalently, and more symmetrically, just take a random 3-bit string that has even parity. Then the 3 bits are pairwise independent but every bit is totally dependent of the other two bits. Trivial to prove too. $\endgroup$
    – user21820
    Commented Feb 18, 2017 at 9:09
8
$\begingroup$

It's not counter-intuitive, but it's great for teaching in class.

Pick $a,b \in [n]$ uniformly at random. Then $\mathbb{P}[\gcd(a,b)=1]$ tends to $\frac{6}{\pi^2}$ as $n$ goes to infinity.
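One can watch the convergence with a direct count in Python:

```python
import math

def coprime_fraction(n):
    """Exact fraction of pairs (a, b) in {1, ..., n}^2 with gcd(a, b) = 1."""
    hits = sum(math.gcd(a, b) == 1
               for a in range(1, n + 1) for b in range(1, n + 1))
    return hits / (n * n)

# coprime_fraction(400) is already close to 6 / math.pi**2 ~ 0.6079
```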

Also, there are other interesting problems whose answers involve $\pi$, $e$, $\ldots$

$\endgroup$
10
  • $\begingroup$ What is the meaning of $[n]$? And where would I find a proof of this? $\endgroup$
    – littleO
    Commented Feb 13, 2017 at 7:42
  • $\begingroup$ @littleO Consider $\prod (1-\frac{1}{i^2})$ and use Euler product formula $\endgroup$
    – user410372
    Commented Feb 13, 2017 at 8:08
  • $\begingroup$ Ok thanks. I'm still curious what $[n]$ means and what kind of textbook contains this type of result. $\endgroup$
    – littleO
    Commented Feb 13, 2017 at 8:42
  • 1
    $\begingroup$ @littleO It means {1,2,...,n}. You can find it here: mathoverflow.net/questions/97041/… and here: math.stackexchange.com/questions/64498/… I have forgot where I read it but it's a well-known fact. You probably can find it easily through the web. $\endgroup$
    – user410372
    Commented Feb 13, 2017 at 8:48
  • 1
    $\begingroup$ @JollyJoker Yeah, It's very interesting. But it's not the sum of $\frac{1}{p^2}$ . It's $\prod(1-\frac{1}{p^2})$ and you have to use Euler product formula to show that the inverse of the quantity is equal to $\sum \frac{1}{i^2}$ $\endgroup$
    – user410372
    Commented Feb 13, 2017 at 9:30
8
$\begingroup$

Someone mentioned non-transitive dice, and that reminded me of this one:

Suppose there are two unweighted six-sided dice that you cannot examine, but which you can direct a machine to roll and inform you of the sum. You can do this as often as you like, and the distribution of the sum is exactly what you would expect of a pair of ordinary six-sided dice.

Are they, in fact, a pair of ordinary six-sided dice?

Not necessarily.
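The classical counterexample is the pair of Sicherman dice, which can be checked in a few lines of Python:

```python
from collections import Counter
from itertools import product

standard = (1, 2, 3, 4, 5, 6)
sicherman_a = (1, 2, 2, 3, 3, 4)
sicherman_b = (1, 3, 4, 5, 6, 8)

def sum_distribution(d1, d2):
    """Multiset of sums over all equally likely face pairs."""
    return Counter(a + b for a, b in product(d1, d2))

# same distribution of sums as two ordinary dice, yet neither die is standard
```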

Then someone mentioned a secretary problem, which turned out to be about derangements. I had in mind a different secretary's problem, which is also called the sultan's dowry:

You have $100$ candidates, upon which there exists a total order. The candidates have appointments with you, one at a time, in a random order. From each interview, you can tell exactly how that candidate ranks amongst those you have already examined. At that point, you may either accept or reject the candidate. Any acceptance or rejection is permanent; no candidate, once rejected, may be reconsidered. Your objective is solely to accept the best candidate. What is the strategy for maximizing your probability of doing so, and what is that probability?

As often happens in probability puzzles, the answer is $1/e$*, which many people find surprisingly high.


*Approximately, with the approximation getting better and better as the number of candidates increases without bound.
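The optimal strategy is to reject roughly the first $n/e$ candidates and then accept the first one better than all of those. A simulation for $n = 100$ (Python sketch):

```python
import random

def best_is_chosen(n, cutoff, rng):
    """Reject the first `cutoff` candidates, then accept the first candidate
    better than all seen so far; True if that is the overall best (rank 0)."""
    ranks = list(range(n))  # 0 is the best candidate
    rng.shuffle(ranks)
    best_seen = min(ranks[:cutoff], default=n)
    for r in ranks[cutoff:]:
        if r < best_seen:
            return r == 0
    return False  # the best was among the rejected ones

def success_rate(n=100, cutoff=37, trials=30_000, seed=0):
    rng = random.Random(seed)
    return sum(best_is_chosen(n, cutoff, rng) for _ in range(trials)) / trials

# with cutoff ~ n / e the rate is about 0.37, close to 1/e
```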

$\endgroup$
3
  • $\begingroup$ The answer to the secretary problem is only approximately 1/e, with the approximation getting better as the number of candidates goes to infinity. $\endgroup$ Commented Feb 14, 2017 at 2:56
  • $\begingroup$ @AndersKaseorg: Well, yes. I was being rather imprecise, but certainly the probability is not exactly $1/e$ for finite numbers of candidates if the process is deterministic. $\endgroup$
    – Brian Tung
    Commented Feb 14, 2017 at 7:23
  • 1
    $\begingroup$ Thanks. I love your last sentence: As often happens in probability puzzles, the answer is 1/e $\endgroup$
    – MR_BD
    Commented Feb 14, 2017 at 7:25
8
$\begingroup$

There's the two envelopes paradox. Wikipedia states it as follows:

You are given two indistinguishable envelopes, each containing money, one contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?

The issue is that there is an amount $X$ of money in the envelope you have. If it's the lesser amount then switching gives you a reward of $2X$ and if you don't then you only get $X/2$. Since both cases are equally likely, it seems that you should switch, even though that is nonsensical since clearly your chance of getting the larger amount can only ever be 50%.

The Wikipedia article goes into great depth explaining various resolutions of this fallacy. It boils down to the fact that the sum of the two envelopes is the same in both cases, which means that the $X$s considered above aren't actually identical.
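A simulation makes the resolution concrete: whatever distribution the smaller amount has, keeping and switching have the same expected value. A Python sketch (the uniform distribution for the smaller amount is an arbitrary choice of mine):

```python
import random

def expected_values(trials=100_000, seed=0):
    """Compare 'always keep' vs 'always switch' over many plays."""
    rng = random.Random(seed)
    keep_total = switch_total = 0.0
    for _ in range(trials):
        x = rng.uniform(1.0, 100.0)  # the smaller amount
        envelopes = (x, 2.0 * x)
        pick = rng.randrange(2)      # choose an envelope uniformly at random
        keep_total += envelopes[pick]
        switch_total += envelopes[1 - pick]
    return keep_total / trials, switch_total / trials

# both averages come out the same: switching gains you nothing
```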

A related problem is the necktie paradox.

$\endgroup$
8
$\begingroup$

One I saw on Twitter recently, which is perhaps a clearer version of sex-of-children-type problems:

Three casino chips have a dot on each side:

  • on one chip the dots are both blue,
  • on the second there is a blue dot on one side and red on the other, and
  • on the third the dots are both red.

The chips are placed in a bag and, without looking, you choose one and place it on the table. When you look at the chip, you see it has a blue dot on top. What is the probability that the dot on the other side is blue?

Many people will say $1/2$ - I did before I thought properly - but...

you are blindly choosing both the chip, and which side to face up. So you have to consider that each dot has an equal chance of showing, making the chance $2/3$.
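The $2/3$ falls out of counting faces rather than chips, which takes only a few lines of Python:

```python
from fractions import Fraction

chips = [("B", "B"), ("B", "R"), ("R", "R")]
# each of the six faces is equally likely to land on top
faces = [(up, down) for chip in chips for up, down in (chip, chip[::-1])]

blue_up = [f for f in faces if f[0] == "B"]
blue_below = [f for f in blue_up if f[1] == "B"]
answer = Fraction(len(blue_below), len(blue_up))  # 2/3
```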

$\endgroup$
2
  • $\begingroup$ This is related to Monty Hall, as well as the bridge paradox of "Restricted Choice." $\endgroup$ Commented Jun 6, 2018 at 15:28
  • $\begingroup$ It would be $\frac12$ if you started with four chips: one with both dots blue, one with a blue dot on top and red below, one red on top and blue below, and one with both red. $\endgroup$
    – Henry
    Commented Feb 16 at 11:46
6
$\begingroup$

Perhaps Parrondo's Paradox would be interesting. One can combine losing propositions into a winning proposition.
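In the canonical version (Harmer and Abbott's games, with $\varepsilon = 0.005$), game A wins each round with probability $\frac12-\varepsilon$, while game B wins with probability $\frac1{10}-\varepsilon$ when your capital is a multiple of $3$ and $\frac34-\varepsilon$ otherwise. Both games lose on their own, yet a random mixture of them wins. The per-round drifts can be computed exactly from the stationary distribution of the capital mod 3. A Python sketch:

```python
def drift(p_mult3, p_other, eps=0.005):
    """Average capital gain per round when the win probability is
    p_mult3 - eps if capital % 3 == 0 and p_other - eps otherwise."""
    p = [p_mult3 - eps, p_other - eps, p_other - eps]
    pi = [1.0 / 3.0] * 3
    for _ in range(5_000):  # power iteration on the capital-mod-3 chain
        pi = [
            pi[1] * (1 - p[1]) + pi[2] * p[2],
            pi[2] * (1 - p[2]) + pi[0] * p[0],
            pi[0] * (1 - p[0]) + pi[1] * p[1],
        ]
    return sum(pi[i] * (2 * p[i] - 1) for i in range(3))

drift_a = 2 * (0.5 - 0.005) - 1      # game A alone: -0.01
drift_b = drift(0.10, 0.75)          # game B alone: negative
drift_mix = drift(0.30, 0.625)       # coin-flip mixture of A and B: positive!
```

The mixture's win probabilities are just the averages of those of A and B, which is why `drift(0.30, 0.625)` describes playing A or B by a fair coin flip.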

Simpson's Paradox is also interesting. (And actually occurred in a court case.)

$\endgroup$
3
  • $\begingroup$ Yes, Simpson's Paradox, or the Yule-Simpson effect) is a good example: en.m.wikipedia.org/wiki/Simpson's_paradox. Also, the Dover paperback Counterexamples in Probability by Jordan M. Stoyanov is very useful for your purpose. $\endgroup$
    – PolyaPal
    Commented Feb 14, 2017 at 18:33
  • $\begingroup$ I came here to include Simpson's Paradox as well. Please expand your answer to explain what it is. $\endgroup$ Commented Feb 15, 2017 at 19:29
  • $\begingroup$ Parrondo's paradox is very interesting, if a bit sophisticated for an introductory class. I really think this answer deserves some expansion! $\endgroup$
    – Silverfish
    Commented Feb 15, 2017 at 21:58
6
$\begingroup$

Lake Wobegon Dice

Find a set of $n$ dice (each with $s$ sides, numbered appropriately), in which each die is more likely to roll above the set average on that roll than below the set average. Given $n$, find the Lake Wobegon Optimal set, in which that probability is maximum.

"Lake Wobegon Dice," by Jorge Moraleda and David G. Stork, College Mathematics Journal, 43(2):152--159 (2012)

Abstract:

  • We present sets of $n$ non-standard dice—Lake Wobegon dice—having the following paradoxical property: On every (random) roll of a set, each die is more likely to roll greater than the set average than less than the set average; in a specific statistical sense, then, each die is “better than the set average.”

    We define the Lake Wobegon Dominance of a die in a set as the probability the die rolls greater than the set average minus the probability the die rolls less than the set average. We further define the Lake Wobegon Dominance of the set to be the dominance of the set’s least dominant die and prove that such paradoxical dominance is bounded above by $(n-2)/n$ regardless of the number of sides $s$ on each die and the maximum number of pips $p$ on each side. A set achieving this bound is called Lake Wobegon Optimal. We give a constructive proof that Lake Wobegon Optimal sets exist for all $n \ge 3$ if one is free to choose $s$ and $p$. We also show how to construct minimal optimal sets, that is, that set that requires the smallest range in the number of pips on the faces.

    We determine the frequency of such Lake Wobegon sets in the $n = 3$ case through exhaustive computer search and find the unique optimal $n = 3$ set having minimal $s$ and $p$. We investigate symmetry properties of such sets, and present equivalence classes having identical paradoxical dominance. We construct inverse sets, in which on any roll each die is more likely to roll less than the set average than greater than the set average, and thus each die is “worse than the set average.” We show the unique extreme “worst” case, the Lake Wobegon pessimal set.

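To see that such sets exist at all, here is a tiny hand-rolled example (not the optimal set from the paper): three identical dice, each with faces $(1, 2, 2)$. Exhaustive enumeration in Python:

```python
from fractions import Fraction
from itertools import product

dice = [(1, 2, 2)] * 3  # three identical dice with faces 1, 2, 2

above = below = 0
for roll in product(*dice):              # 27 equally likely face combinations
    avg = Fraction(sum(roll), len(roll))
    if roll[0] > avg:                    # by symmetry every die behaves alike
        above += 1
    elif roll[0] < avg:
        below += 1

# each die rolls above the set average in 10 of the 27 cases and below it
# in only 8, so every die is "better than the set average"
```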

$\endgroup$
2
  • $\begingroup$ +1 I love these, even more than the non-transitive dice. $\endgroup$
    – Brian Tung
    Commented Mar 21, 2017 at 15:41
  • $\begingroup$ @BrianTung: Find me online, send me an email and I can send you a set of physical dice. $\endgroup$ Commented Apr 6, 2017 at 16:21
5
$\begingroup$

In contract bridge, there is the principle of restricted choice. It's always seemed counterintuitive to me.

https://en.m.wikipedia.org/wiki/Principle_of_restricted_choice

$\endgroup$
2
  • 1
    $\begingroup$ The Principle of Restricted Choice is identical to the Monty Hall Paradox. $\endgroup$
    – ttw
    Commented Feb 13, 2017 at 19:13
  • $\begingroup$ @ttw But Restricted Choice actually occurs in the real world, while the Monty Hall Paradox doesn't actually occur with the real Monty Hall, unless he very explicitly states the rules for when he opens a door to a goat. So, in some sense, restricted choice is a validation in a real game of chance that people actually play. $\endgroup$ Commented Jun 6, 2018 at 15:19
5
$\begingroup$

Optimizer's Curse: suppose you have a number of options to choose from, each with some objective true value. You don't have access to the objective true value, but instead to a noisy estimate of it, say a value sampled from a normal distribution with mean the true value and variance $1$.

You naturally pick the option whose estimated value is the highest. When you do so, you discover what the true value really was. Call the difference between the true value and the estimate your post-decision surprise.

Now, the error of your estimate was normally distributed with mean $0$, so you might also guess that your post-decision surprise will have mean $0$ – sometimes you will be pleasantly surprised, sometimes disappointed. But in fact the post-decision surprise is usually negative, that is to say, you are usually disappointed by your choice!

In retrospect, this is perhaps not so surprising: certainly if all the true values were the same, you'd simply pick the estimate with the highest inflation due to noise, and more generally, conditional on an estimate being the leader of a pack, it's more likely to be an overestimate than an underestimate.

More interestingly, if the variances aren't the same for all your estimates, it's no longer correct to just pick the highest estimate to maximise your expected true value. If the highest estimate is also high-variance, it may lead to a lower true value in expectation than a lower estimate with better precision, and so what looks superficially like some kind of risk-aversion (discounting the value of apparently high-variance options) is actually justifiable purely on expected-value grounds.

(Notice also that "there are a bunch of options with some true values, but you only get a noisy estimate of each, and you have to decide which to choose" is a REALLY common situation to be in, so this problem is pervasive in real-life optimization scenarios.)
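A quick Monte Carlo sketch of the curse described above (the option count, trial count, and function name are my own illustrative choices): true values and estimation noise are both standard normal, we always take the option with the highest estimate, and the average post-decision surprise comes out clearly negative.

```python
import random
import statistics

def post_decision_surprise(n_options=10, n_trials=20000, seed=0):
    """Simulate the optimizer's curse: each option has a true value
    ~ N(0,1) and an estimate = true value + N(0,1) noise. We pick the
    option with the highest ESTIMATE and record (true - estimate)."""
    rng = random.Random(seed)
    surprises = []
    for _ in range(n_trials):
        true_vals = [rng.gauss(0, 1) for _ in range(n_options)]
        estimates = [t + rng.gauss(0, 1) for t in true_vals]
        best = max(range(n_options), key=lambda i: estimates[i])
        surprises.append(true_vals[best] - estimates[best])
    return statistics.mean(surprises)

print(post_decision_surprise())  # clearly negative: you are usually disappointed
```

With 10 options the mean surprise is roughly $-1$, even though each individual estimate is unbiased: conditioning on "this estimate won the argmax" is exactly what biases it upward.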

$\endgroup$
4
  • $\begingroup$ I actually don't think this one is great for an introductory course, because it's too subtle. But it's definitely a counterintuitive result in probability, and I think it needs to be better-known, so here it is. $\endgroup$ Commented Feb 18, 2017 at 9:33
  • $\begingroup$ I like the "persuasive commonness" of such situations! But the part I find surprising (if true) is not the fact that the simplistic method yields an overestimate, nor the fact that you need to take differing variances into account, but that it suffices to discount each estimate independent of the other choices estimated values and known variances! Is it really true, and can you include the discount formula? Thanks! $\endgroup$
    – user21820
    Commented Feb 18, 2017 at 9:39
  • 1
    $\begingroup$ Actually, now that you call me out on it, I don't know the formula, nor am I certain there is one, so I replaced my claim with a much weaker one. There's a paper which discussed a Bayesian method of correcting for optimizer's curse, and I had naïvely interpreted it as discounting, but I haven't read the paper in enough detail to know if that's actually how it works. $\endgroup$ Commented Feb 18, 2017 at 9:50
  • $\begingroup$ Okay thanks for your clarification! =) $\endgroup$
    – user21820
    Commented Feb 18, 2017 at 9:56
4
$\begingroup$

The secretary's problem (also known as the matching or derangement problem). The secretary has $n$ letters ($0<n<\infty$) and $n$ pre-addressed envelopes but puts the letters into the envelopes randomly, one letter per envelope. What is the chance $C(n)$ that NO letter gets into the right envelope?

The answer is $C(n)=\sum_{j=0}^n(-1)^j/j!,$ which converges to $1/e$ as $n\to \infty$. I think the method of solution is instructive.

One counter-intuitive result is that $C(n)$ is not monotonic in $n.$

Also many people would be inclined to guess that $C(n)>1/2$ for large $n.$

Another version of this is to take two shuffled decks, each with $n$ playing cards, and ask for the chance that no card occupies the same position in both decks.

I first saw this in "100 Great Problems of Elementary Mathematics" by H. Dörrie.
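The inclusion-exclusion formula above is easy to check exactly with rational arithmetic (a small sketch of my own; the function name is arbitrary). It exhibits both counter-intuitive features: $C(n)$ is not monotonic, and it settles near $1/e \approx 0.3679$ rather than above $1/2$.

```python
from fractions import Fraction
from math import factorial

def C(n):
    """Probability that a random permutation of n items has no fixed
    point, via inclusion-exclusion: sum_{j=0}^{n} (-1)^j / j!."""
    return sum(Fraction((-1) ** j, factorial(j)) for j in range(n + 1))

for n in range(1, 8):
    print(n, C(n), float(C(n)))
# C(2) = 1/2, C(3) = 1/3, C(4) = 3/8: the sequence is not monotonic,
# and it converges very fast to 1/e ~ 0.36788.
```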

$\endgroup$
4
  • $\begingroup$ "take two identical shuffled decks, each with n playing cards"... the chance that no card occupies the same position in both decks is zero, as the two decks are identical. Perhaps it should say "two shuffled decks, each with the same $n$ playing cards"? $\endgroup$
    – Glen O
    Commented Feb 13, 2017 at 15:25
  • $\begingroup$ @GlenO. I meant identical as unordered sets. But I deleted the word. $\endgroup$ Commented Feb 13, 2017 at 19:13
  • 5
    $\begingroup$ When I see "the secretary problem", I generally think of the secretary selection problem (also known as sultan's dowry). Also $1/e$. I have a theory that says that $1/e$ is the most likely answer to a probability question; in fact, the probability that it is the answer is... $\endgroup$
    – Brian Tung
    Commented Feb 14, 2017 at 0:25
  • $\begingroup$ @BrianTung. I like that. $\endgroup$ Commented Feb 14, 2017 at 0:37
4
$\begingroup$

I see the Monty Hall problem has been mentioned a couple of times, but I want to mention it again because I think the reason it's interesting is missed in the other answers. In particular, it demonstrates not only a counter-intuitive result for a given formulation, but also how sensitive the correct answer is to the formulation of the problem. I especially like this NYT article as an illustration:

http://www.nytimes.com/1991/07/21/us/behind-monty-hall-s-doors-puzzle-debate-and-answer.html?pagewanted=all

From a teaching point of view, this is a fun exercise because Monty Hall (the real person) is part of the article, and he plays a role both in validating the math on the "academic" version of the problem and in showing that the math is meaningless in the real game, because he has controls that are not in the academic version. Moreover, after years of doing the show, he's good at reading individual contestants, so probability is not really at play in a significant way.

$\endgroup$
2
  • $\begingroup$ What's funny is that in the end even this article fails to consider the 'Lazy Monty' scenario, where Monty always opens an empty door, but always the one closest to him. In that case, if you chose door 1, and Monty opened door 2 (which is closer to him than door 3), then it still doesn't help you to switch. $\endgroup$
    – Bram28
    Commented Jun 22, 2017 at 18:18
  • $\begingroup$ While the math is meaningless for a real-world Monty (unless Monty sets out rules explicitly,) the same reasoning works in the game of bridge, where it is commonly called "Restricted Choice," and is just as counter-intuitive. $\endgroup$ Commented Jun 6, 2018 at 15:15
3
$\begingroup$

I remember being confused the first time this riddle was posed to me:

What is the probability of getting a poker hand with at least two aces, assuming that it contains at least one ace?

What if we know that it is in fact the ace of spades? Does this information change the probability?
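Both conditional probabilities can be computed exactly by counting 5-card hands (a sketch of my own; variable names are arbitrary). The surprise is that the answers differ, and that naming a specific ace nearly doubles the probability.

```python
from math import comb

hands = comb(52, 5)
# Condition 1: the hand contains at least one ace.
at_least_one_ace = hands - comb(48, 5)
# Subtract the hands with exactly one ace (4 choices of ace, 4 non-aces).
at_least_two_aces = at_least_one_ace - 4 * comb(48, 4)
p_given_some_ace = at_least_two_aces / at_least_one_ace

# Condition 2: the hand contains the ace of spades specifically.
with_ace_of_spades = comb(51, 4)                      # fix that card, choose 4 more
and_another_ace = with_ace_of_spades - comb(48, 4)    # minus hands with no further ace
p_given_spade_ace = and_another_ace / with_ace_of_spades

print(round(p_given_some_ace, 4))   # 0.1222
print(round(p_given_spade_ace, 4))  # 0.2214
```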

$\endgroup$
1
  • $\begingroup$ Can I know why the downvote? $\endgroup$
    – suitangi
    Commented Apr 12, 2017 at 19:45
3
$\begingroup$

This was grazed both in the original post and in the comments, but I'd like to flesh it out a bit.

Polya's Theorem

The simple random walk on $\mathbb Z^d$ is recurrent iff $d < 3$. Restated:

  • A random walk on $\mathbb Z$ started at the origin will eventually return to the origin (with probability $1$).

  • A random walk on $\mathbb Z^2$ started at the origin will eventually return to the origin (with probability $1$).

  • A random walk on $\mathbb Z^3$ started at the origin may return to the origin, but it also may not; the probability of an eventual return is about $0.34$.

Expected hitting time

Although a random walk on $\mathbb Z$ or $\mathbb Z^2$ that starts at the origin is guaranteed to return to the origin as above, the expected time to do so is infinite.

NB: This is a fairly elementary result in Markov Chain theory, so whether it is surprising is obviously up to the beholder. This is my favorite example of a random variable that is finite with probability 1, yet has infinite expected value.
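The infinite expectation can be made concrete without any simulation, using the classical identity $\Pr(T > 2n) = \binom{2n}{n}/4^n$ for the first return time $T$ of the simple symmetric walk on $\mathbb Z$ (a sketch of my own; the function name is arbitrary). Since this tail decays like $1/\sqrt{\pi n}$, the partial sums of $E[T] = \sum_{t \ge 0} \Pr(T > t)$ grow without bound.

```python
def partial_mean_return_time(N):
    """Partial sum of E[T] = sum_{t>=0} P(T > t) over the first 2N steps,
    using P(T > 2n) = P(T > 2n+1) = C(2n, n) / 4^n (returns occur only at
    even times). Computed iteratively: c_{n+1} = c_n * (2n+1)/(2n+2)."""
    total, p = 0.0, 1.0  # p = P(T > 2n), starting at n = 0
    for n in range(N):
        total += 2 * p                   # times 2n and 2n+1 each contribute p
        p *= (2 * n + 1) / (2 * n + 2)   # advance the tail probability
    return total

for N in (10, 1000, 100000):
    print(N, partial_mean_return_time(N))
# Grows roughly like 4*sqrt(N/pi): the expected return time is infinite,
# even though the return itself happens with probability 1.
```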

Penney's Game, revisited

The Numberphile video explored the nontransitive nature of Penney's game; that is, for any 3-coin sequence I choose, you could respond by choosing a 3-coin sequence that is likely to appear first. There is another feature of Penney's Game that I find similarly counterintuitive: The expected wait time is not the same for all sequences of 3 coins. This is not a statement about the competitive likelihood of finding one before another, but rather a statement about the expected number of throws required to encounter a sequence in isolation. Specifically:

  • HHH / TTT require 14 flips on average

  • HTH / THT require 10 flips on average

  • The other 4 combinations require 8 flips on average

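The wait times listed above follow from the autocorrelation (Conway "leading numbers") formula: the expected number of fair-coin flips until a pattern first appears equals $\sum 2^k$ over every $k$ for which the pattern's length-$k$ prefix equals its length-$k$ suffix. A short sketch (function name my own):

```python
from itertools import product

def expected_wait(pattern):
    """Expected flips of a fair coin until `pattern` first appears:
    sum of 2^k over all k where the length-k prefix equals the
    length-k suffix (self-overlaps force longer waits)."""
    L = len(pattern)
    return sum(2 ** k for k in range(1, L + 1)
               if pattern[:k] == pattern[-k:])

for p in product("HT", repeat=3):
    s = "".join(p)
    print(s, expected_wait(s))
# HHH/TTT -> 14, HTH/THT -> 10, the other four -> 8
```

The intuition: HHH overlaps itself at every shift, so a near-miss wastes all progress, while HHT fails "gracefully" into a fresh start.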
$\endgroup$
3
$\begingroup$

One of the most puzzling results in probability is that the probability of picking a rational number uniformly at random from an interval of reals is zero. This is nicely explained here.

The set of rational numbers, for instance in the $\Omega=[0,1]$ interval is the countable union of disjoint singletons, and each one of these singletons has a probability of zero. Here is the proof:


A singleton, $\{b\}$, is a Borel measurable set with a Lebesgue measure of zero. The proof is as follows:

$$\Pr\left(\{b\}\right)=\Pr\left(\bigcap_{n=1}^\infty\left(b-\frac{1}{n},b + \frac{1}{n}\right]\cap \Omega\right)$$

is the probability of nested decreasing sets, allowing the use of the theorem of continuity of probability measures $(*)$ to re-write it as:

$$\Pr\left(\{b\}\right)=\lim_{n\rightarrow \infty}\,\Pr\left(\left(b-\frac{1}{n},b + \frac{1}{n}\right]\cap \Omega\right)$$

Each of these probabilities is bounded by the length of the interval: $$\Pr\left(\left(b-\frac{1}{n},\,b + \frac{1}{n}\right]\cap \Omega\right)\leq \frac{2}{n}\to 0 \quad\text{as } n\to\infty,$$ so $\Pr\left(\{b\}\right)=0$.


Therefore, by countable additivity of measures $(**)$, the probability for the whole set of $\mathbb Q$ is zero:

$$\Pr(\mathbb Q\;\cap \Omega) = 0$$

The apparent paradox is that despite there being infinitely many rational numbers in the $[0,1]$ interval, the probability of randomly choosing a rational is exactly zero.

The source is this great explanation here.


$(*)$ If $B_j, j = 1, 2,\cdots,$ is a sequence of events decreasing to $B$, then $\displaystyle\lim_{n\rightarrow \infty} \Pr \{B_n\} = \Pr \{B\} .$

$(**)$ For all countable collections $\{E_i\}_{i=1}^\infty$ of pairwise disjoint sets in a sigma algebra: $$\mu\left( \bigcup_{k=1}^\infty \, E_k \right)=\sum_{k=1}^\infty \mu(E_k).$$

$\endgroup$
2
  • $\begingroup$ It's worth noting that the term for this situation is "almost never" - where the probability is zero, but the event isn't impossible. $\endgroup$
    – Glen O
    Commented Feb 13, 2017 at 15:57
  • 5
    $\begingroup$ "even though this is not impossible in an experimental situation (throwing a dart)." — As physicist, I object. There are no darts with infinitely small dart point. And even if there were, quantum mechanics would not allow to hit an exact point with it. Not to mention that space itself might not be continuous. $\endgroup$
    – celtschk
    Commented Feb 13, 2017 at 23:24
2
$\begingroup$

Consider the $d$-dimensional sphere; as $d$ goes to infinity, the uniform measure concentrates near the equator $x_1=0$.
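The concentration is easy to see by sampling: a uniform point on the sphere can be obtained by normalizing a standard Gaussian vector, and the fraction of points within a thin band $|x_1| < \varepsilon$ around the equator climbs toward $1$ as $d$ grows (a Monte Carlo sketch of my own; the band width, sample size, and function name are arbitrary choices).

```python
import random
from math import sqrt

def frac_near_equator(d, eps=0.1, n=5000, seed=1):
    """Fraction of uniform points on the sphere S^{d-1} whose first
    coordinate satisfies |x_1| < eps. Uniform points are produced by
    normalizing standard Gaussian vectors."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        g = [rng.gauss(0, 1) for _ in range(d)]
        norm = sqrt(sum(x * x for x in g))
        if abs(g[0] / norm) < eps:
            count += 1
    return count / n

for d in (3, 30, 300):
    print(d, frac_near_equator(d))  # climbs toward 1 as d grows
```

Heuristically, $x_1$ behaves like $N(0, 1/d)$ for large $d$, so almost all the mass sits within $O(1/\sqrt d)$ of the equator.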

$\endgroup$
2
  • 3
$\begingroup$ It needs some preliminaries to explain this. But I think it's a counter-intuitive example in geometry rather than probability... $\endgroup$
    – MR_BD
    Commented Feb 12, 2017 at 19:17
  • 3
$\begingroup$ I agree that to prove it you'll need some (a lot of) preliminaries. I guess that it depends on the formulation: if you say that the probability of choosing a point not on the equator goes to zero as $d$ goes to infinity, then it looks a lot more like probability $\endgroup$
    – DStoln
    Commented Feb 12, 2017 at 19:23
