130
$\begingroup$

Because I find them fascinating, I'd like to hear what folks in this community find as the most interesting statistical paradox and why.

$\endgroup$
0

22 Answers

109
$\begingroup$

It's not a paradox per se, but it is a puzzling comment, at least at first.

During World War II, Abraham Wald was a statistician for the U.S. government. He looked at the bombers that returned from missions and analyzed the pattern of the bullet "wounds" on the planes. He recommended that the Navy reinforce areas where the planes had no damage.

Why? We have selection effects at work. This sample suggests that damage inflicted in the observed areas could be withstood. Either planes were never hit in the untouched areas, an unlikely proposition, or strikes to those parts were lethal. We care about the planes that went down, not just those that returned. Those that fell likely suffered an attack in a place that was untouched on those that survived.

For copies of his original memoranda, see here. For a more modern application, see this Scientific American blog post.

Expanding upon a theme, according to this blog post, during World War I the introduction of the tin helmet led to more recorded head wounds than the standard cloth cap. Was the new helmet worse for soldiers? No: recorded injuries rose because fatalities fell; soldiers who would have died outright under a cloth cap now survived with a head wound.
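
A toy simulation (numbers invented for illustration, not Wald's data) makes the selection effect concrete: suppose hits land uniformly over four sections of the plane, but engine hits are usually fatal. The survivors' damage map then comes out nearly blank exactly where hits matter most.

import numpy as np

rng = np.random.default_rng(0)
sections = ["wings", "fuselage", "tail", "engines"]
# Hypothetical numbers: hits land uniformly, but an engine hit downs the
# plane 80% of the time versus 10% for the other sections.
p_down = {"wings": 0.10, "fuselage": 0.10, "tail": 0.10, "engines": 0.80}

observed = {s: 0 for s in sections}          # hits counted on *returning* planes
for _ in range(10_000):
    hit = rng.choice(sections)               # every section is hit equally often
    if rng.random() > p_down[hit]:           # the plane survives to be inspected
        observed[hit] += 1

print(observed)
# Engines show by far the fewest hits among survivors, even though they were
# hit just as often: the missing hits went down with the missing planes.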

$\endgroup$
4
  • 3
    $\begingroup$ I recall having read this in a couple of places before, but I don't have a reference at hand. Is there one that you can add? $\endgroup$
    – cardinal
    Commented Feb 28, 2012 at 17:18
  • 1
    $\begingroup$ @cardinal, I found some memos for you. Looks like the research was actually for the U.S. $\endgroup$
    – Charlie
    Commented Feb 28, 2012 at 17:45
  • $\begingroup$ Somewhere, there's a scatterplot of a hypothetical airplane for this example, but I can't find it. $\endgroup$
    – Fomite
    Commented Jan 5, 2013 at 7:55
  • $\begingroup$ +1. This is an example of Survivorship Bias, perhaps the most detrimental of the biases. I expanded on it in an answer. $\endgroup$
    – Cliff AB
    Commented Feb 13, 2016 at 17:19
52
$\begingroup$

Another example is the ecological fallacy.

Example
Suppose that we look for a relationship between voting and income by regressing the vote share for then-Senator Obama on the median income of a state (in thousands). We get an intercept of approximately 20 and a slope coefficient of 0.61.

Many would interpret this result as saying that higher income people are more likely to vote for Democrats; indeed, popular press books have made this argument.

But wait, I thought that rich people were more likely to be Republicans? They are.

What this regression is really telling us is that rich states are more likely to vote for a Democrat and poor states are more likely to vote for a Republican. Within a given state, rich people are more likely to vote Republican and poor people are more likely to vote Democrat. See the work of Andrew Gelman and his coauthors.

Without further assumptions, we cannot use group-level (aggregate) data to make inferences about individual-level behavior. This is the ecological fallacy. Group-level data can only tell us about group-level behavior.

To make the leap to individual-level inferences, we need the constancy assumption. Here, the voting choice of individuals must not vary systematically with the median income of a state; a person who earns \$X in a rich state must be just as likely to vote for a Democrat as someone who earns \$X in a poor state. But people in Connecticut, at all income levels, are more likely to vote for a Democrat than people in Mississippi at those same income levels. Hence, the constancy assumption is violated and we are led to the wrong conclusion (fooled by aggregation bias).

This topic was a frequent hobbyhorse of the late David Freedman; see this paper, for example. In that paper, Freedman provides a means for bounding individual-level probabilities using group data.
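
For readers who like to watch the reversal happen, here is a small simulation sketch (all coefficients invented): state-level income predicts the Democratic share positively, while every within-state individual-level slope is negative.

import numpy as np

rng = np.random.default_rng(1)
n_states, n_people = 50, 1000
state_income = rng.uniform(40, 70, n_states)       # state median income, $000s

state_share, within_slopes = [], []
for m in state_income:
    income = m + rng.normal(0, 10, n_people)       # individual incomes in the state
    # Democratic propensity rises with the state's income level but falls
    # with the individual's income relative to the state.
    p_dem = np.clip(0.2 + 0.006 * m - 0.01 * (income - m), 0, 1)
    vote = (rng.random(n_people) < p_dem).astype(float)
    state_share.append(vote.mean())
    within_slopes.append(np.polyfit(income, vote, 1)[0])

print("state-level slope      :", np.polyfit(state_income, state_share, 1)[0])  # > 0
print("mean within-state slope:", np.mean(within_slopes))                       # < 0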

Comparison to Simpson's paradox
Elsewhere in this CW, @Michelle proposes Simpson's paradox as a good example, as it indeed is. Simpson's paradox and the ecological fallacy are closely related, yet distinct. The two examples differ in the natures of the data given and analysis used.

The standard formulation of Simpson's paradox is a two-way table. In our example here, suppose that we have individual data and we classify each individual as high or low income. We would get an income-by-vote 2x2 contingency table of the totals. We'd see that a higher share of high income people voted for the Democrat relative to the share of low income people. Were we to create a contingency table for each state, however, we'd see the opposite pattern.

In the ecological fallacy, we don't collapse income into a dichotomous (or perhaps multichotomous) variable. To get to the state level, we take the mean (or median) income and the vote share of each state, run a regression, and find that higher income states are more likely to vote for the Democrat. If we kept the individual-level data and ran the regression separately by state, we'd find the opposite effect.

In summary, the differences are:

  • Mode of analysis: We could say, following our SAT prep skills, that Simpson's paradox is to contingency tables as the ecological fallacy is to correlation coefficients and regression.
  • Degree of aggregation/nature of data: Whereas the Simpson's paradox example compares two numbers (Democrat vote share among high income individuals versus the same for low income individuals), the ecological fallacy uses 50 data points (i.e., each state) to calculate a correlation coefficient. To get the full story in the Simpson's paradox example, we'd just need the two numbers from each of the fifty states (100 numbers), while in the ecological fallacy case, we need the individual-level data (or else be given state-level correlations/regression slopes).

General observation
@NeilG comments that this just seems to be saying that you can't have any selection on unobservables/omitted variables bias issues in your regression. That's right! At least in the regression context, I think that nearly any "paradox" is just a special case of omitted variables bias.

Selection bias (see my other response on this CW) can be controlled for by including the variables that drive the selection. Of course, these variables are typically unobserved, driving the problem/paradox. Spurious regression (my other other response) can be overcome by adding a time trend. These cases say, essentially, that you have enough data, but need more predictors.

In the case of the ecological fallacy, it's true, you need more predictors (here, state-specific slopes and intercepts). But you also need more observations: individual-level rather than group-level observations, in order to estimate these relationships.

(Incidentally, if you have extreme selection where the selection variable perfectly divides treatment and control, as in the WWII example that I give, you may need more data to estimate the regression as well; there, the downed planes.)

$\endgroup$
3
  • $\begingroup$ How is it possible to formalize the consistency assumption? It sounds like assuming that there are no (causal) confounders missing from one's model. $\endgroup$
    – Neil G
    Commented Feb 29, 2012 at 1:26
  • 3
    $\begingroup$ Also, the example provided is also an example of Simpson's paradox because conditioning on the state reverses the correlation between income and party. When is the ecological fallacy different from Simpson's paradox? $\endgroup$
    – Neil G
    Commented Feb 29, 2012 at 1:28
  • 1
    $\begingroup$ I would also point out that making inferences about group-level associations or causation based on individual-level associations or causal relationships is also just a bad: the atomistic fallacy, well articulated here: [Diez-Roux, 1998] Diez-Roux, A. V. (1998). Bringing context back into epidemiology: variables and fallacies in multilevel analysis. American Journal of Public Health, 88(2):216–222. $\endgroup$
    – Alexis
    Commented Jul 21, 2014 at 16:46
45
$\begingroup$

My contribution is Simpson's paradox because:

  • the reasons for the paradox are not intuitive to many people, so

  • it can be really hard to explain why the findings are the way they are to lay people in plain English.

    tl;dr version of the paradox: the direction of an association can appear to reverse depending on how the data are partitioned. The cause is often a confounding variable.

Another good outline of the paradox is here.
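
To see the reversal in miniature, here is a numeric sketch (the counts echo the oft-quoted kidney-stone study, but any counts with this structure will do):

# Recovery counts (recovered, total) under two treatments, split by severity.
mild   = {"A": (81, 87),   "B": (234, 270)}
severe = {"A": (192, 263), "B": (55, 80)}

for t in ("A", "B"):
    r_m, n_m = mild[t]
    r_s, n_s = severe[t]
    print(t, "mild %.2f" % (r_m / n_m),
             "severe %.2f" % (r_s / n_s),
             "overall %.2f" % ((r_m + r_s) / (n_m + n_s)))
# A beats B within both subgroups, yet B beats A overall, because the hard
# (severe) cases were disproportionately given treatment A.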

$\endgroup$
2
35
$\begingroup$

There are no paradoxes in statistics, only puzzles waiting to be solved.

Nevertheless, my favourite is the two envelope "paradox". Suppose I put two envelopes in front of you and tell you that one contains twice as much money as the other (but not which is which). You reason as follows: suppose the left envelope contains $x$; then with 50% probability the right envelope contains $2x$ and with 50% probability it contains $0.5x$, for an expected value of $1.25x$. But of course you can simply reverse the roles of the envelopes and conclude instead that the left envelope contains $1.25$ times the value of the right envelope. What happened?
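
A quick simulation sketch (with an arbitrary, assumed distribution of amounts) confirms that always switching earns exactly what always keeping does, so the $1.25x$ computation must be misapplied: the $x$ in its two branches does not refer to the same fixed amount.

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
small = rng.exponential(100.0, n)           # the smaller amount in each pair
left_gets_small = rng.random(n) < 0.5       # a coin flip fills the envelopes
left = np.where(left_gets_small, small, 2 * small)
right = np.where(left_gets_small, 2 * small, small)

print("always keep left:", left.mean())     # both ~1.5 * E[small]
print("always switch   :", right.mean())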

$\endgroup$
5
  • $\begingroup$ Brilliant paradox - interestingly if we go with the "second" interpretation on Wikipedia and try to calculate $E[B|A=a]$, we find that in order to prevent preference for switching we require $E[B|A=a]=a=2ap+\frac{a}{2}(1-p)$ where $p=Pr(A<B|A=a)$. Solving for $p$ means we get $p=\frac{1}{3}$. Similarly we can calculate $E[A|B=b]=b=2bq+\frac{b}{2}(1-q)$ where $q=Pr(B<A|B=b)$ and get $q=\frac{1}{3}$ ....Bizarre! $\endgroup$ Commented Mar 1, 2012 at 13:05
  • 6
    $\begingroup$ I have given presentations on this paradox in which the game is actually played with the audience, with real amounts of money (usually a check to the host institution). It gets their attention... $\endgroup$
    – whuber
    Commented Mar 1, 2012 at 15:50
  • $\begingroup$ Think I solved this one... The paradox is solved when we recognize the two envelope paradox incorrectly proposes 1) there are three possible quantities: 0.5x, x, and 2x, when there are only two quantities in the envelopes (say x and 2x), and 2) that we a priori know the left envelope contains x (in which case the right envelope would contain 2x with 100% certainty!). Given possible values of x and 2x randomly assigned to the two envelopes, the correct answer is an expected value of 1.5x whether I choose the left envelope or right envelope. $\endgroup$
    – RobertF
    Commented Oct 25, 2012 at 13:55
  • 3
    $\begingroup$ @RobertF The situation is more complicated. Suppose that it is known that the money is distributed in the two envelopes as follows. Toss a fair coin until it lands heads and count the number n of times the coin was tossed. Place 2^n dollars in one envelope and 2^(n+1) in the other. You can now perform very exact expectation computations and still retain the paradox. $\endgroup$ Commented Dec 24, 2012 at 23:13
  • $\begingroup$ There's been at least 150 papers attempting to resolve the two envelopes paradox, recent survey is link.springer.com/article/10.1007/s11238-022-09906-8 $\endgroup$ Commented Oct 16, 2022 at 18:06
32
$\begingroup$

The Sleeping Beauty Problem.

This is a recent invention; it was heavily discussed within a small set of philosophy journals over the last decade. There are staunch advocates for two very different answers (the "Halfers" and "Thirders"). It raises questions about the nature of belief, probability, and conditioning, and has caused people to invoke a quantum-mechanical "many worlds" interpretation (among other bizarre things).

Here is the statement from Wikipedia:

Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details. On Sunday she is put to sleep. A fair coin is then tossed to determine which experimental procedure is undertaken. If the coin comes up heads, Beauty is awakened and interviewed on Monday, and then the experiment ends. If the coin comes up tails, she is awakened and interviewed on Monday and Tuesday. But when she is put to sleep again on Monday, she is given a dose of an amnesia-inducing drug that ensures she cannot remember her previous awakening. In this case, the experiment ends after she is interviewed on Tuesday.

Any time Sleeping beauty is awakened and interviewed, she is asked, "What is your credence now for the proposition that the coin landed heads?"

The Thirder position is that S.B. should respond "1/3" (this is a simple Bayes' Theorem calculation) and the Halfer position is that she should say "1/2" (because that's the correct probability for a fair coin, obviously!). IMHO, the entire debate rests on a limited understanding of probability, but isn't that the whole point of exploring apparent paradoxes?

Prince Florimond Finds the Sleeping Beauty

(Illustration from Project Gutenberg.)


Although this is not the place to try to resolve paradoxes--only to state them--I don't want to leave people hanging and I'm sure most readers of this page don't want to wade through the philosophical explanations. We can take a tip from E. T. Jaynes, who replaces the question “how can we build a mathematical model of human common sense”—which is something we need in order to think through the Sleeping Beauty problem—by “How could we build a machine which would carry out useful plausible reasoning, following clearly defined principles expressing an idealized common sense?” Thus, if you like, replace S. B. by Jaynes' thinking robot. You can clone this robot (instead of administering a fanciful amnesiac drug) for the Tuesday portion of the experiment, thereby creating a clear model of the S. B. setup that can be unambiguously analyzed. Modeling this in a standard way using statistical decision theory then reveals there are really two questions being asked here (what is the chance a fair coin lands heads? and what is the chance the coin has landed heads, conditional on the fact that you were the clone who was awakened?). The answer is either 1/2 (in the first case) or 1/3 (in the second, using Bayes' Theorem). No quantum mechanical principles were involved in this solution :-).
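
The two questions can be pulled apart with a short simulation sketch: count heads per experiment and heads per awakening.

import numpy as np

rng = np.random.default_rng(3)
heads = rng.random(100_000) < 0.5           # one fair toss per experiment

# Question 1: what fraction of experiments have a heads coin? ~1/2.
print("per experiment:", heads.mean())

# Question 2: what fraction of awakenings follow a heads coin? Heads
# experiments produce 1 awakening, tails experiments produce 2. ~1/3.
n_heads_awake = heads.sum()
n_tails_awake = 2 * (~heads).sum()
print("per awakening :", n_heads_awake / (n_heads_awake + n_tails_awake))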


References

Arntzenius, Frank (2002). Reflections on Sleeping Beauty. Analysis 62.1 pp 53-62.

Elga, Adam (2000). Self-locating belief and the Sleeping Beauty Problem. Analysis 60 pp 143-7.

Franceschi, Paul (2005). Sleeping Beauty and the Problem of World Reduction. Preprint.

Groisman, Berry (2007). The end of Sleeping Beauty’s nightmare.

Lewis, D (2001). Sleeping Beauty: reply to Elga. Analysis 61.3 pp 171-6.

Papineau, David and Victor Dura-Vila (2008). A thirder and an Everettian: a reply to Lewis’s ‘Quantum Sleeping Beauty’.

Pust, Joel (2008). Horgan on Sleeping Beauty. Synthese 160 pp 97-101.

Vineberg, Susan (undated, perhaps 2003). Beauty’s Cautionary Tale.

All can be found (or at least were found several years ago) on the Web.

$\endgroup$
15
  • 1
    $\begingroup$ Do you think it's equally effective to formulate the solution in terms of "base units"? By that I mean, you have to consider whether the base unit is the person, or the interview. 1/2 of persons will have had a head, but 1/3 of interviews will. Then to choose our base unit, we can revisit the question and phrase as "What is the chance that this interview is associated with a 'heads' result?" $\endgroup$
    – Jonathan
    Commented Feb 29, 2012 at 18:35
  • 1
    $\begingroup$ SB does not know how many interviews there have been and the question is about her assessment of the probability, not the experimenters' assessment. From her point of view, the number of interviews cannot be determined. $\endgroup$
    – whuber
    Commented Feb 29, 2012 at 18:52
  • 2
    $\begingroup$ I think you should read the arguments in the literature first, Aaron. (I confess that I am a thirder, but I think the halfers will not find your reasoning convincing. At the very least, you need to show them why their argument is flawed.) $\endgroup$
    – whuber
    Commented Mar 8, 2012 at 22:01
  • 1
    $\begingroup$ Fair point, @whuber, I've now had a further look at the literature. I'm reading Ellis's Sleeping Beauty: reply to Elga. It's this sentence that worries me, at the start of section '4. My argument'. "Only new relevant evidence, centred or uncentred, produces a change in credence". I'll think further and maybe blog about it again. I had a long discussion with seven other PhD students about this! $\endgroup$ Commented Mar 10, 2012 at 17:01
  • 1
    $\begingroup$ Is Sleeping Beauty allowed to look at the calendar when awakened? If Monday, then she ought to reply P(X=head)=0.5. If Tuesday, then P(X=head)=0. $\endgroup$
    – RobertF
    Commented Oct 24, 2012 at 15:33
27
$\begingroup$

The St. Petersburg paradox, which makes you think differently about the concept and meaning of expected value. The intuition (even for people with a background in statistics) and the calculations give different results.
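
A simulation sketch shows the conflict: the sample mean of the payoff never settles down, drifting upward roughly like $\log_2 n$.

import numpy as np

rng = np.random.default_rng(4)

def play(n):
    """Payoff 2^k, where k is the toss on which the first head appears."""
    return 2.0 ** rng.geometric(0.5, n)

for n in (10**3, 10**5, 10**7):
    print(f"{n:>10} games, mean payoff {play(n).mean():10.1f}")
# The sample mean keeps growing with n; it never settles near any finite
# value, because the expectation is infinite.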

$\endgroup$
4
  • 5
    $\begingroup$ Here is another that I like that seems so insufficiently known that it has no name attached to it, but has a similar flavor and an interesting statistical lesson: There exists a sequence of independent random variables $X_1,X_2,\ldots$ with mean zero and uniformly bounded variance such that $\sqrt{n} \bar X_n$ converges in distribution to a standard normal $\mathcal N(0,1)$ (just like the CLT). However, $\mathrm{Var}(\sqrt{n} \bar X_n) \to 17$ (or your favorite positive number). $\endgroup$
    – cardinal
    Commented Feb 28, 2012 at 17:36
  • $\begingroup$ @cardinal Any chance you could post some details of this as a separate answer? $\endgroup$
    – Silverfish
    Commented Jan 28, 2015 at 11:12
  • $\begingroup$ @Silver Let each $X_i$ have a Normal distribution with mean zero and variance $f(n)$. What would $f$ have to look like asymptotically for $\text{Var}(\sqrt{n}\bar X_n)$ to converge? $\endgroup$
    – whuber
    Commented Mar 23, 2015 at 22:13
  • $\begingroup$ @whuber Presumably I should read that as $X_i$ having variance $f(i)$; in which case (using independence of the $X_i$) we have $\mathrm{Var}(\sqrt{n}\bar X_n) = \frac{1}{n}\sum_{i=1}^n f(i)$ so we need the sequence $f(i)$ to be Cesàro summable if $\mathrm{Var}(\sqrt{n}\bar X_n)$ is to converge? $\endgroup$
    – Silverfish
    Commented Mar 24, 2015 at 0:30
23
$\begingroup$

The Jeffreys-Lindley paradox, which shows that under some circumstances default frequentist and Bayesian methods of hypothesis testing can give completely contradictory answers. It really forces users to think about exactly what these forms of testing mean, and to consider whether that's what they really want. For a recent example, see this discussion.
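
Here is a back-of-the-envelope sketch in the simplest normal model (known variance; the $N(0,1)$ prior under the alternative is my assumption, not anything canonical). Fix the z-statistic at 1.96 so the p-value is 0.05 at every sample size, and watch the Bayes factor swing toward the null:

import numpy as np

# X_i ~ N(mu, 1); H0: mu = 0; under H1, mu ~ N(0, tau^2) with tau = 1 (assumed).
tau, z = 1.0, 1.96                     # z fixed => two-sided p-value ~0.05 for all n

for n in (10, 100, 10_000, 1_000_000):
    xbar = z / np.sqrt(n)
    var0, var1 = 1 / n, tau**2 + 1 / n          # marginal variances of xbar
    bf01 = np.sqrt(var1 / var0) * np.exp(-0.5 * xbar**2 * (1 / var0 - 1 / var1))
    print(f"n={n:>9}  p-value=0.05  BF(H0 over H1)={bf01:9.1f}")
# The frequentist test rejects H0 at the 5% level at every n, while the Bayes
# factor favours H0 ever more strongly as n grows.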

$\endgroup$
13
$\begingroup$

One of my favorites is the Monty Hall problem. I remember learning about it in an elementary stats class and telling my dad; since both of us were in disbelief, I simulated random numbers and we tried the problem. To our amazement, it was true.

Basically, the problem states that there are three doors on a game show: behind one is a prize, and behind the other two nothing. You choose a door, and then the host reveals that one of the two remaining doors does not hide the prize and offers you the chance to change your choice. If offered, you should switch your current door for the remaining one.

Here's the link to an R simulation as well: LINK
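
In case the link rots, here is a minimal simulation sketch (in Python rather than R):

import numpy as np

rng = np.random.default_rng(5)
n = 100_000
prize = rng.integers(0, 3, n)          # door hiding the prize
pick = rng.integers(0, 3, n)           # contestant's initial choice

# The host always opens a non-prize door among the other two, so switching
# wins exactly when the initial pick was wrong.
print("stay wins  :", (pick == prize).mean())   # ~1/3
print("switch wins:", (pick != prize).mean())   # ~2/3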

$\endgroup$
12
$\begingroup$

Sorry, but I can't help myself (I, too, love statistical paradoxes!).

Again, perhaps not a paradox per se, but another example of omitted variables bias.

Spurious causation/regression
Any variable with a time trend is going to be correlated with another variable that also has a time trend. For example, my weight from birth to age 27 is going to be highly correlated with your weight from birth to age 27. Obviously, my weight isn't caused by your weight. If it was, I'd ask that you go to the gym more frequently, please.

Here's an omitted variables explanation. Let my weight be $x_t$ and your weight be $y_t$, where $$\begin{align*}x_t &= \alpha_0 + \alpha_1 t + \epsilon_t \text{ and} \\ y_t &= \beta_0 + \beta_1 t + \eta_t.\end{align*}$$

Then the regression $$\begin{equation*}y_t = \gamma_0 + \gamma_1 x_t + \nu_t\end{equation*}$$ has an omitted variable---the time trend---that is correlated with the included variable, $x_t$. Hence, the coefficient $\gamma_1$ will be biased (in this case, it will be positive, as our weights grow over time).

When you are performing time series analysis, you need to be sure that your variables are stationary or you'll get these spurious causation results.
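
A simulation sketch of the point (coefficients invented): regressing one trending series on another gives a large spurious slope, and including the trend removes it.

import numpy as np

rng = np.random.default_rng(6)
t = np.arange(200, dtype=float)
x = 1.0 + 0.5 * t + rng.normal(0, 5, t.size)    # my weight: trend + noise
y = 2.0 + 0.4 * t + rng.normal(0, 5, t.size)    # your weight: independent of x

X1 = np.column_stack([np.ones_like(t), x])      # omit the trend
X2 = np.column_stack([np.ones_like(t), x, t])   # include the trend
print("slope on x, trend omitted :", np.linalg.lstsq(X1, y, rcond=None)[0][1])  # large
print("slope on x, trend included:", np.linalg.lstsq(X2, y, rcond=None)[0][1])  # ~0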

(I fully admit that I plagiarized my own answer given here.)

$\endgroup$
8
$\begingroup$

Parrondo's Paradox:

From Wikipedia: "Parrondo's paradox, a paradox in game theory, has been described as: A combination of losing strategies becomes a winning strategy. It is named after its creator, Juan Parrondo, who discovered the paradox in 1996. A more explanatory description is:

There exist pairs of games, each with a higher probability of losing than winning, for which it is possible to construct a winning strategy by playing the games alternately.

Parrondo devised the paradox in connection with his analysis of the Brownian ratchet, a thought experiment about a machine that can purportedly extract energy from random heat motions popularized by physicist Richard Feynman. However, the paradox disappears when rigorously analyzed."

As alluring as the paradox might sound to the financial crowd, it does have requirements that are not readily available in financial time series. Even though a few of the component strategies can be losing, the offsetting strategies require unequal and stable probabilities of much greater or less than 50% in order for the ratcheting effect to kick in. It would be difficult to find financial strategies, whereby one has $P_B(W)=3/4+\epsilon$ and the other, $P_A(W)=1/10 + \epsilon$, over long periods.
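
For concreteness, here is a sketch of the textbook version of Parrondo's games (the usual parameters with $\epsilon = 0.005$; game B's branch depends on whether capital is divisible by 3). Each game loses on its own, but randomly alternating them wins:

import numpy as np

rng = np.random.default_rng(7)
eps = 0.005

def play(strategy, steps=100_000):
    """Final capital after `steps` rounds; strategy(t) returns 'A' or 'B'."""
    capital = 0
    for t in range(steps):
        if strategy(t) == "A":
            p = 0.5 - eps                   # game A: a nearly fair, losing coin
        elif capital % 3 == 0:
            p = 0.10 - eps                  # game B, capital divisible by 3
        else:
            p = 0.75 - eps                  # game B, otherwise
        capital += 1 if rng.random() < p else -1
    return capital

print("A only       :", play(lambda t: "A"))                     # drifts down
print("B only       :", play(lambda t: "B"))                     # drifts down
print("random A or B:", play(lambda t: rng.choice(["A", "B"])))  # drifts up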

There's also a more recent related paradox, called the "Allison mixture," which shows that we can take two i.i.d. and non-correlated series and randomly scramble them such that certain mixtures create a resulting series with non-zero autocorrelation.

$\endgroup$
7
$\begingroup$

I like the following: the host uses an unknown distribution on $[0,1]$ to choose, independently, two numbers $x,y\in [0,1]$. The only thing known to the player about the distribution is that $P(x=y)=0$. The player is then shown the number $x$ and is asked to guess whether $y>x$ or $y<x$. Clearly, if the player always guesses $y>x$, the player will be correct with probability $0.5$. Surprisingly, if not paradoxically, the player can improve on that strategy. I'm afraid I don't have a link to the problem (I heard it many years ago during a workshop).
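
The strategy, as I recall it (the comment below attributes the problem to Tom Cover): draw your own random threshold $Z$ with positive density on $[0,1]$ and guess $y>x$ exactly when $x<Z$. Whenever $Z$ lands between the two hidden numbers you are guaranteed to be correct, and that happens with positive probability. A simulation sketch with an assumed host distribution:

import numpy as np

rng = np.random.default_rng(8)
n = 1_000_000
x = rng.beta(2, 5, n)                   # the host's unknown distribution --
y = rng.beta(2, 5, n)                   # anything with P(x = y) = 0 will do

z = rng.random(n)                       # the player's private uniform threshold
guess_y_larger = x < z                  # guess "y > x" iff x is below threshold
correct = np.where(guess_y_larger, y > x, y < x)
print("win rate:", correct.mean())      # reliably above 0.5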

$\endgroup$
1
  • 2
    $\begingroup$ Dear Ittay, I believe Tom Cover is the original source of this problem. I think it is also listed in his Open Problems in Communication and Computation, but I don't have it handy to check. It's a nice problem. The restriction to $[0,1]$, or, even a random $y$ (or $x$, for that matter) is inessential. Cheers. $\endgroup$
    – cardinal
    Commented Mar 2, 2013 at 19:50
7
$\begingroup$

Misspecification paradox

If $T$ is a method of statistical inference with a certain model assumption, say the true $P$ is assumed to be in some set ${\cal P}$ (e.g., $P$ may be assumed to be an i.i.d. normal distribution model for data $X_1,\ldots,X_n$), it is standard practice (in some quarters) to run a model misspecification test $M$, i.e., a test, that tests the null hypothesis $P\in {\cal P}$.

Assuming that $P(M$ rejects$)>0$ for $P\in {\cal P}$ (which is pretty much always fulfilled), it follows that the conditional distribution upon non-rejection $P(\bullet|M$ does not reject$)$ cannot be in ${\cal P}$. This is because $$ P(M\mbox{ rejects}|M\mbox{ does not reject})=0 $$ in contradiction to $P(M$ rejects$)>0 \forall P\in {\cal P}$.

If $T$ is only applied in case that $M$ does not reject, it means that the distribution that generates the data that go into $T$ is the conditional distribution that is not in ${\cal P}$.

In other words, testing the model assumption and passing it (i.e., not rejecting it and taking the model as valid for the data) actively violates the model assumption, even if it was fulfilled before!

See https://academic.oup.com/philmat/article-abstract/15/2/166/1572953?redirectedFrom=fulltext, https://arxiv.org/abs/1908.02218. As I am (co-) author of these papers, I should acknowledge that in principle this is known already at least since Bancroft (1944), see reference in the arxiv paper, although I believe I was the first to call it a paradox and to present it in a way that its paradoxical "nature" comes out.
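
A simulation sketch of the effect (my illustration, not taken from the papers): generate genuinely normal samples, screen them with a Shapiro-Wilk test, and compare the accepted samples with all samples. The accepted ones come out "more normal than normal", so the conditional distribution is no longer the i.i.d. normal model:

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, reps, alpha = 20, 20_000, 0.2

kurt_all, kurt_pass = [], []
for _ in range(reps):
    x = rng.normal(size=n)                   # the model assumption truly holds
    k = stats.kurtosis(x)                    # excess kurtosis; 0 for a normal
    kurt_all.append(k)
    if stats.shapiro(x).pvalue > alpha:      # misspecification test does not reject
        kurt_pass.append(k)

print("sd of excess kurtosis, all samples     :", np.std(kurt_all))
print("sd of excess kurtosis, accepted samples:", np.std(kurt_pass))
# Accepted samples vary less in shape than genuine i.i.d. normal samples do:
# conditioning on passing the test has moved the data off the assumed model.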

$\endgroup$
6
$\begingroup$

It's interesting that the Two Child Problem and the Monty Hall Problem so often get mentioned together in the context of paradox. Both illustrate an apparent paradox first described in 1889, called Bertrand's Box Paradox, which can be generalized to represent either. I find it a most interesting "paradox" because the same very-educated, very-intelligent people answer those two problems in opposite ways with respect to this paradox. It also compares to a principle used in card games like bridge, known as the Principle of Restricted Choice, where its resolution is time-tested.

Say you have a randomly selected item that I'll call a "box." Every possible box has at least one of two symmetric properties, but some have both. I'll call the properties "gold" and "silver." The probability that a box is just gold is P; and since the properties are symmetric, P is also the probability that a box is just silver. That makes the probability that a box has just one property 2P, and the probability that it has both 1-2P.

If you are told a box is gold, but not whether it is silver, you might be tempted to say the chances it is just gold are P/(P+(1-2P))=P/(1-P). But then you would have to state the same probability for a one-color box if you were told it was silver. And if this probability is P/(1-P) whenever you are told just one color, it has to be P/(1-P) even if you aren't told a color. Yet we know it is 2P from the last paragraph.

This apparent paradox is resolved by noting that if a box has only one color, there is no ambiguity about what color you will be told. But if it has two, there is an implied choice. You have to know how that choice was made in order to answer the question, and that is the root of the apparent paradox. If you aren't told, you can only assume a color was chosen at random, making the answer P/(P+(1-2P)/2)=2P. If you insist P/(1-P) is the answer, you are implicitly assuming there was no possibility the other color could have been mentioned unless it was the only color.

In the Monty Hall Problem, the analogy for the colors is not very intuitive, but P=1/3. Answers based on the two unopened doors originally being equally likely to have the prize are assuming Monty Hall was required to open the door he did, even if he had a choice. That answer is P/(1-P)=1/2. The answer allowing him to choose at random is 2P=2/3 for the probability that switching will win.

In the Two Child Problem, the colors in my analogy compare quite nicely to genders. With four cases, P=1/4. To answer the question, we need to know how it was determined that there was a girl in the family. If it was possible to learn about a boy in the family by that method, then the answer is 2P=1/2, not P/(1-P)=1/3. It's a little more complicated if you consider the name Florida, or "born on Tuesday," but the results are the same. The answer is exactly 1/2 if there was a choice, and most statements of the problem imply such a choice. And the reason "changing" from 1/3 to 13/27, or from 1/3 to "nearly 1/2," seems paradoxical and unintuitive, is because the assumption of no choice is unintuitive.

In the Principle of Restricted Choice, say you are missing some set of equivalent cards - like the Jack, Queen, and King of the same suit. The chances start out even that any particular card belongs to a specific opponent. But after an opponent plays one, his chances of having any one of the others are decreased because he could have played that card if he had it.
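
Bertrand's original three boxes make a compact simulation: given that a randomly drawn coin is gold, the chance the box is the double-gold one comes out $2P = 2/3$, not $P/(1-P) = 1/2$.

import numpy as np

rng = np.random.default_rng(10)
boxes = [("G", "G"), ("G", "S"), ("S", "S")]

shown_gold = other_gold = 0
for _ in range(100_000):
    box = boxes[rng.integers(3)]            # pick one of the three boxes
    coin = box[rng.integers(2)]             # draw one of its two coins
    if coin == "G":                         # we are told "gold"
        shown_gold += 1
        other_gold += box == ("G", "G")

print("P(box is GG | shown gold):", other_gold / shown_gold)   # ~2/3, i.e. 2P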

$\endgroup$
3
  • $\begingroup$ I don't follow your probabilities. If by "symmetric", you mean $P_G=P_S$ (which I think you mean), then shouldn't the probability of both be $P^2$, rather than $2P$? (This assumes independence, which I think you mean, although it would help to state that explicitly.) Furthermore, I think the probability of the box being neither should be $(1-P)^2$, rather than $1-2P$, shouldn't it? These can easily be seen if we consider the case where $P_G=P_S=.8$--then $P_{GS}=1.6$ & $P_{-G-S}=-.6$, unless by "symmetric" you mean that $P=.5$ & the properties are perfectly dependent. Sorry to nitpick. $\endgroup$ Commented Mar 7, 2012 at 14:57
  • $\begingroup$ Sorry, maybe I didn't explain it well trying to be as brief as possible. My P was not the probability a box has the color gold, it was the probability it was only gold. The probability it has the color gold is 1-P. And while the two properties are symmertic, they do not have to be independent, so you can't just multiply probabilities. Also, no box is "neither." Bertrand used three box with two coins in each: gold+gold, gold+silver, and silver+silver. A box with any number of gold coins is "gold" in my generalization. $\endgroup$
    – JeffJo
    Commented Mar 7, 2012 at 16:10
  • $\begingroup$ +1, that helps. I now see the phrase "at least one of two" and the word "just", which I must have skimmed over. $\endgroup$ Commented Mar 7, 2012 at 17:25
3
$\begingroup$

I find a simplified graphical illustration of the ecological fallacy (here the rich State/poor State voting paradox) helps me to understand on an intuitive level why we see a reversal of voting patterns when we aggregate State populations:

(Figure: a simplified illustration of the reversal between within-state voting trends and the aggregated state-level trend.)

$\endgroup$
4
  • 3
    $\begingroup$ This is a nice example, but I think this is Simpson's Paradox: en.wikipedia.org/wiki/Simpson%27s_paradox $\endgroup$
    – Nick
    Commented Oct 30, 2012 at 18:12
  • 1
    $\begingroup$ @Nick: this particular example is actually distinct from Simpson's Paradox, but it can be hard to know which fallacy/paradox applies in a particular situation because they look the same statistically. The difference is that SP is a "false effect" that appears only when analyzing subgroups. The trend shown here is thought to be a "true effect" that appears only when analyzing subgroups. In this case, it suggests that while income as a raw number doesn't affect voting patterns in aggregate, income as related to your neighbors (your state) does influence voting patterns. $\endgroup$
    – Jonathan
    Commented Jan 15, 2013 at 20:56
  • $\begingroup$ It's the ecological fallacy, discussed below. $\endgroup$
    – Charlie
    Commented Jun 5, 2013 at 21:30
  • 3
    $\begingroup$ @Charlie 'below' and 'above' are functions of whatever way a reader of the page is sorting (active/oldest/votes), and in any case the order under some of the sorting criteria can change over time (including the default). As such, it's probably better to mention the person that posted the discussion you refer to, or even link to it. $\endgroup$
    – Glen_b
    Commented Jun 20, 2013 at 8:13
3
$\begingroup$

This is Simpson's Paradox again, but "backwards" as well as forwards; it comes from Judea Pearl's new book Causal Inference in Statistics: A Primer[^1]

The classic Simpson's Paradox works as follows: consider trying to choose between two doctors. You automatically choose the one with the better outcomes. But suppose the one with the better outcomes chooses the easiest cases. The other's poorer record is a consequence of trickier work.

Now who do you choose? Better to look at the results stratified by difficulty and then decide.

There is another side to the coin (another paradox) which says that the stratified outcomes can also lead you to the wrong choice.

This time consider choosing to use a drug or not. The drug has a toxic side effect, but its therapeutic mechanism of action is through lowering blood pressure. Overall, the drug improves outcomes in the population, but when stratifying on post-treatment blood pressure the outcomes are worse in both the low and the high blood pressure groups. How can this be true? Because we have unintentionally stratified on the outcome, and within each outcome all that remains to observe is the toxic side effect.

To clarify, imagine the drug is designed to fix broken hearts, and it does this by lowering the blood pressure, and instead of stratifying on blood pressure we stratify on fixed hearts. When the drug works, the heart is fixed (and the blood pressure will be lower), but some of the patients will also get the toxic side effect. Because the drug works, the 'fixed heart' group will have more patients who have taken the drug, than there are patients taking the drug in the 'broken' heart group. More patients taking the drug means more patients getting side effects, and apparently (but falsely) better outcomes for patients who didn't take the drug.

The patients who get better without taking the drug are just lucky. The patients who took the drug and got better are a mixture of those who needed the drug to get better, and those who would have been lucky anyway. Examining only patients with 'fixed hearts' means excluding patients who would have been fixed had they taken the drug. Excluding such patients means excluding the harm from not taking the drug which in turn means we only see the harm from taking the drug.

Simpson's paradox arises when there is a cause for the outcome other than the treatment such as the fact that your doctor only does tricky cases. Controlling for the common cause (tricky versus easy cases) allows us to see the true effect. In the latter example, we have unintentionally stratified on an outcome not on a cause which means the true answer is in the aggregate not the stratified data.
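
A sketch of the second situation with invented numbers: the drug lowers blood pressure (which helps) and is mildly toxic, so it wins overall yet loses inside each post-treatment blood-pressure stratum.

import numpy as np

rng = np.random.default_rng(11)
n = 500_000
drug = rng.random(n) < 0.5
low_bp = rng.random(n) < np.where(drug, 0.7, 0.3)   # drug lowers blood pressure
# Low BP helps recovery (+0.5); the drug's toxicity hurts it (-0.1).
recover = rng.random(n) < 0.2 + 0.5 * low_bp - 0.1 * drug

def rate(mask):
    return recover[mask].mean()

print("overall: drug %.2f vs none %.2f" % (rate(drug), rate(~drug)))
print("low BP : drug %.2f vs none %.2f" % (rate(drug & low_bp), rate(~drug & low_bp)))
print("high BP: drug %.2f vs none %.2f" % (rate(drug & ~low_bp), rate(~drug & ~low_bp)))
# Overall ~0.45 vs ~0.35 in favour of the drug, yet the drug is ~0.10 worse
# inside each blood-pressure stratum: we stratified on an outcome.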

[^1]: Pearl J. Causal Inference in Statistics. John Wiley & Sons; 2016

$\endgroup$
3
$\begingroup$

Try the Borel–Kolmogorov paradox, where conditional probabilities behave badly. One example has the question

Let $X_1, X_2$ be independent exponential random variables with parameter $1$.

  1. Find the conditional PDF of $X_1+X_2$ given that $\frac{X_1}{X_2}=1.$
  2. Find the conditional PDF of $X_1+X_2$ given that $X_1-X_2=0.$
  3. The events $\frac{X_1}{X_2}=1$ and $X_1-X_2=0$ are the same. Does this mean that conditioning on either of these two events should give the same answer?

to which the answer appears to be

  1. $f_{X_1+X_2 \mid \frac{X_1}{X_2}=1}(x) = x e^{-x}$, a $\text{Gamma}(2,1)$ distribution with mean $2$
  2. $f_{X_1+X_2 \mid X_1 - X_2=0}(x) = e^{-x}$, an $\text{Exp}(1)$ distribution with mean $1$

and this can be confirmed by simulation.

Whether conditioning on $X_1=X_2$ is really consistent with the assumption that $X_1$ and $X_2$ are independent is a deeper question.
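
Here is one way to run that simulation, approximating the probability-zero events by thin slabs of width $\epsilon$ -- which is exactly where the paradox lives, since the two slabs shrink to the same event but weight it differently:

import numpy as np

rng = np.random.default_rng(12)
n, eps = 2_000_000, 0.02
x1 = rng.exponential(1.0, n)
x2 = rng.exponential(1.0, n)
s = x1 + x2

ratio_event = np.abs(x1 / x2 - 1) < eps     # slab around X1/X2 = 1
diff_event = np.abs(x1 - x2) < eps          # slab around X1 - X2 = 0

print("E[X1+X2 | X1/X2 ~ 1]:", s[ratio_event].mean())   # ~2, the Gamma(2,1) mean
print("E[X1+X2 | X1-X2 ~ 0]:", s[diff_event].mean())    # ~1, the Exp(1) mean
# The ratio slab widens in proportion to the magnitude of the pair; the
# difference slab does not -- hence the two different conditional answers.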

$\endgroup$
2
$\begingroup$

Suppose you obtained data on births in the royal family of some kingdom. In the family tree, each birth was noted. What is peculiar about this family is that parents kept having children until the first boy was born, and then did not have any more.

So your data potentially looks similar to this:

G G B
B
G G B
G B
G G G G G G G G G B
etc.

Will the proportion of boys and girls in this sample reflect the general probability of giving birth to a boy (say 0.5)? The answer and explanation can be found in this thread.
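
Rather than spoil the linked answer, here is a simulation sketch the reader can run; note that it reports two different "proportions of boys":

import numpy as np

rng = np.random.default_rng(13)
# Each family keeps having children until the first boy, so family size is
# Geometric(1/2) and every family has exactly one boy.
sizes = rng.geometric(0.5, 100_000)

print("pooled proportion of boys     :", 100_000 / sizes.sum())
print("average per-family proportion :", (1.0 / sizes).mean())
# One of these is ~0.5 and the other is not; which one answers the question
# is exactly the point of the linked thread.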

$\endgroup$
3
  • 2
    $\begingroup$ This answer reads like a puzzle, not like a paradox. I can imagine why you wanted to post it like that, but I think for this answer to qualify as paradox and to fit this thread, you need to be more explicit. $\endgroup$
    – amoeba
    Commented Mar 23, 2015 at 21:43
  • 2
    $\begingroup$ This question (with boys and girls interchanged) was asked at stats.stackexchange.com/questions/93830, which received a large number of answers--not entirely in agreement! (I learned something by taking the problem seriously and thinking about it in increasingly realistic ways, exploring the assumptions needed to do that.) $\endgroup$
    – whuber
    Commented Mar 23, 2015 at 21:52
  • $\begingroup$ @whuber thanks for the link! I added it into the description. $\endgroup$
    – Tim
    Commented Mar 23, 2015 at 21:56
1
$\begingroup$

I'm surprised no one has mentioned Newcomb's Paradox yet, although it is more heavily discussed in decision theory. It's definitely one of my favorites.

$\endgroup$
1
$\begingroup$

One of my "favorites", meaning that it's what drives me crazy about the interpretation of many studies (and often by the authors themselves, not just the media) is that of Survivorship Bias.

One way to imagine it: suppose there's some effect that is very detrimental to the subjects, so much so that it has a very good chance of killing them. If subjects are exposed to this effect before the study, then by the time the study begins, the exposed subjects that are still alive have a very high probability of being unusually resilient. Literally natural selection at work. When this happens, the study will observe that exposed subjects are unusually healthy (since all the unhealthy ones already died or made sure to stop being exposed to the effect). This is often misinterpreted as implying that exposure is actually good for the subjects. It is a result of ignoring truncation (i.e., ignoring the subjects who died and did not make it into the study).

Similarly, subjects who stop being exposed to the effect during the study are often incredibly unhealthy: this is because they have realized that continued exposure will probably kill them. But the study merely observes that those who quit are very unhealthy!

@Charlie's answer about the WWII bombers can be thought of as an example of this, but there are plenty of modern examples too. A recent one is the set of studies reporting that drinking 8+ cups of coffee a day (!!) is linked to much better heart health in subjects over 55 years of age. Plenty of people with PhDs interpreted this as "drinking coffee is good for your heart!", including the authors of the study. I read it as: you have to have an incredibly healthy heart to still be drinking 8 cups of coffee a day after age 55 and not have had a heart attack. Even if it doesn't kill you, the moment something looks worrisome about your health, everyone who loves you (plus your doctor) will immediately encourage you to stop drinking coffee. Further studies found that drinking so much coffee had no beneficial effects in younger groups, which I believe is more evidence that we are seeing a survivorship effect rather than a positive causal effect. Yet there are plenty of PhDs running around saying "Science says drinking 8+ cups of coffee is good for seniors!"

$\endgroup$
2
  • 1
    $\begingroup$ I'm not so sure of your interpretation. In Norway drinking 8 cups of coffee a day isn't unusual at all, the mean value (including children and other non-drinkers) being around two cups a day. In Finland the mean is around 2.5 cups a day. I used to drink more than ten cups a day, but not anymore. $\endgroup$ Commented Jun 25, 2017 at 20:25
  • $\begingroup$ Late to the party, but the whole "coffee/caffeine is going to give you a heart attack" is not at all supported by science (see e.g. 1). Not even for people with a previous myocardial infarction (see e.g. 2 or 3). $\endgroup$ Commented Jun 3, 2021 at 8:33
0
$\begingroup$

The hot hand paradox.

Quoting Miller and Sanjurjo's paper:

Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this conditional relative frequency to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample 1 million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.

Intuitively, the problem is that the amount of available data is positively correlated with the number of heads: in a sample with fewer heads to analyse, each flip that follows a heads carries more weight in that sample's average, so a heads followed by a tails there has a larger impact on the calculated mean, introducing a downward bias.

Sampling bias caused by this paradox went undetected in a notable study on the hot hand phenomenon in basketball for over thirty years (Wikipedia).
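
Jack's experiment is easy to replicate in a few lines (a sketch of the 1-million-coins calculation described in the quote):

import numpy as np

rng = np.random.default_rng(14)
props = []
for _ in range(100_000):
    flips = rng.integers(0, 2, 4)             # four flips of one fair coin; 1 = heads
    after_heads = flips[1:][flips[:-1] == 1]  # flips immediately following a heads
    if after_heads.size:                      # skip coins with no heads to follow
        props.append(after_heads.mean())

print("mean conditional relative frequency:", np.mean(props))   # ~0.40, not 0.5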

$\endgroup$
-1
$\begingroup$

Let $x$, $y$, and $z$ be mutually uncorrelated random vectors. Yet the componentwise ratios $x/z$ and $y/z$ will typically be correlated, because both share the common divisor $z$.
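
This is the classic spurious correlation of ratios; a sketch with positive variables (my choice of distributions, so the division is safe):

import numpy as np

rng = np.random.default_rng(15)
n = 100_000
x = rng.uniform(1, 2, n)    # three mutually independent, hence uncorrelated, vectors
y = rng.uniform(1, 2, n)
z = rng.uniform(1, 2, n)

print("corr(x, y)    : %.3f" % np.corrcoef(x, y)[0, 1])          # ~0
print("corr(x/z, y/z): %.3f" % np.corrcoef(x / z, y / z)[0, 1])  # clearly positive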

$\endgroup$
3
  • 3
    $\begingroup$ Why is this a paradox? It seems intuitive. $\endgroup$ Commented Jun 5, 2013 at 15:58
  • 4
    $\begingroup$ I'd have been surprised if this weren't usually the case. $\endgroup$
    – Glen_b
    Commented Jun 20, 2013 at 8:10
  • 2
    $\begingroup$ It's unclear what $x/z$ and "correlated" are intended to mean. (Presumably "$x/z$" is componentwise division--assuming no components of $z$ are zero!) Is "correlated" to be interpreted in the sense of the correlation coefficient (essentially the standardized dot products) or are we to treat $X,Y,$ and $Z$ as random variables and consider their correlation coefficients in that sense? $\endgroup$
    – whuber
    Commented Mar 23, 2015 at 21:57
