The fallacy of the hot hand fallacy
Suppose you'd like to detect whether a coin ever goes on hot streaks or cold streaks, meaning that after $k$ heads in a row, the probability of the following flip coming up heads differs from the overall probability $p$ of heads. To test this, you'll flip the coin $n$ times, and after any streak of $k$ heads, you'll record the outcome of the following flip. Let $X$ be the proportion of your recorded flips that came up heads. For concreteness, let's set $p=\frac{1}{2}$, $n=100$, and $k=3$.
Here is the surprise: for these values, $E[X]\approx0.46$ (not $\frac{1}{2}$!). And in general, for any $0<p<1$, $n\geq 3$, and $0<k<n-1$, $E[X]<p$ (taking the expectation over runs in which at least one flip gets recorded, so that $X$ is defined), and the bias can be quite large for certain values of $n$ and $k$.
This is counterintuitive enough that when Gilovich, Vallone and Tversky wrote their seminal 1985 paper The Hot Hand in Basketball, measuring whether basketball players went on "hot streaks", they used the exact method above to try to detect them. Since the percentage after three hits in a row was not different from the overall percentage, they concluded that there was no evidence of a hot hand. But this was a mistake! If there were no hot hand, they should have observed a significantly lower percentage on shots after three hits in a row. In fact, their data do show evidence for a hot hand in many of the cases, according to a new paper last month. This mistake went unchecked for 30 years, with untold numbers of pop psychology books and articles citing the result as evidence for a "hot-hand fallacy".
Demonstration
Here's a demonstration in R.
f7 <- function(x) {
  # running length of the current run of TRUEs (resets to 0 at each FALSE)
  # stolen from http://tolstoy.newcastle.edu.au/R/e4/devel/08/04/1206.html
  tmp <- cumsum(x)
  tmp - cummax((!x) * tmp)
}

streak <- function(v, k = 3, n = length(v)) {
  # returns a logical vector of length n = length(v) that is TRUE wherever
  # the previous k entries were all TRUE
  c(FALSE, f7(v)[1:(n - 1)] >= k)
}

random_shots <- function(n, p = 0.5) {
  # takes n random shots with probability p of success
  runif(n) < p
}

trial <- function(n, k = 3, p = 0.5) {
  # proportion of hits among shots that follow a streak of k hits
  # (NaN when the sequence contains no such streak)
  s <- random_shots(n, p)
  mean(s[streak(s, k)])
}

# do simulation 100000 times
results <- sapply(1:100000, function(x) trial(100, 3))
summary(results)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
# 0.0000  0.3636  0.5000  0.4615  0.5714  0.8571       3
What's going on?
One way to see it is to consider the absolute simplest case, where we flip the coin three times ($n=3$) and tally up the flips that come right after a single head ($k=1$).
Outcome  Heads recorded  Flips recorded  Proportion
HHH      2               2               1
HHT      1               2               1/2
HTH      0               1               0
HTT      0               1               0
THH      1               1               1
THT      0               1               0
TTH      0               0               NA
TTT      0               0               NA
The last two outcomes, of course, can't be included in our tally because the proportion of heads is undefined. Now, if we repeat this experiment many times, we'll find that of the sequences that we record, $\frac{2}{6}$ of the time we will have a proportion of $1$, $\frac{1}{6}$ of the time we'll have $\frac{1}{2}$, and the remaining $\frac{3}{6}$ of the time we'll have $0$, for an expected proportion of $$\frac{2}{6}\times 1 + \frac{1}{6}\times\frac{1}{2} + \frac{3}{6}\times 0 = \frac{5}{12} < \frac{1}{2}$$
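The same tally can be checked by brute force. Here's a rough sketch that enumerates all $2^n$ sequences and averages the per-sequence proportions, weighting each sequence by its probability and skipping the ones where nothing gets recorded. The helper name exact_expected_proportion is made up for this illustration, and the approach is only practical for small $n$.

```r
# Exact E[X] by enumerating every sequence of n flips (small n only).
# exact_expected_proportion is a hypothetical helper, not part of the
# simulation code in the Demonstration section.
exact_expected_proportion <- function(n, k = 1, p = 0.5) {
  # each row is one sequence of n flips; TRUE = heads
  seqs <- as.matrix(expand.grid(rep(list(c(TRUE, FALSE)), n)))
  total_weight <- 0  # probability that at least one flip is recorded
  weighted_sum <- 0  # sum of probability * proportion over those sequences
  for (i in seq_len(nrow(seqs))) {
    s <- seqs[i, ]
    # recorded[j] is TRUE when flips j-k, ..., j-1 were all heads
    recorded <- rep(FALSE, n)
    for (j in seq_len(n)) {
      if (j > k) recorded[j] <- all(s[(j - k):(j - 1)])
    }
    if (any(recorded)) {
      prob <- p^sum(s) * (1 - p)^(n - sum(s))
      total_weight <- total_weight + prob
      weighted_sum <- weighted_sum + prob * mean(s[recorded])
    }
  }
  # condition on the proportion being defined
  weighted_sum / total_weight
}

exact_expected_proportion(3, k = 1)  # 0.4166667, i.e. 5/12 as in the table
```

Because it loops over all $2^n$ rows, this won't get anywhere near $n=100$; it's only meant for checking small cases by hand.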
So, by inspection we can plainly see that in this case the expected proportion is less than 0.5, although at first glance this might still seem unsatisfactory. Yeah, it's less than 0.5 but... why?
I think there are a few ways to hand-wave about this. One has to do with the fact that we have two outcomes with proportion = $1$, but one of those outcomes (HHH) has two recorded heads, and the other (THH) only has one. So sequences with more recorded heads are weighted the same as sequences with fewer, and in this way heads are somehow being underrepresented, leading to the bias.
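To put numbers on that hand-wave, using only the counts from the table: if we pool every recorded flip across the six sequences, so that each recorded flip counts once, heads come out at exactly one half. If instead we average the six per-sequence proportions, so that each sequence counts once, we get the biased figure.

$$\underbrace{\frac{2+1+0+0+1+0}{2+2+1+1+1+1}}_{\text{pooled over flips}} = \frac{4}{8} = \frac{1}{2}, \qquad \underbrace{\frac{1}{6}\left(1+\frac{1}{2}+0+0+1+0\right)}_{\text{averaged over sequences}} = \frac{5}{12}$$

The two recorded heads in HHH count twice in the pooled version but collapse into a single proportion of $1$ in the averaged version, which is the sense in which heads get underweighted.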