3
$\begingroup$

In my Probability course, we have dealt with the Birthday problem. We have been told a reasonable "outcome space" is the $n$-tuples of integers between 1 and 365, identified with the days of the year, if we forget about years with a 29th of February. Then we placed the uniform probability on the space, noting that it is not the exact probability since not all combinations are equally probable, but that using it gives an approximation that is good, and that underestimates the actual probability of having two people with the same birthday. So I was wondering: how do I prove this actually is an underestimate?

$\endgroup$
1
  • 4
    $\begingroup$ Also, twins with high probability have the same birthday. $\endgroup$ Commented Oct 17, 2014 at 18:02

2 Answers 2

1
$\begingroup$

Denote the probability of a given person's birthday being on the $k$th day of a year of $n$ days by $p_k$. Then, the probability that two random people have the same birthday is $$P = \sum_{k = 1}^n p_k^2.$$ In particular, when $p_1 = \cdots = p_n$, this probability is $\frac{1}{n}$.

To show that this is an underestimate, it's enough to show that $P$ is actually minimized by the probability distribution $p_1 = \cdots = p_n$ (and to know that the probabilities aren't actually evenly distributed). To set up the minimization problem, note that if we know $p_1, \ldots, p_{n - 1}$, then $$p_n = 1 - p_1 - \cdots - p_{n - 1}.$$ So as a function of $p_1, \ldots, p_{n - 1}$, the probability that two random people have the same birthday is $$P(p_1, \ldots, p_n) = \sum_{k = 1}^n p_k^2 = \sum_{k = 1}^{n - 1} p_k^2 + (1 - p_1 - \cdots - p_{n - 1})^2,$$ and we must minimize this quantity over the region of possible probabilities, namely the simplex $S$ defined by the inequalities $$0 \leq p_1, \ldots, p_{n - 1}, \qquad p_1 + \cdots p_{n - 1} \leq 1.$$

Now, we must solve the system $$\frac{\partial P}{\partial p_k}, \qquad 1 \leq k \leq n - 1.$$ Differentiating the above formula for $p$ and setting to zero gives $$0 = \frac{\partial P}{\partial p_k} = 2 p_k - 2 (1 - p_1 - \cdots - p_{n - 1}).$$ The quantity in parentheses is $p_n$, so this system gives that $0 = 2 p_k - 2 p_n$, or rather, $$p_k = p_n, \qquad 0 \leq k \leq n - 1.$$ But this simply says that there is an critical point where $$p_1 = \cdots = p_n = \frac{1}{n},$$ as desired (in particular, this point is in the simplex $S$, i.e., it corresponds to a distribution of probabilities of birthdays). Since $P$ is quadratic in its arguments, the matrix $$\left(\frac{\partial^2 P}{\partial p_i \partial p_j}\right)$$ of mixed partials is constant, and checking shows that it is positive definite. So, the solution $$p_1 = \cdots = p_n = \frac{1}{n}$$ is the unique global minimum, i.e., if the distribution of birthdays is nonuniform, then $P\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \frac{1}{n}$ is an underestimate of the probability that two random people have the same birthday.

Remark (One can avoid calculus when producing the system characterizing the critical point of the probability function $P$ simply by using that a quadratic function $q(t) = at^2 + bt + c$, $a \neq 0$ achieves its extremum at $t = -\frac{b}{2a}$.)

$\endgroup$
1
$\begingroup$

As noted by a previous answer, the probability that two random people have the same birthday is $\sum_k p_k^2$ where $p_k$ is the probability that someone has a birthday on day $k$. An easier way to show you are getting an underestimate or a perfect estimate (but not an overestimate) is to note that if two $p_{k_1}, p_{k_2}$ are not equal, then $2((p_{k_1} + p_{k_2})/2)^2 < p_{k_1}^2 + p_{k_2}^2$ because the function $f(p) = p^2$ is strictly convex (this can be shown either by algebra or calculus). Thus the minimum of the probability of getting two random birthdays the same is achieved when all $p_k$ are equal, namely $p_k = 1/n$, and if they are not all equal, then the probability of getting two random birthdays the same is greater.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .