3
$\begingroup$

I am solving the question:

How large must a class be to make the probability of finding two people with the same birthday at least 50%?

The first solution I came up with is rather simple. It's based on finding $N$ people such that any pair amongst the $N$ people have different birthdays. This can be simply solved as multiplying probabilities of $N$ people have different birthdays. The first person has a probability of 1 of having a different birthday. The 2nd person has a probability of (364/365) of having a different birthday from the first person. The 3rd has a probability of (363/365) of having a different birthday from the first 2 people, and so on.

$$ \frac{365}{365}\frac{364}{365}\cdots\frac{365-N+1}{365} < \frac{1}{2} \\ = \frac{^{365} P_N}{365^N} $$ It turns out $N=23$. This is the correct answer based on what I saw on Google.

I'm now trying to think of this problem in terms of combinatorics. So I first started with thinking of 365 distinguishable objects into $N$ bins without replacement. Order doesn't matter, so this is combinations, and we get $\binom{365}{N}$. Now I want to find the number of combinations of 365 birthdays into $N$ bins WITH replacement, and this is simply $\frac{(365+N-1)!}{N!(365-1)!}$. So then I was thinking the probability of less than half of getting $N$ people with different birthdays is then

$$ \frac{\binom{365}{N}}{\frac{(365+N-1)!}{N!(365-1)!}} < \frac{1}{2} $$

But if I plug in $N=23$, I don't get the $\approx \frac{1}{2}$ that is expected. I get $\approx \frac{1}{4}$. What is wrong with my thinking using the combinations approach?

$\endgroup$
3
  • 1
    $\begingroup$ The outcomes in the second version are not equally likely -- e.g. 12 birthdays on January 13 is less likely than one each on on the 1st of every month -- so you can't compute probabilities by dividing (size of event) / (size of sample space). In theory you approach the problem this way but you need to find the relative frequency of the "all birthdays distinct" event which, for practical purposes, comes back to looking at the people as distinguishable. $\endgroup$
    – Ned
    Commented Apr 15, 2020 at 14:14
  • $\begingroup$ You probably should do it with order. Without order $(Aug 31, Jan 12)=(Jan 12, Aug 31)$ is twice as likely $(Feb 3, Feb 3)$. And there are more ways to draw (and thus more likely) a combination with more distinct dates than one with more multiple dates. If do it with permutations and order matters I suspect you will get the correct answer. ... I think in general combinatorics is not a good way to do probability as the actual sample space order does matter. $\endgroup$
    – fleablood
    Commented Apr 15, 2020 at 15:36
  • $\begingroup$ @fleablood I tried to allude to this in my answer, but yes: the permutation approach immediately gives the right answer, since the numerator is "ordered lists of distinct birthdays" (i.e. $365 \cdots (365 - N + 1)$) and the denominator is "ordered lists of arbitrary birthdays" (i.e. $365^N$). $\endgroup$ Commented Apr 15, 2020 at 15:48

1 Answer 1

1
$\begingroup$

SHORT ANSWER: As @Ned said, the balls and bins should be distinguishable in your calculation.

LONG ANSWER:

First, recall that you should specify whether the balls are distinguishable and whether the bins are; in this case, both should be, because Eve being born on December 24 and Sam being born on July 4 is meaningfully different than having them switch birthdays. More to the point, consider the list of birthdays held by Eve and Sam; there should be twice as many ways for that list to be $\{\text{Dec 24}, \text{July 4}\}$ as for it to be $\{\text{Dec 24}\}$, which would require them both to have that same birthday. If you regard them as indistinguishable, then you're effectively regarding those two lists as being equally probable, when in reality they should not be.

A similar problem that may be easier to understand is: when you roll two dice, you are twice as likely to get a 2 and a 6 as you are to get double 6's. This comes from the fact that dice are distinguishable, and it's why that formula you applied doesn't work here.

The cardinal sin here is the confusion of whether "order" matters and what counts as "balls" and "bins". For the numerator, I don't think I see how you're thinking of distributing 365 balls into $N$ bins, because that would morally be like assigning every birthday to a person; instead, you should assign each person to a birthday, so you're distributing $N$ balls into $365$ bins (without replacement). But since the balls are people and are distinguishable, the order does matter, because the order corresponds to which person has which birthday. That is, having ball 1 go into the Dec 24 box and ball 2 go into the July 4 box is not the same thing as switching the two.

If you really want to go a route that has a combinatorial feel, I'd shy away from the balls / bins interpretation at all, because correctly applying that will lead you immediately back to permutations and a computation that looks like the correct approach you outlined originally. A combinatorial route would need to be weighted by how many times each term should appear -- i.e., correcting for the distinguishable / indistinguishable problem above -- and this will be much more laborious than it's worth.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .