4
$\begingroup$

I am trying to understand the difference of the two solutions for the expected value of collisions for the birthday problem:

  1. https://math.stackexchange.com/a/35798/254705 derives the following solution:

... so the expected number of people who share birthdays with somebody is $n\left(1-(1-1/N)^{n-1}\right)$.

whereas

  1. https://math.stackexchange.com/a/952272/254705 gives

This leads to an expectation value $\lambda$ (date collisions in terms of the lambda distribution) of $\lambda = \frac{n(n-1)}{2m}$

For $2^{32}$ "days" and $10^6$ people,

  1. results in $E_1 = 232.8033$

and

  1. results in $E_2 = 116.4152$

It looks like that $\frac{E_1}{E_2} \approx 2$.

As $E_2$ gives the number of collision pairs instead of number of people involved in collisions, this seems reasonable to me.

Which of the solutions is correct? Is $E_2$ just an approximation?

$\endgroup$
1
  • $\begingroup$ It may be that triplets (or higher multiples) having the same birthday are counted differently by the two methods. $\endgroup$
    – Marconius
    Commented Jul 19, 2015 at 3:17

1 Answer 1

3
$\begingroup$

Both expressions are correct but they are measuring different things, and the issue is how you count cases where three birthdays or more coincide.

Take a simplified example of three people and two possible birthdays. There is a $\frac34$ probability that only two birthdays coincide and a $\frac14$ probability that all three birthdays coincide.

  • The first expression says that the expected number of people who share a birthday with somebody else is $3\left(1-(1-1/2)^{2}\right) =\frac94=2.25$; this is also $2 \times \frac34+3\times \frac14$.

  • The second expression says that the expected number of birthday pairs is $\frac{3 \times 2}{2\times 2} =\frac32 = 1.5$; this is also $1 \times \frac34+3 \times \frac14$.

So in this small example, you can see that both expressions are correct, but the first is less than double the second because of what happens when all three people share the same birthday.

The same is true with your example, with the possibility of three items colliding causing the difference (plus the very small probability of more than three items colliding on the same value).

If you wanted the number of days where two or more people had birthdays, the expression would be different again: it would be $N - N \left(1-\frac1N\right)^n - n \left(1-\frac1N\right)^{n-1}$.

  • In the small example this would be $2 - 2 \left(1-\frac12\right)^3 - 3 \left(1-\frac12\right)^2 = 1$ which is also $1 \times \frac34 + 1 \times \frac14$. In your large example it would be about $116.4045$.
$\endgroup$
1
  • 2
    $\begingroup$ Great explanation. So the difference in the small example is that 1. gives 3 people whereas 2. gives 3 pairs. $\endgroup$
    – pistermink
    Commented Jul 19, 2015 at 9:29

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .