Skip to main content
1 of 2
ShreevatsaR
  • 41.7k
  • 8
  • 96
  • 135

For those arriving here through a search (say for "expected number of collisions"), the following approximation may be useful.

If there are $k$ people and $N$ possible birthdays (or in case of a hash table, $k$ items being hashed into $N$ buckets), then the expected number of collisions (to be precise, the expected number of people who collide with at least one other) is exactly (see Henry's answer or André Nicolas's answer) $$ \begin{align} & k \left(1 - \left(1-\frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{N} - \frac{k(k-1)(k-2)}{2N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{N}. \end{align}$$


Note that you need to be clear about what "expected number of collisions" means. If there are $r$ buckets/birthdays each with two items/people in them, the above expression counts each member of each pair (giving count $2r$). If instead you want to count the number of buckets/birthdays that have multiple people in them, then the answer is approximately $$ \approx \frac{k^2}{2N}.$$

This can be got from the previous analysis from noting that the most common type of collisions will have 2 in each bucket (3-way and higher collisions will be statistically rare), so you just halve the count. Or, rigorously, you can do a similar analysis focusing on buckets/birthdays: the probability that either $0$ or $1$ of the $k$ people have that birthday is $$ \left(1 - \frac1N\right)^k + k\frac1N\left(1 - \frac1N\right)^{k-1}$$ So the number of buckets with multiple values in them is $$ \begin{align} & N \left(1 - \left(1 - \frac1N\right)^k - k\frac1N\left(1 - \frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{2N} - \frac{k(k-1)(k-2)}{3N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{2N} \end{align}$$

ShreevatsaR
  • 41.7k
  • 8
  • 96
  • 135