Skip to main content
rewrite in a form that will be more useful for me when I next look at this.
Source Link
ShreevatsaR
  • 41.6k
  • 8
  • 96
  • 135

For those arriving here through a search (say for "expected number of collisions"), theThe following approximation may be useful.

If there are $k$ people and $N$ possible birthdays (or in case of a hash table, $k$ items being hashed into $N$ buckets), then the expected number of collisions (to be precise, the expected number of people who/items that collide with at least one other)of the others is exactly (see Henry's answer or André Nicolas's answer) $$ \begin{align} & k \left(1 - \left(1-\frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{N} - \frac{k(k-1)(k-2)}{2N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{N}. \end{align}$$$$ \begin{align} & k \left(1 - \left(1-\frac1N\right)^{k-1}\right) \\ & = \frac{k(k-1)}{N} - \frac{k(k-1)(k-2)}{2N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{N}. \end{align}$$


Note that you need to be clear about whatThe above is one possible definition of "expected number of collisions" means. If there are $r$ bucketsbirthdays/birthdaysbuckets each with two itemspeople/peopleitems in them, the above expression gives count $2r$, as it counts each member of each pair (giving count $2r$). If instead you want to count the number of buckets/birthdays that have multiple people in them, then the answer is approximately $$ \approx \frac{k^2}{2N}.$$

This result can be got from the previous analysis from noting that the most common type of collisions will have 2 in each bucket (3-way and higher collisions will be statistically rare), so you just halve the count. Or, rigorously, you can do a similar analysis focusing on buckets/birthdays: the probability thatderived either $0$ or $1$ of the $k$ people have that birthday is $$ \left(1 - \frac1N\right)^k + k\frac1N\left(1 - \frac1N\right)^{k-1}$$ So the number of buckets with multiple values in them is $$ \begin{align} & N \left(1 - \left(1 - \frac1N\right)^k - k\frac1N\left(1 - \frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{2N} - \frac{k(k-1)(k-2)}{3N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{2N} \end{align}$$

  • from the previous analysis, by noting that to the first order the most common type of collision is to have 2 in a bucket (3-way and higher collisions will be statistically rare), so you just halve the count;

  • or, by doing a similar analysis focusing on birthdays/buckets: the probability that either $0$ or $1$ of the $k$ people have that particular birthday is $$ \left(1 - \frac1N\right)^k + k\frac1N\left(1 - \frac1N\right)^{k-1}$$ So the expected number of buckets with multiple values in them is $$ \begin{align} & N \left(1 - \left(1 - \frac1N\right)^k - k\frac1N\left(1 - \frac1N\right)^{k-1}\right) \\ & = \frac{k(k-1)}{2N} - \frac{k(k-1)(k-2)}{3N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{2N}. \end{align}$$

For those arriving here through a search (say for "expected number of collisions"), the following approximation may be useful.

If there are $k$ people and $N$ possible birthdays (or in case of a hash table, $k$ items being hashed into $N$ buckets), then the expected number of collisions (to be precise, the expected number of people who collide with at least one other) is exactly (see Henry's answer or André Nicolas's answer) $$ \begin{align} & k \left(1 - \left(1-\frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{N} - \frac{k(k-1)(k-2)}{2N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{N}. \end{align}$$


Note that you need to be clear about what "expected number of collisions" means. If there are $r$ buckets/birthdays each with two items/people in them, the above expression counts each member of each pair (giving count $2r$). If instead you want to count the number of buckets/birthdays that have multiple people in them, then the answer is approximately $$ \approx \frac{k^2}{2N}.$$

This can be got from the previous analysis from noting that the most common type of collisions will have 2 in each bucket (3-way and higher collisions will be statistically rare), so you just halve the count. Or, rigorously, you can do a similar analysis focusing on buckets/birthdays: the probability that either $0$ or $1$ of the $k$ people have that birthday is $$ \left(1 - \frac1N\right)^k + k\frac1N\left(1 - \frac1N\right)^{k-1}$$ So the number of buckets with multiple values in them is $$ \begin{align} & N \left(1 - \left(1 - \frac1N\right)^k - k\frac1N\left(1 - \frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{2N} - \frac{k(k-1)(k-2)}{3N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{2N} \end{align}$$

The following approximation may be useful.

If there are $k$ people and $N$ possible birthdays (or in case of a hash table, $k$ items being hashed into $N$ buckets), then the expected number of people/items that collide with at least one of the others is exactly (see Henry's answer or André Nicolas's answer) $$ \begin{align} & k \left(1 - \left(1-\frac1N\right)^{k-1}\right) \\ & = \frac{k(k-1)}{N} - \frac{k(k-1)(k-2)}{2N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{N}. \end{align}$$


The above is one possible definition of "expected number of collisions". If there are $r$ birthdays/buckets each with two people/items in them, the above expression gives count $2r$, as it counts each member of each pair. If instead you want to count the number of buckets/birthdays that have multiple people in them, then the answer is approximately $$ \approx \frac{k^2}{2N}.$$

This result can be derived either

  • from the previous analysis, by noting that to the first order the most common type of collision is to have 2 in a bucket (3-way and higher collisions will be statistically rare), so you just halve the count;

  • or, by doing a similar analysis focusing on birthdays/buckets: the probability that either $0$ or $1$ of the $k$ people have that particular birthday is $$ \left(1 - \frac1N\right)^k + k\frac1N\left(1 - \frac1N\right)^{k-1}$$ So the expected number of buckets with multiple values in them is $$ \begin{align} & N \left(1 - \left(1 - \frac1N\right)^k - k\frac1N\left(1 - \frac1N\right)^{k-1}\right) \\ & = \frac{k(k-1)}{2N} - \frac{k(k-1)(k-2)}{3N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{2N}. \end{align}$$

Source Link
ShreevatsaR
  • 41.6k
  • 8
  • 96
  • 135

For those arriving here through a search (say for "expected number of collisions"), the following approximation may be useful.

If there are $k$ people and $N$ possible birthdays (or in case of a hash table, $k$ items being hashed into $N$ buckets), then the expected number of collisions (to be precise, the expected number of people who collide with at least one other) is exactly (see Henry's answer or André Nicolas's answer) $$ \begin{align} & k \left(1 - \left(1-\frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{N} - \frac{k(k-1)(k-2)}{2N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{N}. \end{align}$$


Note that you need to be clear about what "expected number of collisions" means. If there are $r$ buckets/birthdays each with two items/people in them, the above expression counts each member of each pair (giving count $2r$). If instead you want to count the number of buckets/birthdays that have multiple people in them, then the answer is approximately $$ \approx \frac{k^2}{2N}.$$

This can be got from the previous analysis from noting that the most common type of collisions will have 2 in each bucket (3-way and higher collisions will be statistically rare), so you just halve the count. Or, rigorously, you can do a similar analysis focusing on buckets/birthdays: the probability that either $0$ or $1$ of the $k$ people have that birthday is $$ \left(1 - \frac1N\right)^k + k\frac1N\left(1 - \frac1N\right)^{k-1}$$ So the number of buckets with multiple values in them is $$ \begin{align} & N \left(1 - \left(1 - \frac1N\right)^k - k\frac1N\left(1 - \frac1N\right)^{k-1}\right) \\ & \approx \frac{k(k-1)}{2N} - \frac{k(k-1)(k-2)}{3N^2} + O\left(\frac1{N^3}\right) \\ & \approx \frac{k^2}{2N} \end{align}$$