0
$\begingroup$

Suppose we use a hash function h to hash n keys into m slots. Assuming simple uniform hashing, what is the expected number of collisions? (CLRS, 3rd edition, problem 11.2-1)

My solution is as follows:

Let X be the number of collisions. For keys $k_i$ and $k_j$, we define the indicator random variable $X_{i,j} = I\{h(k_i) = h(k_j)\}$. Under the assumption of simple uniform hashing, we have Pr$\{h(k_i) = h(k_j)\} = 1/m$, so $E[X_{i,j}] = 1/m$. Thus, the expected number of collisions \begin{align} E[X] & = E[\sum\limits_{i=1}^n \sum\limits_{j=i+1}^n X_{i,j}] \\ & = \sum\limits_{i=1}^n \sum\limits_{j=i+1}^n E[X_{i,j}] \\ & = \frac{n^2-n}{2m} \end{align}
But there's another explanation:
Let Y be the number of empty slots, then $E[Y] = m(1-\frac{1}{m})^n$. Thus, \begin{align} E(\text{collisions}) & = n-E(\text{occupied locations}) \\ & = n-m+E[Y] \\ & = n-m+m(1-\frac{1}{m})^n \end{align} Which one is correct?

$\endgroup$
1
  • $\begingroup$ You have to define how you count collisions. Take the birthday problem and $4$ people born on New Year's Day. Is that $1$ collision (one birthday with multiple people) or $6$ collisions (six pairs of people with the same birthday) or $4$ collisions (four people sharing a birthday with somebody else)? See math.stackexchange.com/a/1366028/6460 $\endgroup$
    – Henry
    Commented May 10 at 1:42

0

You must log in to answer this question.