Suppose we use a hash function h to hash n keys into m slots. Assuming simple uniform hashing, what is the expected number of collisions? (CLRS, 3rd edition, problem 11.2-1)
My solution is as follows:
Let X be the number of collisions. For keys $k_i$ and $k_j$, we define the indicator random variable $X_{i,j} = I\{h(k_i) = h(k_j)\}$. Under the assumption of simple uniform hashing, we have Pr$\{h(k_i) = h(k_j)\} = 1/m$, so $E[X_{i,j}] = 1/m$. Thus, the expected number of collisions
\begin{align}
E[X] & = E[\sum\limits_{i=1}^n \sum\limits_{j=i+1}^n X_{i,j}] \\
& = \sum\limits_{i=1}^n \sum\limits_{j=i+1}^n E[X_{i,j}] \\
& = \frac{n^2-n}{2m}
\end{align}
But there's another explanation:
Let Y be the number of empty slots, then $E[Y] = m(1-\frac{1}{m})^n$.
Thus,
\begin{align}
E(\text{collisions}) & = n-E(\text{occupied locations}) \\
& = n-m+E[Y] \\
& = n-m+m(1-\frac{1}{m})^n
\end{align}
Which one is correct?