11
$\begingroup$

Given a set $S$ such that $|S|=n$,

A random item is chosen randomly from $S$, and being appended to a new set $T$.

This process is being repeated $n$ times (with repetition), what is the expected value of $|T|$ ?

$\endgroup$

4 Answers 4

5
$\begingroup$

Let $T_k$ be the number of elements in $T$ after $k$ rounds. We also set $T_0 = 0$.

Assuming that $T_k = l$, you have a probability of $\frac{l}{n}$ that the $k + 1$-th element is already in $T$ and a probability of $\frac{n - l}{n}$ that it isn't. This means:

$$E[T_{k + 1} \mid T_k = l] = \frac{l}{n} l + \frac{n - l}{n}(l+1) = l\left(1 - \frac{1}{n}\right) + 1$$

Now we can apply the law of total expectation: $$E[T_{k + 1}] = E[E[T_{k + 1} \mid T_k]] = E\left[T_k \left(1 - \frac{1}{n}\right) + 1\right] = E[T_k] \left(1 - \frac{1}{n}\right) + 1$$

This is a geometric progession and we can explicitly calculate: $$E[T_k] = \sum \limits_{i = 0}^{k - 1} \left(1 - \frac{1}{n}\right)^i = n\left(1 - \left(1 - \frac{1}{n}\right)^k\right)$$

Specifically for $k = n$ this yields $$E[T_n] = n\left(1 - \left(1 - \frac{1}{n}\right)^n\right) \approx n(1 - e^{-1}) \approx \frac{2}{3} n.$$

$\endgroup$
2
  • $\begingroup$ $n(1-\frac{1}{e})$ is a very good result that is also shown in my monte-carlo simulation, thank you ! $\endgroup$
    – Uri Goren
    Commented Sep 23, 2015 at 20:05
  • 1
    $\begingroup$ To be precise, $$n\left(1-\left(1-\frac1n\right)^n\right)\sim n(1-e^{-1})$$ where "$\sim$" means "asymptotically equal to". Then $$1-e^{-1}\approx\frac23$$ where "$\approx$" means "approximately equal to". $\endgroup$
    – robjohn
    Commented Sep 24, 2015 at 0:31
3
$\begingroup$

For $x\in S$ define $A_x$ to be the event that $x\in T$. Then $|T|=\sum_{x\in S} 1_{A_x}$ which has expected value $$\mathbb{E}(X)=\sum_{x\in S} \mathbb{P}(A_x)=n\left[1-\left(1-{1\over n}\right)^n\right].$$

$\endgroup$
2
  • 1
    $\begingroup$ (+1) This is certainly concise; however, it might not be quite clear why $\mathbb{P}(A_x)=1-\left(1-\frac1n\right)^n$. It might be useful to mention something like this: Since the probability that $x$ does not appear is $\left(1-\frac1n\right)^n$, its expected value is $1-\left(1-\frac1n\right)^n$. $\endgroup$
    – robjohn
    Commented Sep 24, 2015 at 3:12
  • $\begingroup$ @robjohn Thanks for adding this crucial piece of information. $\endgroup$
    – user940
    Commented Sep 24, 2015 at 3:17
3
$\begingroup$

This question can also be answered by total enumeration using Stirling numbers of the second kind. Observe that the expectation is given by

$$\mathrm{E}[X] = n^{-n} \sum_{k=0}^n k \times {n\choose k} k! {n\brace k}.$$

Here we multiply the value of the random variable by the probability, which is obtained from the fact that a sample with $k$ different entries needs a choice of these entries from the $n$ possibles and a partition of the $n$ slots into $k$ sets whose elements receive the same value. All $k!$ permutations of the $k$ chosen values result in a valid unique assignment.

We have $${n\brace k} = n! [z^n] \frac{(\exp(z)-1)^k}{k!}$$

by the species equation for set partititions $$\mathfrak{P}(\mathcal{U}\mathfrak{P}_{\ge 1}(\mathcal{Z}))$$

which yields for the expectation $$n^{-n} n! [z^n] \sum_{k=0}^n k \times {n\choose k} (\exp(z)-1)^k \\ = n^{-n} n! [z^n] n \sum_{k=1}^n {n-1\choose k-1} (\exp(z)-1)^k \\ = n^{-n} n! [z^n] n (\exp(z)-1) \sum_{k=0}^{n-1} {n-1\choose k} (\exp(z)-1)^k \\ = n^{-n} n! [z^n] n (\exp(z)-1) \exp((n-1)z) = n n^{-n} n! [z^n] (\exp(nz)-\exp((n-1)z)) \\ = n n^{-n} (n^n - (n-1)^n) \\ = n \left(1 - \left(1-\frac{1}{n}\right)^n\right) \sim n \left(1 - \frac{1}{e}\right).$$

Now we can also compute higher moments with this, the variance for example.

We first compute the expectation of $X(X-1)$, getting $$\mathrm{E}[X(X-1)] = n^{-n} \sum_{k=0}^n k (k-1) \times {n\choose k} k! {n\brace k} \\ = n^{-n} n! [z^n] n (n-1) \sum_{k=2}^n {n-2\choose k-2} (\exp(z)-1)^k \\ = n^{-n} n! [z^n] n (n-1) (\exp(z)-1)^2 \sum_{k=0}^{n-2} {n-2\choose k} (\exp(z)-1)^k \\ = n^{-n} n! [z^n] n (n-1) (\exp(z)-1)^2 \exp((n-2)z).$$

Extracting coefficients we obtain $$n (n-1) n^{-n} (n^n - 2(n-1)^n + (n-2)^n) = n(n-1) \left(1-2\left(1-\frac{1}{n}\right)^n +\left(1-\frac{2}{n}\right)^n\right) \\ \sim n(n-1) \left(1-\frac{2}{e}+\frac{1}{e^2}\right).$$

We thus get for the variance $$\mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2 \\ = n(n-1) \left(1-2\left(1-\frac{1}{n}\right)^n +\left(1-\frac{2}{n}\right)^n\right) \\ + n\left(1-\left(1-\frac{1}{n}\right)^n\right) - n^2\left(1-\left(1-\frac{1}{n}\right)^n\right)^2 \\ = n^2 \left(\left(1-\frac{2}{n}\right)^n - \left(1-\frac{1}{n}\right)^{2n}\right) \\ + n \left(\left(1-\frac{1}{n}\right)^n - \left(1-\frac{2}{n}\right)^n \right).$$

The asymptotic expansion then yields $$\mathrm{Var}[X] \sim n \left(\frac{1}{e}-\frac{2}{e^2}\right).$$

Addendum. We need the second term rather than the constant term in the asymptotic expansion of the factor on $n^2.$

We get for the first term in the difference $$\exp\left(n\log\left(1-\frac{2}{n}\right)\right) = \exp\left(-\sum_{q\ge 1} \frac{2^q}{n^{q-1} q}\right) \\ = \sum_{k\ge 0} \frac{(-1)^k}{k!} \left(\sum_{q\ge 1} \frac{2^q}{n^{q-1} q}\right)^k.$$

This yields for the constant term $$\sum_{k\ge 0} \frac{(-1)^k}{k!} 2^k = \frac{1}{e^2}.$$

We get for the next term $$\frac{1}{n} \sum_{k\ge 0} \frac{(-1)^k}{k!} {k\choose 1} 2\times 2^{k-1} = -\frac{2}{n} \sum_{k\ge 1} \frac{(-1)^{k-1}}{(k-1)!} 2^{k-1} = -\frac{2}{n} \frac{1}{e^2}.$$

For the second term in the difference we have $$\exp\left(2n\log\left(1-\frac{1}{n}\right)\right) = \exp\left(-2\sum_{q\ge 1} \frac{1}{n^{q-1} q}\right) \\ = \sum_{k\ge 0} \frac{(-2)^k}{k!} \left(\sum_{q\ge 1} \frac{1}{n^{q-1} q}\right)^k.$$

This time we get for the constant term $$\sum_{k\ge 0} \frac{(-2)^k}{k!} = \frac{1}{e^2}$$

and for the next term $$\frac{1}{n} \sum_{k\ge 0} \frac{(-2)^k}{k!} {k\choose 1} \frac{1}{2} \times 1^{k-1} = -\frac{1}{n} \sum_{k\ge 1} \frac{(-2)^{k-1}}{(k-1)!} = -\frac{1}{n} \frac{1}{e^2}.$$

The difference then yields for the variance $$n^2 \left(-\frac{1}{n} \frac{1}{e^2}\right) + n\left(\frac{1}{e}-\frac{1}{e^2}\right)$$ which is the desired result.

$\endgroup$
2
$\begingroup$

We can count the number of arrangements of $n$ items picked from a pool of $n$ items that have $d$ distinct items.

Let $S(i)$ be the collection of arrangements missing item $i$. Let $N(j)$ be the sum of the sizes of all intersections of $j$ of the $S(i)$.

Since we can choose $\binom{n}{j}$ items to leave out and for each of those choices there are $(n-j)^n$ arrangements of those items, we have $$ N(j)=\binom{n}{j}(n-j)^n\tag{1} $$ Since the number of arrangements with $d$ distinct items is the number of arrangements missing exactly $n-d$ items, according to the Generalized Inclusion-Exclusion Principle, the number of arrangements with exactly $d$ distinct items is $$ \sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}N(j) =\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^n\tag{2} $$ To get the total number of arrangements, sum $(2)$ in $d$: $$ \begin{align} \sum_{d=0}^n\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^n &=\sum_{j=0}^n\sum_{d=n-j}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^n\\ &=\sum_{j=0}^n[j=0]\binom{n}{j}(n-j)^n\\[9pt] &=n^n\tag{3} \end{align} $$ as expected.

To get the total number of arrangements times their size, multiply by $d$ and sum in $d$: $$ \begin{align} \sum_{d=0}^n\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^nd &=\sum_{j=0}^n\sum_{d=n-j}^n(-1)^{j-n+d}\color{#00A000}{\binom{j}{n-d}}\binom{n}{j}(n-j)^n(\color{#C00000}{n}-\color{#00A000}{(n-d)})\\ &=\color{#C00000}{n\,n^n}-\sum_{j=0}^n\sum_{d=n-j}^n(-1)^{j-n+d}\color{#00A000}{\binom{j-1}{n-d-1}}\binom{n}{j}(n-j)^n\color{#00A000}{j}\\ &=n\,n^n-\sum_{j=0}^n[j=1]\binom{n}{j}(n-j)^nj\\[9pt] &=n\,n^n-n(n-1)^n\tag{4} \end{align} $$ Therefore, the expected number of distinct items picked would be $$ n\left[1-\left(1-\frac1n\right)^n\right]\sim n\left(1-\frac1e\right)\tag{5} $$

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .