32
$\begingroup$

I'm trying to solve the well known Coupon Collector's Problem by explicitly finding the probability distribution (so far all the methods I read involve using some sort of trick). However, I'm not having much luck getting anywhere as combinatorics is not something I'm particularly good at.

The Coupon Collector's Problem is stated as:

There are $m$ different kinds of coupons to be collected from boxes. Assuming each type of coupon is equally likely to be found per box, what's the expected amount of boxes one has to buy to collect all types of coupons?

What I'm attempting:

Let $N$ be the random variable associated with the number of boxes one has to buy to find all coupons. Then $P(N=n)=\frac{|A_n|}{|\Omega _n|}$, where $A_n$ is the set of all outcomes such that all types of coupons are observed in $n$ buys, and $\Omega _n$ is the set of all the possible outcomes in $n$ buys. I think $|\Omega _n| = m^n$, but I'm not even sure about that anymore, as all my attempts so far led to garbage probabilities that either diverged or didn't sum up to 1.

$\endgroup$
10
  • 1
    $\begingroup$ This is why the tricks. An expression for the probabilities can be found, but it is unattractive, and very difficult to work with for a calculation of the mean. $\endgroup$ Commented May 2, 2013 at 18:51
  • $\begingroup$ Still, I'm interested in finding it, even if it's a pointless excercise in combinatorics. Does it involve the inclusion-exclusion principle perhaps? $\endgroup$ Commented May 2, 2013 at 19:16
  • $\begingroup$ Yes, Inclusion/Exclusion can be used. $\endgroup$ Commented May 2, 2013 at 19:21
  • $\begingroup$ See also (unless you consider GF a trick) en.wikipedia.org/wiki/… $\endgroup$
    – leonbloy
    Commented May 2, 2013 at 19:25
  • 1
    $\begingroup$ This question has been answered later ("subsequent duplicates"?) for example at math.stackexchange.com/questions/693222/… and math.stackexchange.com/questions/963077/… giving the answer $P(N=n)=\dfrac{m!}{m^n}S_2(n-1,m-1)$ where $S_2$ represents Stirling numbers of the second kind. $\endgroup$
    – Henry
    Commented Oct 8, 2014 at 17:27

1 Answer 1

43
$\begingroup$

As Henry pointed out in a comment, the probability has been determined elsewhere as

$$ \def\stir#1#2{\left\{#1\atop#2\right\}} P(N=n)=\frac{m!}{m^n}\stir{n-1}{m-1}\;, $$

where

$$\stir nk=\frac1{k!}\sum_{j=0}^k(-1)^{k-j}\binom kjj^n$$

is a Stirling number of the second kind and counts the number of partitions of a set of $n$ labeled objects into $k$ non-empty unlabeled subsets.

To get the expected value, it's slightly more convenient to work with the probability

$$ P(N\gt n)=1-\frac{m!}{m^n}\stir nm\;, $$

which can be derived in much the same manner: There are $m^n$ sequences of length $n$; choose one of $\stir nm$ partitions into $m$ non-empty subsets and one of $m!$ assignments of the coupons types to the subsets.

Then

\begin{align} E[N]={}&\sum_{n=0}^\infty P(N\gt n)\\ ={}&\sum_{n=0}^\infty\left(1-\frac{m!}{m^n}\stir nm\right)\\ ={}&\sum_{n=0}^\infty\left(1-\frac{m!}{m^n}\frac1{m!}\sum_{j=0}^m(-1)^{m-j}\binom mjj^n\right)\\ ={}&\sum_{n=0}^\infty\frac1{m^n}\sum_{j=0}^{m-1}(-1)^{m-j+1}\binom mjj^n\\ ={}&\sum_{j=0}^{m-1}(-1)^{m-j+1}\binom mj\sum_{n=0}^\infty\frac{j^n}{m^n}\\ ={}&\sum_{j=1}^m(-1)^{j+1}\binom mj\sum_{n=0}^\infty\frac{(m-j)^n}{m^n}\\ ={}&\sum_{j=1}^m(-1)^{j+1}\binom mj\frac mj\\ ={}&-m\sum_{j=1}^m\int_0^{-1}\mathrm dq'\binom mjq'^{j-1}\\ ={}&-m\int_0^{-1}\mathrm dq'\sum_{j=1}^m\binom mjq'^{j-1}\\ ={}&-m\int_0^{-1}\mathrm dq'\frac{(1+q')^m-1}{q'}\\ ={}&-m\int_0^{-1}\mathrm dq'\frac{(1+q')^m-1}{(1+q')-1}\\ ={}&-m\int_0^{-1}\mathrm dq'\sum_{j=0}^{m-1}(-q')^j\\ ={}&-m\sum_{j=0}^{m-1}\int_0^{-1}\mathrm dq'(-q')^j\\ ={}&m\sum_{j=1}^m\frac1j\;. \end{align}

I'll leave it to you to decide whether this counts as "using some sort of trick". :-)

$\endgroup$
3
  • 20
    $\begingroup$ Thanks, this answer was worth the 2,5 year wait :) $\endgroup$ Commented Sep 28, 2015 at 18:20
  • 2
    $\begingroup$ This is proving useful to me now several years later! I am not familiar with the technique used when introducing the integral here? Especially clueless as to how you removed the summation on line 10, any suggestions on what to Google or look for when trying to understand this step? $\endgroup$ Commented May 16, 2018 at 13:31
  • 2
    $\begingroup$ @ZackAshman: Glad to read it's proving useful :-) Sorry, I haven't been on the site for a very long time and only now saw your question. The step where the integral is introduced uses the elementary definite integral $\int_0^{-1}\mathrm dq'q'^{j-1}=\left[\frac{q'^j}j\right]_0^{-1}=\frac{(-1)^j}j$. The summation is removed in line $10$ using the binomial expansion of $\left(1+q'\right)^m$ after multiplying by $q'$ and adding $1$ for the $j=0$ term. $\endgroup$
    – joriki
    Commented May 30, 2018 at 21:38

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .