1
$\begingroup$

I am looking for the solution to the Coupon Collector's Problem with subsets and packages - how many draws are necessary?

Preliminaries

To introduce my question I will give some preliminaries first.

Coupon Collector's Problem No. 1: Classical Variant

The classic Coupon Collector's Problem (CCP) can be described as follows: given an urn containing $n$ different balls, how often do we have to draw a ball with replacement in order to see each ball at least once? Each ball is equally likely. Let $X_i$ be the additional number of draws we have to make in order to get from $i-1$ different balls drawn to $i$ different balls drawn. Obviously, $X_0 = 0$. $X_1 = \frac{n}{n} = 1$ since we need to draw one ball to get a new one. $X_2 = \frac{n}{n-1}$, because the probability is $\frac{n-1}{n}$ to draw a new ball and hence geometric distributed. Therefore the expected value is $\frac{n}{n-1}$. Finally, to get the last ball, we need to draw $X_n = \frac{n}{1}$ balls in average. Let $X(n)$ be the number of draws we have to make to get each of the $n$ balls at least once. Then $X(n) := \sum_{i=1}^n X_i$ and \begin{align} E[X(n)] = n\left(\frac{1}{n}+\frac{1}{n-1}+\ldots+\frac{1}{2}+\frac{1}{1}\right) = nH_n = n \sum_{i=0}^{n-1} \frac{1}{n-i} \end{align} where $H_n$ is the $n$-th harmonic series and $E$ is the expected value. There are plenty of references for this result, for example this book on page 303.

Coupon Collector's Problem No. 2: Partial Collection

The CCP2 can be formulated as follows: given an urn containing $n$ different balls, how often do we have to draw a ball with replacement in order to see $1\leq k\leq n$ distinct balls at least once? Each ball is equally likely. We call this event $X_k(n)$. We can simply cut the summation of CCP1: \begin{align} E[X_k(n)] = n\left(\frac{1}{n}+\frac{1}{n-1}+\ldots+\frac{1}{n-(k-1)}\right) = n(H_n - H_{n-k}) = n \sum_{i=0}^{k-1} \frac{1}{n-i}. \end{align} See for example equation two in this paper for reference.

Coupon Collector's Problem No. 3: Coupon Packets

The CCP3 can be formulated as follows: given an urn containing $n$ different balls, how often do we have to draw subsets of $1 \leq s\leq n$ distinct balls in order to see each ball at least once? Each ball and each subset is equally likely. We denote this event by $X^s(n)$, and the answer is given on page eight of this paper as follows: \begin{align} E[X^s(n)] &= \sum_{i=1}^{n-s} (-1)^{i+1} \binom{n}{i} \frac{1}{1-\frac{\binom{n-i}{s}}{\binom{n}{s}}} + \sum_{i=1}^s (-1)^{n-s+1+i} \binom{n}{n-s+i}\\ &= \binom{n}{s} \sum_{i=1}^{n-s} (-1)^{i+1} \frac{\binom{n}{i}}{\binom{n}{s}-\binom{n-i}{s}} + \sum_{i=1}^s (-1)^{n-s+1+i} \binom{n}{n-s+i} \end{align} or as equation 7 in this paper: \begin{align} E[X^s(n)] = \binom{n}{s} \sum_{i=1}^{n} (-1)^{i+1} \frac{\binom{n}{i}}{\binom{n}{s}-{\binom{n-i}{s}}} . \end{align}

Question

The next problem instance describes my problem and the corresponding question.

Coupon Collector's Problem No. 4: Coupon Packets and Partial Collection

The CCP4 can be formulated as follows: given an urn containing $n$ different balls, how often do we have to draw subsets of $1 \leq s\leq n$ distinct balls in order to see $1\leq k\leq n$ distinct balls at least once? Each ball and each subset is equally likely. We denote this event by $X_k^s(n)$. \begin{align} E[X_k^s(n)] &= \mathrm{?}\\ &\approx \frac{1}{s} \sum_{i=0}^{k-1} \frac{n-(i\bmod s)}{n-i} \end{align} I tried various changes and theories to come up with a solution for the correct value of $E[X_k^s(n)]$, but it seems either the solution needs some tricks or I am just not able to see it right now. Therefore I wanted to know if somebody knows the solution for my problem. Since it seems to be a combination of CCP2 and CCP3, I guess there exists a quite elegant way. I only have an incorrect approximation right now but want to have the correct value.

I am thankful for any hints to references or solutions. Also, if there is no solution known yet, I would also appreciate this information.

$\endgroup$

1 Answer 1

0
$\begingroup$

You may refer to Theorem 1 of Kokza2007 "A Survey of the Coupon Collector’s Problem with Random Sample Sizes" for the solution of CCP4.

Fixed $s$ value is just a special case of the theorem, and the transition probability between states $i$ and $j$, $p_{ij}$ follows the hypergeometric distribution, where the states $i$, $j$ denote the numbers of distinct coupons you observe before and after a draw, respectively. Given $p_{ij}$, you may then construct a transition matrix of a Markov chain, which is absorbing at state $i=n$. The solution to your CCP3 is the expected number of steps that the Markov chain is absorbed, and $E[X_k^s(n)]$ in CCP4 is the expected number of steps to reach state $k$ from state $0$, which, just like $E[X_n^s(n)]$, can be calculated making use of the fundamental matrix of the absorbing Markov chain.

By the way, there is a Matlab function hygepdf() to obtain $p_{ij}$.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .