5
$\begingroup$

I am trying to solve a variation on the coupon collector's problem.

In this scenario, someone is selecting coupons at random with replacement from n different possible coupons. However, the person is not selecting coupons one at a time, but instead, in batches.

Here's an example problem formulation: There are 100 distinct coupons. A person makes selections in 10-coupon batches at random (each coupon with replacement). What is the expected number of batches necessary to have selected 80 unique coupons?

I have been able to determine the expected number of selections necessary to have selected k unique coupons when selecting one at a time (much like Henry's answer to a similar question), but I'm a bit stumped as to how to go about solving it with this particular wrinkle.

Any tips/guidance would be greatly appreciated.

$\endgroup$

2 Answers 2

5
$\begingroup$

The probability not to have all $n$ coupons after drawing $k$ coupons is

$$ \def\stir#1#2{\left\{#1\atop#2\right\}} P(K\gt k)=1-\frac{n!}{n^k}\stir kn $$

(see e.g. Probability distribution in the coupon collector's problem). Thus, if we draw in batches of $b$ coupons, the expected number of batches is

\begin{align} \sum_{j=0}^\infty P(K\gt jb) &= \sum_{j=0}^\infty\left(1-\frac{n!}{n^{jb}}\stir{jb}n\right) \\ &= \sum_{j=0}^\infty\left(1-\frac1{n^{jb}}\sum_{l=0}^n(-1)^{n-l}\binom nll^{jb}\right) \\ &= \sum_{j=0}^\infty\sum_{l=0}^{n-1}(-1)^{n-l+1}\binom nl\left(\frac ln\right)^{jb} \\ &= \sum_{l=0}^{n-1}(-1)^{n-l+1}\binom nl\frac1{1-\left(\frac ln\right)^b}\;. \end{align}

For instance, for $n=2$, this is

$$ \frac2{1-2^{-b}}-1=1+\frac2{2^b-1}\;. $$

$\endgroup$
3
  • 1
    $\begingroup$ P.S.: This duplicate question links to a table with values for $1\le k\le n\le5$ from simulations, which coincide with the values given by the formula derived here. $\endgroup$
    – joriki
    Commented Jun 18, 2016 at 0:13
  • $\begingroup$ Hi @joriki, I'm not so sure about this solution, it seems to clash with the answer you provided to here. Both end up giving very different answers. Can you clarify this? Is one of them incorrect? $\endgroup$ Commented May 16, 2018 at 14:43
  • $\begingroup$ @ZackAshman: Those are quite different problems. In that other problem, the number of coupons per batch is variable; what's given instead is that each coupon has an (independent) probability $p$ to be in any given batch. I don't see why there should be any correspondence between the answers to the two questions. $\endgroup$
    – joriki
    Commented May 30, 2018 at 22:47
3
$\begingroup$

Let $N_t$ denote the number of different coupons after the selection of $t$ batches of size $b$ drawn from $n$ different coupons and $T_k=\inf\{t\geqslant0\mid N_t\geqslant k\}$ (here, $b=10$, $n=100$ and $k=80$). One is interested in $$ \mathrm E_{b,n}(T_k)=\sum\limits_{t\geqslant0}\mathrm P_{b,n}(T_k\gt t)=\sum\limits_{t\geqslant0}\mathrm P_{b,n}(N_t\lt k). $$ The way to compute the value of $\mathrm E_{b,n}(T_k)$ is unclear, although the dynamics of $(N_t)_t$ is easy to describe.

To wit, $N_0=0$, $N_1=b$, and if $N_t=i$, $N_{t+1}=i+M_t$ where $0\leqslant M_t\leqslant b$ almost surely and the distribution of $M_t$ depends on $i$ and may be written down. For example, if $b=2$, $M_t=0$ with probability $\frac{i(i-1)}{n(n-1)}$, $M_t=1$ with probability $\frac{2i(n-i)}{n(n-1)}$ and, finally, $M_t=2$ with probability $\frac{(n-i)(n-i-1)}{n(n-1)}$. Thus, $\mathrm E_{2,n}(M_t\mid N_t)=2(1-N_t/n)$ and $\mathrm E_{2,n}(N_t)=n(1-(1-2/n)^t)$.

More generally, $\mathrm E_{b,n}(M_t\mid N_t)=b(1-N_t/n)$ and $\mathrm E_{b,n}(N_t)=n(1-(1-b/n)^t)$.

One might want to estimate $\mathrm E_{b,n}(T_k)$ by the root $t_{b,n}(k)$ of the equation $k=\mathrm E_{b,n}(N_t)$, that is, $$ t_{b,n}(k)=\frac{\log(1-k/n)}{\log(1-b/n)}, $$ for example, $t_{10,100}(80)\approx15.28$. However, the quality of this approximation (or rather, the range of parameters $(n,b,k)$ where it is passable) should be checked, numerically or otherwise.

$\endgroup$
2
  • $\begingroup$ Thank you for the thorough answer. Let me clarify one specification of the problem: There could be duplicates in each batch. That is, $N_1\leqslant b$ as collisions are possible after the first selection. The batch here acts only as a limit on how often one can check to see if the number of unique coupons selected has met or exceeded $k$. Any thoughts on that? $\endgroup$
    – rukh
    Commented Apr 14, 2012 at 20:42
  • $\begingroup$ After thinking about this with my comment above in mind, it seems like a good approximation would be taking the ceiling of the expected number of trials divided by batch size: $$ n \log_e \left(\frac{n}{n-p}\right) $$ divided by $b$. Thoughts? $\endgroup$
    – rukh
    Commented Apr 14, 2012 at 21:08

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .