3
$\begingroup$

I'm looking into a particular variation of the coupon collector's problem where:

  • There are $n$ types of coupons
  • Each draw is a pack consisting of $k \leq n$ distinct coupons
    • Each coupon has an equal probability of appearing in a pack
    • Different packs may contain duplicate coupons
  • We are interested in $m \leq n$ specific coupons

The question is: what's the expected number of packs until we've obtained all $m$ specific coupons?


I'm not at all confident in my formulation of the problem above, but here's a concrete example that I'm thinking of:

Suppose there's a trading card game with $n=28$ different cards sold in packs of $k=5$ distinct cards, and we are only interested in collecting $m=10$ specific cards out of the 28.

  • This should differ from the more common variation I've found where we're interested in collecting any $m \leq n$ unique cards. In particular, we may have already obtained $n-1 > m$ unique cards while still missing the last specific card we want.
  • I've been trying to formulate this as obtaining the first $m$ cards (WLOG by re-labelling?), but got stuck with this approach. E.g. the original coupon collector has this trick of defining $t_i$ to be the number of draws to collect the $i^{th}$ coupon after having $i-1$ coupons and summing such $t_i$'s. So I've been trying to do something similar and defining $t_{i,j}$ to be the number of draws to collect $j<\min(k,m-i)$ of the remaining desired coupons after having $i<m$ desired coupons, but then they form a probability tree and things get messy and I needed more knowledge and technique.
$\endgroup$

1 Answer 1

1
$\begingroup$

By inclusion–exclusion, the probability that you’ve collected all $m$ desired cards after $t$ draws of $k$ distinct coupons is

$$ \mathsf P(T\le t)=\sum_{i=0}^m(-1)^i\binom mi\left(\frac{\binom{n-i}k}{\binom nk}\right)^t\;, $$

since there are $\binom mi$ ways to select $i$ missing cards and $\binom{n-i}k$ out of the possible $\binom nk$ packs miss all of them.

Thus the expected value of the number $T$ of draws required is

$$ \begin{eqnarray*} \mathsf E[T] &=& \sum_{t=0}^\infty\mathsf P(T\gt t) \\ &=& \sum_{t=0}^\infty\left(1-\mathsf P(T\le t)\right) \\ &=& \sum_{t=0}^\infty\sum_{i=1}^m(-1)^{i+1}\binom mi\left(\frac{\binom{n-i}k}{\binom nk}\right)^t \\ &=& \sum_{i=1}^m(-1)^{i+1}\binom mi\sum_{t=0}^\infty\left(\frac{\binom{n-i}k}{\binom nk}\right)^t \\ &=& \sum_{i=1}^m\frac{(-1)^{i+1}\binom mi}{1-\frac{\binom{n-i}k}{\binom nk}}\;. \end{eqnarray*} $$

For your example with $n=28$, $k=5$ and $m=10$ this comes out as

$$ \frac{11532680255947485468964}{739977234759737482325}\approx15.6 $$

(Wolfram|Alpha computation).

$\endgroup$
1
  • 1
    $\begingroup$ Oh wow, thank you very much for your answer! I ended up coding a simulation but having the analytical solution is really nice and it did line up giving me ~15.58 average over 1 million runs. I will have to carefully go through and try to understand your answer. Thank you so much :) $\endgroup$ Commented Dec 29, 2023 at 14:01

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .