$\begingroup$

My son is collecting Panini stickers from one of their football albums, where there are 472 stickers in total, and you can buy them in packs of 5 (no duplicates within those 5). You can also buy any 50 as a one-off from Panini, which you obviously want to do right at the end, for the final 50.

I believe this is the coupon collector's problem with multiple coupons being drawn at once. A paper analyses this problem and proves a probability distribution for it. The paper is:

"The collector’s problem with group drawings", Wolfgang Stadje, Advances in Applied Probability, Vol 22, No 4, Dec 1990. (Not open access, unfortunately.)

From the paper, $S$ is the set of all stickers, $A$ is the set of interest (where $A\subset S$), $l = |A|$, $s = |S|$, and we draw, with replacement, subsets $\omega_1, \omega_2, \ldots$ from $S$, each containing $m$ different stickers. Each $\omega \subset S$ has an equal probability of being chosen. Then $X_k(A)$ is the number of distinct elements of $A$ contained in the sets $\omega_1, \ldots, \omega_k$ and we have the following probability distribution:

$$ P(X_k(A) = n) = {l \choose n}\sum_{j=0}^n (-1)^j {n \choose j} \left[{s + n - l - j \choose m} \bigg/ {s \choose m}\right]^k \quad\quad n = 0, 1, \ldots, l $$

In my case, $s=472$, $l=n=422$ and $m=5$. I'm looking at how the distribution changes as more packs of stickers are bought. However, the probability doesn't monotonically increase as $k$ does.

It's clearer to see (and work out) with smaller values, so for $s=l=3$, $n=2$, $m=1$ the probabilities are 0, 2/3, 2/3, 14/27, 10/27 for $k=1,\ldots,5$. Can anyone tell me what I'm doing or interpreting wrongly here, or why the probability decreases with more packs, when intuitively it should tend towards 1?
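For anyone who wants to check these numbers, here is a minimal Python sketch that evaluates the formula above with exact fractions. The function name `prob_exactly` and its argument order are my own choices for illustration, not anything from the paper.

```python
from fractions import Fraction
from math import comb


def prob_exactly(n, k, s, l, m):
    """P(X_k(A) = n) from the formula above, as an exact fraction.

    s = |S|, l = |A|, m = stickers per pack, k = packs drawn.
    """
    total = Fraction(0)
    for j in range(n + 1):
        # Chance that a single pack avoids a fixed set of l - n + j stickers.
        ratio = Fraction(comb(s + n - l - j, m), comb(s, m))
        total += (-1) ** j * comb(n, j) * ratio ** k
    return comb(l, n) * total


# Small example from the question: s = l = 3, n = 2, m = 1, k = 1..5.
small_example = [prob_exactly(n=2, k=k, s=3, l=3, m=1) for k in range(1, 6)]
print(small_example)
```

For the small example this gives 0, 2/3, 2/3, 14/27, 10/27, matching the values quoted above.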

For reference, I found another post that also deals with this equation, but it considers the case $s=l=n$, which does not apply here.

$\endgroup$
  • 1
    $\begingroup$ Could you clarify your intuition? I don't see how having every $P(X_k(A)=n)\to 1$ as $k\to\infty$ could possibly be consistent with the laws of probability, which insist that the sum of all these probabilities (over all $n$) equal $1$. $\endgroup$
    – whuber
    Commented Feb 27, 2016 at 20:56
  • $\begingroup$ Well, it seems to me that the more packs you draw (i.e. increasing $k$) then the greater the likelihood that you will obtain $n$ stickers. I'm not talking about $P(X_k(A) = n)$ for all $n$, just for a single value of $n$. So, if we fix all variables apart from $k$, as $k \to \infty$ then surely the probability $P(X_k(A) = n)$ should also increase? $\endgroup$ Commented Feb 27, 2016 at 21:21
  • 1
    $\begingroup$ Now I'm thinking further and I guess I understand where I've gone wrong. Basically, as $k \to \infty$ then the chance of getting exactly $n$ stickers will go down for everything, apart from when $n = l$, which will increase. What I've done is to misinterpret $P(X_k(A) = n)$ as being at least $n$, instead of exactly $n$. Thanks @whuber for your comment which set me thinking properly along the right lines, if indeed I am now right! $\endgroup$ Commented Feb 27, 2016 at 21:35
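To illustrate the "at least $n$" reading described in the comment above, here is a rough Python sketch that sums the exact-count probabilities from $n$ up to $l$. The helper names `prob_exactly` and `prob_at_least` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from math import comb


def prob_exactly(n, k, s, l, m):
    """P(X_k(A) = n) from the question's formula, as an exact fraction."""
    total = Fraction(0)
    for j in range(n + 1):
        ratio = Fraction(comb(s + n - l - j, m), comb(s, m))
        total += (-1) ** j * comb(n, j) * ratio ** k
    return comb(l, n) * total


def prob_at_least(n, k, s, l, m):
    """P(X_k(A) >= n): sum the exact-count probabilities from n up to l."""
    return sum(prob_exactly(i, k, s, l, m) for i in range(n, l + 1))


# Same small example as in the question: s = l = 3, n = 2, m = 1, k = 1..5.
tail = [prob_at_least(2, k, 3, 3, 1) for k in range(1, 6)]
print(tail)
```

For the small example the tail probabilities are 0, 2/3, 8/9, 26/27, 80/81, which never decrease and approach 1. That is the behaviour the intuition in the question was describing.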

1 Answer

$\begingroup$

Although I'm late to this, I believe I understand the issue you are having. The behaviour is not, as the comments suggest, because the equation gives the probability of exactly $n$ stickers, but rather because in your particular example $n < l$.

If we re-read Wolfgang Stadje's paper, from which this equation is drawn, $X_k(A)$ is defined as "the number of distinct elements of $A$ which are contained in at least one of the (packs drawn)".

So when $n$ (the number of distinct stickers we want) is equal to $l$ (the number of stickers in the set of interest $A$), the equation will behave exactly as you are expecting. As $k$ (the number of packs we buy) increases, the chance of completing the set tends to $1$.

However, in your small example $l=3$ and $n=2$. This means that there are three distinct stickers in the entire set, and you are seeking the probability of holding exactly 2 of them ($P(X_k(A) = 2)$).

As such, of course you would expect your probability to tend to $0$ as $k$ increases. With every pack you buy you are increasing the chance of drawing that third sticker, which would complete your set and invalidate the result you were after.
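As a quick check (my own arithmetic, not taken from the paper), substituting $s=l=3$, $m=1$, $n=2$ into the formula from the question gives a closed form in which the decay is visible:

$$ P(X_k(A) = 2) = {3 \choose 2}\left[\left(\frac{2}{3}\right)^k - 2\left(\frac{1}{3}\right)^k\right] = 3\left(\frac{2}{3}\right)^k - 6\left(\frac{1}{3}\right)^k \to 0 \quad \text{as } k \to \infty $$

whereas for the complete set ($n = l = 3$):

$$ P(X_k(A) = 3) = 1 - 3\left(\frac{2}{3}\right)^k + 3\left(\frac{1}{3}\right)^k \to 1 \quad \text{as } k \to \infty $$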

The results you have provided show that your chance of holding exactly two of the three stickers is highest after buying two or three packs, where the probability is $2/3$ (after a single pack it is $0$, since with $m=1$ one pack contains only one sticker). After that the probability tends to $0$. Hope that helps, even if it's two years late!

$\endgroup$
