6
$\begingroup$

I'm struggling to find where I made a mistake on the way to solving the following problem.

Problem description: A grocery sells chocolates bars. There are $5$ kinds of stickers. Each chocolate is sold with $1$ sticker of one of these $5$ kinds. The probability to find any kind of stickers in any chocolate bar is the same.

What's the probability of collecting all $5$ types of stickers if you buy $7$ chocolates at once?

My solution: since the order of chocolate bars that I've bought is irrelevant, the total number of 7-chocolate bar sets that I can buy equals

$$\left(\binom{5}{7}\right)=\binom{7+5-1}{7}=\binom{11}{7}=330$$

Now I'm solving the reverse problem: let's calculate the probability of failing to collect all $5$ kinds of stickers. To do that I need to calculate the number of 7-chocolate bar sets that have less than $5$ kinds of stickers, which means that I need sets with only $1$ kind of stickers, $2$, $3$ and $4$. Since, again, the order of chocolate bars doesn't matter, the number of such sets is

$$\left(\binom{4}{7}\right)=\binom{7+4-1}{7}=\binom{10}{7}=120$$

Finally, my answer to the initial problem should be

$P = 1 - \frac{120}{330}= 0.6363636364$

The answer key says it's $\approx 0.215$

Where's the flaw in my solution?

I appreciate any help.

$\endgroup$

5 Answers 5

13
$\begingroup$

The flaw in your solution is that the $330$ cases you enumerated are not equally likely to appear. For instance, there is only one way for all seven chocolate bars to show the first label. However, if we list the stickers in the order in which we look at the chocolate bars we collect, there are $\binom{7}{2}\binom{5}{2}3!$ ways to obtain a collection with two stickers of the first type, two stickers of the second type, and one of each of the other types.

There are five possible stickers for each of the seven chocolate bars, so there are $5^7$ ways to distribute stickers to chocolate bars.

For the favorable cases, we must subtract those distributions in which not all five kinds of stickers appear in the collection of purchased chocolate bars.

There are $\binom{5}{k}$ ways to exclude $k$ of the $5$ stickers and $(5 - k)^7$ ways to distribute stickers of the remaining $5 - k$ kinds to the chocolate bars. Hence, by the Inclusion-Exclusion Principle, the number of ways the stickers may be distributed to the seven chocolate bars so that all five kinds of stickers appear is $$\sum_{k = 0}^{5} (-1)^k\binom{5}{k}(5 - k)^7 = 5^7 - \binom{5}{1}4^7 + \binom{5}{2}3^7 - \binom{5}{3}2^7 + \binom{5}{4}1^7 - \binom{5}{5}0^7$$ Thus, the probability that all five kinds of stickers appear on the collection of seven chocolate bars is $$\frac{1}{5^7}\left[5^7 - \binom{5}{1}4^7 + \binom{5}{2}3^7 - \binom{5}{3}2^7 + \binom{5}{4}1^7 - \binom{5}{5}0^7\right]$$

$\endgroup$
4
  • $\begingroup$ Thank you very much, I will accept the answer, but could you please explain why the 330 cases are not equally likely to appear? Suppose that $P_k = 0.2$, where $k$ is the number of kind of sticker I buy. Since appearance of any kind of the next chocolate is independent from the previous purchases, isn't the probability of buying any case $0.2^7$? $\endgroup$ Commented Feb 13, 2020 at 11:04
  • 2
    $\begingroup$ What we are interested is how many stickers of each type appear. Therefore, the probability of obtaining seven stickers of type 1 is $0.2^7$, but the probability of obtaining two stickers of type 1, two stickers of type 2, and one of each of the other stickers is $\binom{7}{2, 2, 1, 1, 1}0.2^7 = \frac{7!}{2!2!1!1!1!}0.2^7$. For the latter case, think about how many sequences there are of two stickers of type 1, two stickers of type 2, and one sticker of each of the other types. $\endgroup$ Commented Feb 13, 2020 at 11:08
  • 4
    $\begingroup$ @alekscooper: The odds of rolling two dice and getting two sixes is 1/36 as there is one valid outcome (6+6) in 36 total outcomes, but the odds of rolling a 1 and a 6 (if the order doesn't matter) is 2/36 (= 1/18) because there are two valid outcomes (1+6, 6+1) in 36 total outcomes. Similarly, in your case, supposing we get exactly two red stickers, they could've come from bars 1+2, 1+3, 2+3, 1+4, ... whereas if we get exactly seven red stickers, there is only one way in which that could've happened (1+2+3+4+5+6+7) and therefore it is less likely to happen compared to getting exactly two. $\endgroup$
    – Flater
    Commented Feb 13, 2020 at 23:23
  • 2
    $\begingroup$ @alekscooper: If order does matter, then (5 red + 2 blue) is distinct from (2 blue + 5 red), and both are equally likely to occur than (7 red) or (7 blue) or any other outcome. But if order does not matter, then (5 red + 2 blue) and (2 blue + 5 red) are different possibilties which count as the same outcome, and therefore that outcome is represented in more ways than (7 red) or (7 blue), both of which only have one way to be represented. More representation = bigger odds. $\endgroup$
    – Flater
    Commented Feb 13, 2020 at 23:27
10
$\begingroup$

This is rather long, but correct solution. Your sample space are 7 chocolate bars with stickers attached to them. Probability of an elementary sequence is $(1/5)^7$. Let's count favorable sequences. If you have 5 different bars, you may have the following possibilities: 3 kinds of a specific sticker and 4 remaining unique stickers, or 2 kinds of 2 stickers plus 3 remaining unique stickers. Let's now count! There are 5 different stickers, ${7 \choose 3}$ ways to select positions for a sticker of the same kind, and 4! ways to arrange remaining unique stickers in alternative orders (remember that we are working with elementary sequences), giving the first summand: $$5{7 \choose 3}4! = 4200$$ There are also ${7 \choose 2}$ ways to select positions for a first 2-repeats sticker and ${5 \choose 2}$ ways to select positions for a second 2-repeats sticker. Now, once positions are selected, we have ${5 \choose 2}$ ways to select two stickers to fill these 2-repeat positions, and 3! ways to arrange remaining unique stickers. The second summand, therefore, is: $${7 \choose 2}{5 \choose 2}{5 \choose 2}3! = 12600$$ Finally, together we have: $$(1/5)^7(12600 + 4200) = 0.21504$$

$\endgroup$
4
$\begingroup$

I could not remember any of my combinatorial mathematics, so I did a brute force method. My apologies to the real mathematicians out there.

Let's name the stickers with lower case letters: a, b, c, d, e.

Each element in our sample set of equally likely outcomes is a 7-tuple of these letters.

  • Example: [abcdeab]

Order matters! [aaaaaab] is different than, but equally likely as, [baaaaaa] even though they contain the same number of each type of sticker. So there are five options for the first chocolate bar, five options for the second chocolate bar, etc.

That means there are 5*5*5*5*5*5*5 = 78125 equally likely outcomes.

How many of these outcomes contain all five stickers? Let's take one of each sticker and find out how many ways we can place them on five different chocolate bars.

  • I have seven options of where to place A.
  • I have six remaining options of where to place B.
  • I have five remaining options of where to place C.
  • I have four remaining options of where to place D.
  • I have three remaining options of where to place E.

That gives me 7*6*5*4*3 = 2520 combinations for placing 5 stickers. Let's call each option a "template".

  • Template Example: [dXcbaYe] where X and Y have not yet been given stickers.

I can put any sticker I want on X or Y, and as there are 5 sticker choices, I have 5*5 or 25 variants for each template.

At this point, I might be tempted to say that we have 2520 possible templates each with 25 variants for a total of 2520*25 = 63,000 favorable outcomes. But that would be wrong, because I am counting outcomes multiple times.

  • Example: These two templates, [abcdeXY] and [XXcdeab], can both produce the same variant [abcdeab].

So let's calculate duplication and eliminate it.

If X and Y get assigned the same sticker, then the outcome will appear as a variant of three separate templates. For example, the outcome [abcdeaa] is a variant of the following three templates:

  • [abcdeXX]
  • [XbcdeaX]
  • [XbcdeXa]

If X and Y get assigned different stickers, then the outcome will appear as a variant of four separate templates. For example, the outcome [abcdeab] is a variant of the following four templates:

  • [abcdeXY]
  • [XbcdeaY]
  • [aXcdeYb]
  • [XYcdeab]

So, I have 2520 possible templates. For each template, I have 25 variants, of which 5 (the XX variants - aa, bb, cc, dd, ee) will be counted three times each and the other 20 (the XY variants) will be counted four times each. So the total number of options is:

  • XX variant count - (2520 * 5 / 3) = 4200
  • XY variant count - (2520 * 20 / 4) = 12600
  • All variants = 4200 + 12600 = 16800

So the probability is 16800 / 78125 = 21.504%

$\endgroup$
1
  • $\begingroup$ Thank you, Phoeniceus Agelaius. I also did a brute force simulation, I used Python to verify the answer key and find my mistake. It turned out that I was wrong about my sample space and I should never (probably) use combinations with replacement as my sample space source. $\endgroup$ Commented Feb 21, 2020 at 5:24
1
$\begingroup$

In problems like this, if one wants to avoid the use of combinatorics (which often doesn't allow for checking whether the solution is right), Mathematica provides the possibility of using direct enumeration. In the present case, if one defines "stickers" as {"Q", "R", "S", "T", "U"}, then the total number of possible results is given by tot=Tuples[stickers, 7], while the favorable sequences are those which satisfy the condition that the length of their union is equal to 5 (so that they contain at least one of each sticker) fav=Select[Union /@ tot, Length[#] == 5 &]. Then Length[fav]/Length[tot]=0.21504.

Tomas Garza

$\endgroup$
1
  • $\begingroup$ I used Python to find the sample space and the favorable cases. $\endgroup$ Commented Feb 21, 2020 at 5:25
1
$\begingroup$

The real math problem here is: what is the chance of having a complete set of n items after buying m chances of 1/n chance of having any one of them, where n and m are positive integers, and m > n?

Because you buy a chance of chance, it seems like the formula has an exponential or factorial complexity.

What if there were only 2 tickets and 3 chocolate bars? Obviously, the first bar would give you a ticket you don't already have. The second bar may give the same one (50% chance) or a different one (50% chance) Case 1: probabilities for the third bar are the same as for the second bar Case 2: nothing happens since you already have the 2 different types of tickets To sum up: 50% * 50% = 25% chance to have only one ticket at the end So the answer is 1 - 25% = 75%, or 50% + 50% * 50% = 75%

This easier problem gives a little bit of insight of what's going on. It would be easier to draw every possible case on a sheet of paper and start guetting a grasp of what the formula is.

$\endgroup$
1
  • 1
    $\begingroup$ >>It would be easier to draw every possible case on a sheet of paper and start guetting a grasp of what the formula is. I wrote a Python simulation for that :) $\endgroup$ Commented Feb 23, 2020 at 6:52

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .