1
$\begingroup$

I've encountered an intriguing probability problem. I just registered to ask this, so this will be my first post.

Disclaimer: I met this problem in a real setting that's it too convoluted to explain (but I could if you want). I work in the humanities, so I wasn't sure if this is too trivial and should go into other kind of forums.

1. Short TL;DR version:

Let's imagine we have boxes containing unknown toys. There are 5 types of toy (A, B, C, D, E) and each box contains one at random.

There are 3 kids. Two of them always try to copy each other, so they want to have the same toy. The third kid wants to be different and have another toy. Additionally none of these 3 kids wants toy C.

I want a general formula to determine the probability of leaving the 3 kids with what they want after buying x boxes.

2. I already know the probabilities, but not the formula.

My first approach was to write this R Script.


    check_conditions <- function(row) {
  has_repeated_not_C <- any(duplicated(row) & row != "C")
  non_c_elements <- row[row != "C"]
  has_two_different_not_C <- length(unique(non_c_elements)) >= 2
  return(has_repeated_not_C & has_two_different_not_C)
}

boxes <- name <- function(x) {
  result <- expand.grid(replicate(x, LETTERS[1:5], simplify = FALSE))
  result <- result %>%
    mutate(Conditions = apply(result, 1, check_conditions))
  print(((sum(result$Conditions, na.rm = TRUE)) / nrow(result))*100)
}

The script lists all the permutations $5^x$, and then assess the contents to see if two conditions are met: if there is at least one duplicated toy that is not C, and, if there is at least one non-duplicated toy that is not C. Then it counts the number of permutations that meet the conditions and divide it over the total number of permutations $5^x$.

I still don't have a proper mathematical formula for any value of $x$, but I could use this script to compute some values for checking further attempts to find a formula:

  • $x=3$, 0.288
  • $x=4$, 0.5952
  • $x=5$, 0.8064
  • $x=6$, 0.918528
  • $x=7$, 0.9687552
  • $x=8$, 0.9887846
  • $x=9$, 0.9961513
  • $x=10$, 0.9987146

This takes around 6 minutes to run on my computer.

3. The "routes" approach.

I decided to explore an approach that involves representing the various possibilities as "routes" branching out as we select different items. Let's take $x=3$ as an example:

Routes:

A. Select a good item (4/5).

  • A.a. Select another different good item (3/5).
  • A.b. Repeat the first good item selected or repeat the second good item selected (2/5).

B. Select a good item (4/5).

  • B.a. Repeat the same good item (1/5).
  • B.b. Select a different good item (3/5).

Each "route" represents a unique combination of items that satisfies our conditions (two matching good toys and a different third good toy). Since the probabilities in each step of the routes are independent, we can multiply them to obtain the total probability for each route, and since we cannot change route once we followed one, those probabilities are not independent (and must be added as far as I remember from high school).

Thus, the total probability of selecting two matching good items and a different third good item when choosing $x=3$ items is the sum of the probabilities of both routes:

$P(A+B) = \left(\frac{4}{5} \cdot \frac{3}{5} \cdot \frac{2}{5}\right) + \left(\frac{4}{5} \cdot \frac{1}{5} \cdot \frac{3}{5}\right) = \frac{36}{125} \approx 0.288 $

Though this is clearly how it works, $x = 4$ would be way too long to do it.

4. I can't go any further.

I really think there must be a general formula for any value of $x$ for any arbitrary number of boxes. I can see that this last approach suggest a summation over a range that is a function of $x$, and that each term of the sum involves various fractions of 5 to the power of each value of the range of the summation. At some point I thought it was this:

$P = 1 - \sum_{i=0}^{\lfloor x/2 \rfloor} \left( \left(\frac{4}{5}\right)^i \cdot \left(\frac{x-i}{5}\right)^i \cdot \left(\frac{1}{5}\right)^{x-2i} \right)$

But after evaluation, it doesn't fit at all the results given by the script.

Please let me know your thoughts, ideas, or even alternative approaches. If you'd like more details on how I extended this approach to other values of $x$, I'll be more than happy to provide additional information, but this post is already too long.

$\endgroup$
2
  • $\begingroup$ Thank you for taking the time @trueblueanil. The conditions are the same regardless of the number of boxes: We need (i) two toys that are equal and are not C, and (ii) another toy that is neither C nor the previously duplicated toy. Notice that it won't matter if the third toy is repeated or not, as long as we can distribute them in a way that two kids get the same type of non-C toy and the other kid get a different non-C toy. The more boxes we buy, the more toys we will have, that's why the probability increases as $x$ gets larger. $\endgroup$
    – Nicolas
    Commented Jul 25, 2023 at 18:17
  • $\begingroup$ Thanks for clarifying. I had left as it was very late here, and I see that you have a satisfactory answer. Cheers ! $\endgroup$ Commented Jul 26, 2023 at 2:38

1 Answer 1

1
$\begingroup$

I will giva a solution for $x \geq 5$. For $x=3$ and $x=4$ you can do the computation by hand.

Lets compute the number of scenarios when kids end up being unhappy:

  1. They get only toys C and one of A, B, C or D (for example they get toy C four times and toy A two times). The number of possible ways to get only toys C and A is $2^n$, because the first toy is either A or C, the second also A or C... So you have 2 options for every box they open. In total that means $2^n$ options. Because you have combinations of C-A, C-B, C-D and C-E you multiply by 4, and subtract 3, because you can get all C-s 4 times. In total that amounts to $4\cdot 2^n-3=2^{n+2}-3$ scenarios.

  2. You get only toys C and 2 distinct toys. For picking two toys from the set$\{A,B,C,D\}$ you have $\binom{4}{2}$ options and then you have $n$ spots for the first toy and $n-1$ spots for the second toy. So all in all you have $\binom{4}{2}n(n-1)=6n(n-1)$ options of this kind

  3. You get only toys C and 3 distinct toys. For picking three toys from the set$\{A,B,C,D\}$ you have $\binom{4}{3} = 4$ options and then you have $n$ spots for the first toy, $n-1$ spots for the second toy and $n-2$ for the third. So all in all you have $\binom{4}{3}n(n-1)(n-2)= 4n(n-1)(n-2)$ options of this kind

  4. You get only toys C and 4 distinct toys. For picking four toys from the set$\{A,B,C,D\}$ you have $\binom{4}{4} = 1$ option and then you have $n$ spots for the first, $n-1$ spots for the second, $(n-2)$ for the third and $(n-3)$ spots for the last toy. So all in all you have $\binom{4}{4}n(n-1)(n-2)(n-3)=n(n-1)(n-2)(n-3)$ options of this kind

You have in total $2^{n+2}-3+6n(n-1)+4n(n-1)(n-2)+n(n-1)(n-2)(n-3)=2^{n+2}-3+n(n-1)(n^2-n+4)$ options.

Probability follows: $P =1-\frac{2^{n+2}-3+n(n-1)(n^2-n+4)}{5^n}$

$\endgroup$
1
  • $\begingroup$ Impressive! I've been juggling with this for days and you solved it in an instant. The explanation is very clear too. Thank you very much. Now I am wondering how is this different to the other answer, but this is exactly what I was trying to figure out. $\endgroup$
    – Nicolas
    Commented Jul 25, 2023 at 21:52

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .