5
$\begingroup$

CONTEXT

Hello everyone! I decided to make this post for a friend of mine, who is currently working on energy-related subjects. She had been asked to simulate how solar panels would react when being randomly damaged. She was able to complete this task, but wanted to make sure there was no mistake in her code, so she tried to compute the theorical probabilistic results. So, she shared her work with me and a group of friends by simplifying all the complicated physics that was hidden, leaving us with something that looked like a very classic school probability problem. The thing is, we all kinda suck at math today (we're finishing our engineering studies, so yeah, long time since we last used our brain for math stuff). This is where I need your help; there isn't any knowledge related to solar panels needed to help her, and we were hoping some of you could give an answer that would help computing the theorical probabilities. Here is the problem:

THE PROBLEM

We have a bag containing $r$ red balls, $g$ green balls and $b$ blue balls. We pick a ball at random in the bag, draw a line on it, and put it back in the bag. When a ball has a fixed number of $n$ lines drawned on it, we don't put it back in the bag but place it in a separated box. We're repeating this process until eventually the bag gets emptied, and all the balls are in our box.

Question: On average, how many picks do we have to make to have at least one ball of each color in our box?

DISCUSSION

It is interesting to note that this is actually a finite problem, and that no matter how we pick the balls it will take exactly $n(b+r+g)$ picks to have our box filled with all the balls. The pigeonhole principle also helps to grant a (slightly) better upper boundary for the average we're looking for. Let's suppose without any loss of generality that $g=\min \{r,g,b\}$; then we can be sure we have at least one ball of each color after $n(r+b)+g(n-1)+1$ picks, which represents the worst case scenario: we take out all red and blue balls, then draw $n-1$ lines on every green ball before finally taking out one of the green balls in the next pick. Sadly, no one in our group was able to find anything better than an upper bound. We had much trouble finding out what the right modelisation of random variables was, and could not handle the fact that the number of balls in the bag randomly change overtime.

So, does anyone have an idea how to solve this problem? We're actually trying to solve for precise values in our context, with $b=r=20$ and $g=n=10$ ; but it would be great if we could find a solution for the general case!

$\endgroup$

1 Answer 1

2
$\begingroup$

I suspect this is not easy analytically, though no doubt there are approximations.

Since you have actual values you are interested in, simulation will get you close. For example using R, you could try something like:

numberofpicks <- function(blueballs, redballs, greenballs, maxlines){       
  balls <- sum(blueballs, redballs, greenballs) 
  inbag <- rep(TRUE, balls)
  linesonball <- rep(0, balls)
  pickssofar <- 0
  colourinbox <- rep(FALSE, 3)
  while (! all(colourinbox)){
    pickssofar <- pickssofar + 1 
    pick <- sample((1:balls)[inbag], 1) 
    linesonball[pick] <- linesonball[pick] + 1
    if (linesonball[pick] >= maxlines){
       inbag[pick] <- FALSE
       if (pick <= blueballs){
          colourinbox[1] <- TRUE
          }else if (pick <= blueballs + redballs){
          colourinbox[2] <- TRUE
          }else{
          colourinbox[3] <- TRUE
          }
       }
    }
  return(pickssofar)
  }

and then running this $10000$ times:

set.seed(1) # for replication - try a different seed for different results
sims <- replicate(10^4, numberofpicks(blueballs = 20,
                        redballs = 20, greenballs = 10, maxlines = 10))
mean(sims)
# 308.0339
sd(sims)
# 36.80328
sd(sims) / sqrt(length(sims)) # standard error of mean
# 0.3680328 
summary(sims)
#Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 161     283     307     308     332     447 

suggesting an expectation in this case around $308 \pm 1$, though with a much wider distribution for individual runs. These $10000$ simulations ranged between $161$ and $447$, compared to your upper bound of $491$.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .