3
$\begingroup$

I’m working on a chemistry problem, which essentially translates to finding the answer to a related probability problem. However, my knowledge in probability is very limited and I'd be grateful if someone could help me out with it. The following is the problem:-

Suppose I have a bag containing $70$ red balls and $30$ blue balls. For the purpose of illustration, let’s call them $R$s (red balls) and $B$s (blue balls). Now, I am going to pick one ball at a time from this bag, without replacement. I define a run to be a sequence of consecutive $R$s (or alternately, $B$s) picked, along with the first $B$ (or $R$) that is picked. And I define a red (or blue) run length to be the number of consecutive $R$s (or $B$s) I pick in a run, before I encounter a $B$ (or $R$) or until the number of balls run out.

As examples, $RRRRRRB$ is a run (for simplicity, let me denote it by $R_6$ in shorthand) with red run length $6$, $RB$ is a run (denoted by $R_1$) with red run length $1$, $BBBR$ is a run (denoted by $B_3$) with blue run length $3$.

In each simulation, I keep doing runs until all the $100$ balls are picked out (since the balls are picked without replacement, the number of runs and the red/blue run lengths are both finite).

Let’s look at a typical simulation of ball-picking: $R_{50} R_{10} B_{28} R_9$. In this simulation, there are $4$ runs. The first run consists of $50$ consecutive red balls, until a blue ball is encountered. The second run consists of $10$ consecutive red balls until a ball is encountered. The third run consists of $28$ consecutive blue balls until a red ball is encountered. And the last run consists of $9$ consecutive red balls, and the simulation ends as there are no more balls to be picked.

It is easy to see that the minimum possible number of runs is $2$ (attained by $R_{70}$ followed by $B_{29}$, or $B_{30}$ followed by $R_{69}$) and the maximum possible number of runs is $31$ (attained by $R_1$ $30$ times followed by $R_{40}$, or $B_1$ $30$ times followed by $R_{70}$).

Also, the maximum possible value of red run length is $70$ and that of blue run length is $30$.

Now, I’m interested in knowing the probability distribution of the red and blue run lengths. For this, I believe that I must first find the expected value of the number of runs in a simulation. But I’m not sure how to proceed from here. So to sum up, the following are my questions:-

$1$. How do I find the expected value of the number of runs in a simulation?

$2$. For that expected value, how do I calculate the probability distribution of red and blue run lengths?

Added: Thanks to Henry's great answer below, the above questions have mostly been answered, though I'd appreciate it if anyone has an answer that adds more to it. I'm interested in finding something else now;

Let's say there are $r$ red balls and $b$ blue balls. How do I find the probability that the length of the longest red (or blue) run is $M$, where $M \in \{1,2,...r\}$? Essentially, I'm trying to find the probability distribution of the lengths of the longest run.

I've read Schilling's classic MAA paper on the subject and tried to extend it, but this seems to be a much harder problem, as the probability of a red (or blue) ball keeps changing with the trial. Could someone help me out with this?

$\endgroup$
3
  • $\begingroup$ It might be easier if you slightly redefined your definition of a run so your example would be $R_{50}B_1R_{10}B_{29}R_{10}$ - then the sum of the subscripts would be a constant equal to the number of total balls. $\endgroup$
    – Henry
    Commented Nov 30, 2014 at 17:51
  • $\begingroup$ Your question 2 rather presupposes that the expected number of runs is an integer. $\endgroup$
    – Henry
    Commented Nov 30, 2014 at 17:53
  • $\begingroup$ @Henry: Thanks, your notation simplifies things. I'd use the floor function to obtain an integer from the expected number. $\endgroup$ Commented Nov 30, 2014 at 17:56

1 Answer 1

3
+250
$\begingroup$

All ${100 \choose 30}$ patterns are equally likely. There is probably a better approach to your first question, but this produces a nice answer:

Suppose that $R(r,b)$ is the expected number of runs given that the first ball is red and $B(r,b)$ is the expected number of runs given that the first ball is blue, in each case when there are $r$ red balls and $b$ blue balls. Then $R(r,0)=1$ and $B(0,b)=1$ and by symmetry $R(r,b)=B(b,r)$. More generally we have the recurrence $$R(r,b)=\frac{r-1}{r+b-1} R(r-1,b) + \frac{b}{r+b-1} (1+R(b,r-1))$$ which has the solution $R(r,b)= \dfrac{2rb+r-1}{r+b-1}$ with $R(1,0)=1$. So the expected number of runs overall when there are $r$ red balls and $b$ blue balls is $$\frac{r}{r+b}R(r,b)+\frac{b}{r+b}R(b,r)=1+\frac{2rb}{r+b}.$$

In your question you had $r=70$ and $b=30$, giving an expected number of runs of $43$.

It is slightly unfortunate that $43$ is odd, as it complicates the distribution of runs (e.g. there might be $21$ or $22$ red runs with probabilities $0.3$ and $0.7$).

Added: In general, if there $n$ balls of a particular colour and they are distributed between $k$ positive runs, then a simple stars and bars argument says that the probability that a given run is of length $M$ is $$\Pr(M=m|n,k)= \dfrac{\displaystyle {n-m-1 \choose k-2}}{\displaystyle {n-1 \choose k-1}}$$ the each of the $k$ runs has a identical (but not independent) distribution. So for example if you have $70$ red balls in $22$ runs or $30$ blue balls in $21$ runs, the probability distribution for the length $M$ in a red run or of a blue run would be about

m   red 70,22   blue 30,21
1   0.3043478   0.6896552
2   0.2148338   0.2216749
3   0.1507043   0.0656814
4   0.1050363   0.0176835
5   0.0727174   0.0042440
6   0.0499932   0.0008842
7   0.0341224   0.0001538
8   0.0231152   0.0000210
9   0.0155364   0.0000020
10  0.0103576   0.0000001
11  0.0068466   not possible
12  0.0044857   
13  0.0029118   
14  0.0018718   
15  0.0011912   
16  0.0007500   
17  0.0004670   
18  0.0002874   
19  0.0001747   
20  0.0001048   
21  (smaller)   

But we can go further than that, and see the distribution of runs unconstrained by how many runs there are. As an example consider two red balls and one blue ball: the patterns $R_2B_1$, $R_1B_1R_1$, and $B_1R_2$ are equally likely and so with repeated experiments over time you expect to see as many runs of $R_2$ as $R_1$ in this particular case. More generally, if there are $r$ balls and $b$ blue balls, the expected proportion of red runs which would be of length $m$ in that sense, is

$$ \dfrac{b}{r} \dfrac{ {r \choose m} }{ {r+b-1 \choose m}}$$

where reversing $r$ and $b$ would give the expected proportion of blue runs of length $m$ in that sense. With $r=70$ and $b=30$ this would give about

m   red runs    blue runs
1   0.3030303   0.7070707
2   0.2133581   0.2092352
3   0.1495706   0.0603978
4   0.1043878   0.0169869
5   0.0725221   0.0046490
6   0.0501482   0.0012364
7   0.0345106   0.0003191
8   0.0236323   0.0000798
9   0.0161011   0.0000193
10  0.0109130   0.0000045
11  0.0073571   0.0000010
12  0.0049326   0.0000002
13  0.0032884   (smaller)
14  0.0021795   
15  0.0014359   
16  0.0009402   
17  0.0006117   
18  0.0003954   
19  0.0002538   
20  0.0001618   
21  (smaller)   

which is not that far away from the constrained case.

$\endgroup$
16
  • $\begingroup$ Thank you for this great answer! :) Could you provide a reference for the above recurrence equation? Also, why do you say that all ${100 \choose 30}$ patterns are equally likely? Isn't it less likely to have a pattern like $B_{30}R_{70}$, because the number of blue balls are initially less than the number of red ones, and as the trials go on, only keep getting sparser? $\endgroup$ Commented Dec 2, 2014 at 3:18
  • $\begingroup$ Looking forward to the rest of your answer! :) $\endgroup$ Commented Dec 2, 2014 at 3:24
  • $\begingroup$ If you had two reds and one blue, the probability of BRR would be $\dfrac13 \times 1 \times 1$, the probability of RBR would be $\dfrac23 \times \dfrac12 \times 1$, and the probability of RRB would be $\dfrac23 \times \dfrac12 \times 1$, all the same. The probability of $B_{30}R_{70}$ is $\dfrac{30}{100}\times\dfrac{29}{99}\times\cdots\times\dfrac{1}{71}\times 1\times \cdots \times 1 = \dfrac{1}{100 \choose 30}$ as predicted $\endgroup$
    – Henry
    Commented Dec 2, 2014 at 7:10
  • $\begingroup$ I cannot provide a reference as I worked it out myself. It simply says that given the first ball is red then the second will either also be red (in which case the expected number of runs stays the same) or blue (in which case the expected number of runs goes up by $1$). $\endgroup$
    – Henry
    Commented Dec 2, 2014 at 7:26
  • $\begingroup$ I understand now, thank you. There's another question on my mind: From this question, what I'm primarily interested in knowing is the probability distribution of the runs and from that, know which run (of $R$s or $B$s) has the highest chance of occurring. That's why I thought of finding the expected number of runs first (so as to limit the length of the runs) and then finding the probability distribution of the runs pertaining to that. Is this approach correct? In short, what I'm asking is: Even though the expected number of runs is $43$, could there be longer runs with a higher probability? $\endgroup$ Commented Dec 2, 2014 at 8:21

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .