Balls are placed into 3 urns. Expected time until some urn has 100 balls.

Question

We have $3$ urns. At each round a ball is placed is placed into one of them, at random, with uniform probability. The game stops when some urn has $100$ balls.

What is expected duration of the game (number of rounds)?

Results from a simulation:

@quester I don´t know what the mean at the graph is. But it does not clearly matches my result where I used your formula. Do you think your formula is wrong? — callculus42, Commented Sep 24, 2019 at 17:32
@Sudix problem with formula is that there should be $\sum p = 1$, but unfortunately it is $\sum p < 1$ it should be summed up and then multiplied by inverse $\sum p = c => p_{new} = \frac{p}{c}$ then $EX = \sum x p_{new}$ — quester, Commented Sep 28, 2019 at 16:21
@sudix Those are the probabilities for the state (100,i,j). However, that does not mean that this 100 was reached in the last placement of the ball. You can reach the state (100,i,j) via (99,i,j) but also via (100,i-1,j) or (100,i,j-1). See in my answer where I compute the case that a an urn has 100 in it in the last ball placement by computing the probability that the urn has 99 in it (and no then you have 1/3 probability to reach 100 in the next step). — Sextus Empiricus, Commented Sep 28, 2019 at 21:16

Community · Accepted Answer · 2020-06-12 10:38:56Z

Computation of the distribution

Let $n$ be the number of balls to draw. Let $m$ be the number of urns. Let $k$ be the target number of balls when the game stops.

You could express the probability to reach a stop in $n$ balls in terms of the probability that the number of balls in each urn is $k-1$ or less (the cumulative distribution).

The number of ways to put $n$ balls into $m$ urns is $m^n$ (with or without reaching the stop condition).
The number of ways to put $n$ balls into $m$ urns but not having reached the stop condition (that is having at most $k-1$ in each of them) can be found by enumerating over the set $S$ of vectors $\vec{k}$ (the numbers $(k_i)$ depicting the the number of balls in each $i$-th urn) that satisfy the condition $$\sum_i k_i = n \quad \text{and} \quad \forall i:0 \leq k_i < k$$ And for each of the vector $\vec{k}$ (a set of numbers $k_1,k_2,k_3$) that satisfies these conditions we compute the the number of ways to distribute the balls into the urns with those numbers which is $$\text{number of ways to put $k_i$ balls in urn $i$} = \frac{n!}{\prod_i{k_i!}}$$ Then we take the sum over all of this $$P(N \leq n) = \frac{1}{m^n}\sum_{\vec{k} \in S} \frac{n!}{\prod_{k_i\in \vec{k}}{k_i!}} $$ where the sum over $\vec{k} \in S$ means the summation over all vectors with numbers $k_i$ that satisfy the conditions and the product over $k_i \in \vec{k}$ means the product with all $k_i$ in the $\vec{k}$.

See below for an implementation in R code:

# computation
n <- 99
sum <- rep(0,3*n+1)
for (k1 in 0:n) {
  for (k2 in 0:n) {
    for (k3 in 0:n) {
      t = (k1+k2+k3)
      sum[t+1] = sum[t+1]+exp(lfactorial(t)-lfactorial(k1)-lfactorial(k2)-lfactorial(k3))
    }
  }
}
x <- c(0:(3*n))
Xcum <- c(sum/3^x,0)

# simulation
set.seed(1)

draw <- function() {
  s <- sample(c(1:3),size = 300, replace=TRUE)
  min(which((cumsum(s==1)==100) | (cumsum(s==2)==100) | (cumsum(s==3)==100)))
}
q <- replicate(10^5,draw())

# computation using beta function

drn <- function(n,k) {
  a <- max(0,n-2*k+1)
  b <- min(k-1,n-k)
  choose(n-1,k-1) * 2^(n-k) / 3^(n-1) *
      ( zipfR::Ibeta(0.5,n-k-b+1,b+1)/beta(n-k-b+1,b+1) - 
        zipfR::Ibeta(0.5,n-(k-1)-(a-1),(a-1)+1)/beta(n-(k-1)-(a-1),(a-1)+1) )
  #choose(n-1,k-1) * 2^(n-k) / 3^(n-1) * (pbinom(b,n-k,0.5)-pbinom(a-1,n-k,0.5))
}
drn <- Vectorize(drn)


#plotting both together

h <- hist(q, breaks=c(0:298)+0.5, xlim=c(200,300),
          xlab = "N", ylab = "probability", freq = FALSE, main="")
lines(1:298,-diff(Xcum),col=2)
lines(c(100:298),drn(c(100:298),100),col=3)

multinomial distribution

You can view this as related to the multinomial distribution which has the pdf

$$\frac {n!}{k_1! k_2! ... k_m!} p_1^{k_1} p_2^{k_2} ... p_m^{k_m} $$

which becomes for equal $p_i = 1/m $ the following

$$\frac {1}{m^n}\frac {n!}{k_1! k_2! ... k_m!} $$

which shows similarity with the expresssion before. Then the probability that for $n$ draws you did not reach 100 yet is equal to the probability that in 100 draws each $k_i<100$. And you can see the computation of you probability density as the computation of the CDF for the multinomial distribution.

Expression in terms of regularized incomplete beta function

For the case of three urns we may write an explicit expression for the probability in terms of the regularized incomplete beta function.

The probability that there are in the $n$-th draw $k$ balls in the first urn and less than $k$ in the others is, is equal to the 1/3 of the probability that there are in the $n-1$ draw $l= k-1$ balls in the first urn and equal or less than $l$ in the others is :

$$\begin{array}{rcrl} P_{k_1=l=k-1,k_2 \leq l,k_3 \leq l \vert n-1} &=& &\sum_{a \leq k_2 \leq b} \frac {1}{3^{n-1}}\frac {(n-1)!}{l! k_2! (n-1-l-k_2)!} \\ & = & \frac{(n-1)!}{l! 3^{n-1}} &\sum_{a \leq k_2 \leq b} \frac {1}{k_2! (n-1-l-k_2)!} \\ & = & {{n-1}\choose{l}} \frac{2^{n-1-l}}{3^{n-1}}& \sum_{a \leq k_2 \leq b} \underbrace{{n-1-l\choose{k_2}} \frac{1}{2^{n-1-l}}}_{\text{this is a binomial distribution}} \\ & = & {{n-1}\choose{k-1}} \frac{2^{n-k}}{3^{n-1}} & \left( I_{1/2}(n-k-b+1,b+1) - I_{1/2}(n-k-a+2,a) \right) \end{array}$$

with $a = max(0,n-2k+1)$ and $b = min(k-1,n-k)$

Computation of the expectation value

In the first part we computed $P(n>k) = 1-P(n\leq k)$. To obtain the mean you can sum over all of these. $\mu = \sum 1-P(n\leq k)$. This will give:

$$\sum_{k_1=0}^{99}\sum_{k_2=0}^{99}\sum_{k_3=0}^{99} \frac{1}{3^{k_1+k_2+k_3}} \frac{(k_1+k_2+k_3)!}{k_1!k_2!k_3!} = 274.9186 $$

nice solution, specially in the first part. But $(k_1,k_2,k_3)$ is not a set, it is a triple (vector) — G Cab, Commented Sep 27, 2019 at 18:07

leonbloy · Accepted Answer · 2019-09-27 22:20:01Z

The expected time can be expressed in terms of the incomplete gamma function as follows (inspired by this paper and comments here):

In general: We want the expected value of the time to wait $T$ until one of the $3$ urns contains $n$ ($=100$) balls. Then

$$E_{n}[T] = \sum_{t=1}^\infty P(T\ge t) = \sum_{t=0}^\infty p_{n}(t) \tag1$$

where $p_{n}(t) $ is the probability that after $t$ rounds ($t$ balls) all $3$ urns have less than $n$ balls. But this is equivalent to

$$ \sum_{x=0}^{n-1}\sum_{y=0}^{n-1}\sum_{z=0}^{n-1} \frac{1}{3^{x+y+z}} \frac{(x+y+z)!}{x! \, y! \, z!} \tag2$$

Further, we use a property of the (upper) incomplete gamma function:

$$\begin{align} \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^3 &= \left( e^{-a} \sum_{r=0}^{n-1}\frac{a^r}{r!} \right)^3 \\&= e^{-3a} \sum_{x=0}^{n-1}\sum_{y=0}^{n-1}\sum_{z=0}^{n-1}\frac{a^{x+y+z}}{x! \, y! \, z!} \tag3 \end{align}$$

Integrating and using $\int_0^\infty \exp(-3a) a^p da = p!/3^{p+1}$ we get

$$ \int_0^\infty \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^3 da= \sum_{x=0}^{n-1}\sum_{y=0}^{n-1}\sum_{z=0}^{n-1} \frac{1}{3^{x+y+z+1}} \frac{(x+y+z)!}{x! \, y! \, z!} \tag4$$

and finally

$$E_{n}[T] = 3 \int_0^\infty \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^3 da \tag5$$

More in general, if we have $d$ urns:

$$E_{n,d}[T] = d \int_0^\infty \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^d da \tag6$$

This can be evaluated numerically, I don't know about asymptotics (asked here).

Empirically , it seems that $E = 3 n - \beta \sqrt{n} +O(1)$ where $\beta \approx 2.5$

And here's a numerical recursive computation (in Java):

public class MSE3368225 {

    static Double[] cache = new Double[(1<<21)];

    static double ex(int x, int y, int z) {
        if (x == 0 || y == 0 || z == 0)
            return 0;
        if (x > 127 || y > 127 || z > 127) 
            throw new RuntimeException("Out of range");
        int k = (x << 14) | (y << 7) | z; // packs three variables in one integer
        Double d = cache[k];
        if (d == null) {
            d = 1 + (ex(x - 1, y, z) + ex(x, y - 1, z) + ex(x, y, z - 1)) / 3.0;
            cache[k] = d;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(ex(100, 100, 100));
    }
}

This solves the recursion

$$g(x,y,z)=\begin{cases} 0 & \text {if $x=0$ or $y=0$ or $z=0$}\\ 1+ \frac13\left(g(x-1,y,z)+g(x,y-1,z)+g(x,y,z-1)\right) & \text{elsewhere} \end{cases} $$

where $g(x,y,z)$ is the expected remaining time, when there remains $(x,y,z)$ balls for each urn.

The result is $E_{100}[T]=274.9186440$

Some values

  n     E
  2  2.888889 
  3  5.049383 
  4  7.348270 
  5  9.734204
 10  22.34468
 20  48.99126
 50  132.3676
100  274.9186

noticed that if it would be optimized like (x,y,z) always would be sorted then number of steps would be greatly reduced — quester, Commented Sep 25, 2019 at 12:15
You mean how I evaluated it or how I got it? If the former: numerically - and I checked that it agreed with the code above. — leonbloy, Commented Sep 27, 2019 at 18:07
I mean how did you figured out this formula with integral and fraction with gamma functions, I wish I worked more with gamma functions and I would like to develop intuition behind gamma functions in probability theory — quester, Commented Sep 28, 2019 at 8:25

Stack Exchange Network

Balls are placed into 3 urns. Expected time until some urn has 100 balls.

2 Answers 2

Computation of the distribution

multinomial distribution

Expression in terms of regularized incomplete beta function

Computation of the expectation value

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
probability
statistics
numerical-methods
.

Linked

Hot Network Questions

Balls are placed into 3 urns. Expected time until some urn has 100 balls.

2 Answers 2

Computation of the distribution

multinomial distribution

Expression in terms of regularized incomplete beta function

Computation of the expectation value

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged probabilitystatisticsnumerical-methods.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
probability
statistics
numerical-methods
.