3
$\begingroup$

Given $n$ balls, which are numbered from $1$ to $n$, and also $n$ boxes, which are also numbered from $1$ to $n$. Initially, $i$-th ball is placed at $i$-th box. Then we are doing the following process $k$ times:

  • take a random ball (independent from the box where it is placed)
  • place that ball into a random box.

We are interested in a number of boxes that will contain at least 1 ball after doing all operations.

How do we calculate all these values for all $n^{2k}$ possible variations (for $k$ steps we have n options to choose a ball and n options to place a ball) and find this sum?

Example: $n = 4$, $k = 2$; answer is: 760

  • 40 options will have 4 non-empty boxes
  • 168 options will have 3 non-empty boxes
  • 48 options will have 2 non-empty boxes

As an answer, I would expect some formula (represented in any way: i.e., recursive, iterative) that works in polynomial time.

$\endgroup$
4
  • $\begingroup$ Note that $28$ of your $40$ "four non-empty boxes" options are the starting position. $\endgroup$
    – Henry
    Commented May 15 at 11:51
  • $\begingroup$ Good luck with your polynomial time ambition: you could see this as a Markov chain with states equivalent to the partitions of $n$, and there are almost $\frac {1} {4n\sqrt3} \exp\left({\pi \sqrt {\frac{2n}{3}}}\right)$ of those. But you could then do a recursion which was linear in $k$ if not $n$. $\endgroup$
    – Henry
    Commented May 15 at 11:59
  • $\begingroup$ For $k\gg n$, I would have thought the sum equivalent to your $760$ would approach $n^{2k}\times n(1-(1-\frac1n)^n)$ asymptotically as $k$ increases. For $n=4, k=2$ this suggests $700$ rather than $760$, so not too far away even for such a low $k$. For $n=4, k=5$ this gives $2867200$ rather than the correct $2895232$, so getting much closer relatively. $\endgroup$
    – Henry
    Commented May 15 at 12:12
  • $\begingroup$ I got interested in the limiting behavior of this as $n$ and $k$ both go to infinity. I asked a new question: math.stackexchange.com/q/4918569/111594 $\endgroup$
    – ploosu2
    Commented May 18 at 10:57

1 Answer 1

3
$\begingroup$

Using linearity of expectation, the expected number of non-empty boxes is $n$ times the probability that a box (any particular box, they're all symmetric) is non-empty after $k$ iterations. Let's denote this expectation by $E_n(k)$. Notice: your number $760$ is $n^{2k}E_n(k)$ for $n=4, k=2$.

Let's fix a box $B$. Then the innards of the other boxes don't matter, just how many balls there are in the fixed box (the rest are in other boxes). Each step, it just matters does the picked ball come from $B$ or not, and do we end up putting it in $B$ or not. The probabilities of these can be calculated just from the amount of balls in $B$.

Therefore we can model this as Markov chain where state space is $\{0,1,\dots, n\}$. State $b$ represents how many balls there are currently in $B$. Each step we go like this:

  • Either we pick a ball from $B$ (probability $\frac{b}{n}$)
  • or from others (probability $\frac{n-b}{n}$)

and then

  • Put it in $B$ (probability $\frac{1}{n}$)
  • or put it in others (probability $\frac{n-1}{n}$)

These four outcomes combine to give the transitions from $b$:

  • to $b-1$ with probability $\frac{b(n-1)}{n^2}$
  • to $b$ with $1-\frac{bn+n-2b}{n^2}$
  • to $b+1$ with $\frac{n-b}{n^2}$

For example when $n=4$ the transition matrix looks like this:

$$ Q_4 =\left(\begin{array}{rrrrr} \frac{3}{4} & \frac{1}{4} & 0 & 0 & 0 \\ \frac{3}{16} & \frac{5}{8} & \frac{3}{16} & 0 & 0 \\ 0 & \frac{3}{8} & \frac{1}{2} & \frac{1}{8} & 0 \\ 0 & 0 & \frac{9}{16} & \frac{3}{8} & \frac{1}{16} \\ 0 & 0 & 0 & \frac{3}{4} & \frac{1}{4} \end{array}\right) $$

We start from $b=1$, so to get the probability that $B$ is non-empty after $k$ iterations, we want $1 - Q_n^k[1,0]$, i.e. one minus the $1,0$ -entry of the $k$th power of $Q_n$ (matrix indices beginning from $0$).

In the example:

$$ Q_4^2 = \left(\begin{array}{rrrrr} \frac{39}{64} & \frac{11}{32} & \frac{3}{64} & 0 & 0 \\ {\color{red} {\frac{33}{128}}} & \frac{65}{128} & \frac{27}{128} & \frac{3}{128} & 0 \\ \frac{9}{128} & \frac{27}{64} & \frac{25}{64} & \frac{7}{64} & \frac{1}{128} \\ 0 & \frac{27}{128} & \frac{63}{128} & \frac{33}{128} & \frac{5}{128} \\ 0 & 0 & \frac{27}{64} & \frac{15}{32} & \frac{7}{64} \end{array}\right) $$

so the answer is

$$ E_4(2) = 4 \cdot \left(1- \frac{33}{128} \right) = \frac{95}{32} = \frac{760}{4^{2\cdot 2}} $$

UPDATE

Since the transition matrix $Q_n$ has particularly nice eigenvalues (they appear to be $1, \frac{n-1}{n}, \frac{n-2}{n}, \dots, 0$ we can diagonalize it and get a closed formula involving powers of the eigenvalues. We have $$ E_4(k) = \frac{175}{64}+\frac{27}{2^{k+5}}+\frac{3}{2^{2k+3}} $$

and

$$ E_5(k) = \frac{5^{5}-2^{10}}{5^{4}}+\frac{128}{5^{3}}\left(\frac{3}{5}\right)^{k}+\frac{64}{5^{3}}\left(\frac{2}{5}\right)^{k}+\frac{12}{5^{3}}\left(\frac{1}{5}\right)^{k} $$ and (for $n=7$ since it's prime so if it's easier to spot a pattern there):

$$ E_7(k) = \frac{7^{7}-6^{7}}{7^{6}}+\frac{3\cdot6^{5}}{7^{5}}\left(\frac{5}{7}\right)^{k}+\frac{10\cdot6^{4}}{7^{5}}\left(\frac{4}{7}\right)^{k}+\frac{15\cdot6^{3}}{7^{5}}\left(\frac{3}{7}\right)^{k}+\frac{2\cdot6^{3}}{7^{5}}\left(\frac{2}{7}\right)^{k}+\frac{5\cdot6}{7^{5}}\left(\frac{1}{7}\right)^{k} $$

There certainly appears to be some pattern but what are those coefficient numbers?


UPDATE 2 Let's actually find the formula from the diagonalization. We have

$$ p_{n,k} := Q_n^k[0,1] = \mathbb{e_1}^TJD^kJ^{-1}\mathbb{e_0} $$

where we have the diagonalization of $Q_n = JDJ^{-1}$. The vector $v := \mathbb{e_1}^TJ$ is the eigenvector corresponding to the eigenvalue $\lambda = \frac{n-1}{n}$. It is $v = (1,0,-1,-2,\dots, -(n-1))$. (Need to check, but seemed obvious from cases.) The vector $w := J^{-1}\mathbb{e_0}$ is the left eigenvector corresponding to $\lambda = 1$ (it's also the steady state vector). The matrix $Q_n$ is tri-diagonal so we can solve $w$ easily iteratively. Set the first component to $\frac{(n-1)^n}{n^{n}}$ so that we're in "correct scale" with $v$ (again, needs checking but seems obvious) and find other components from the equations of $(Q_n^T-\mathbb{I})w = 0$.

Here's a Python program:

def E(n, k):
    w0 = (n-1)**n/n**n
    w1 = n/(n-1)*w0
    p = w0
    for j in range(1, n-1):
        a0  = n-j+1
        a1 = -n*(j+1)+2*j
        a2 = (j+1)*(n-1)
        w0, w1 = w1, -a0/a2*w0 - a1/a2*w1
        p -= j*w1*((n-1-j)/n)**k
    return n*(1-p)

UPDATE 3 My hypothesis is that $w_j = {n\choose j}\frac{(n-1)^{n-j}}{n^n}$, so we get the formula

$$ p_{n,k} = \left( \frac{n-1}{n} \right)^n \sum_{j=0}^{n-1} (1-j){n\choose j} \frac{1}{(n-1)^j}\left( \frac{n-j}{n} \right)^k. $$

$\endgroup$
5
  • $\begingroup$ The eigenvalues of the matrix $Q_n$ seem to be $\frac{b}{n}$ for $b=0,1,\dots, n$. Maybe with diagonalizing we can even get a simple closed formula... $\endgroup$
    – ploosu2
    Commented May 16 at 12:47
  • $\begingroup$ The formula looks like it might also "come from" inclusion exclusion, since the second term is negative. But it's different from the rest since it doesn't have the power $k$. If we somehow move the contributions to the next term and so on... $\endgroup$
    – ploosu2
    Commented May 16 at 13:53
  • $\begingroup$ I guess we don't need to derive such a good-looking formula. The key idea here is that we can fix some box and solve it independently of other boxes so to speak. Following recursive formula works just fine in desired $O(n*k)$ using dynamic programming: $ f(b, k) = (b + (n-b) * (n-1)) * f(b, k-1) + b * (n-1) * f(b-1, k-1) + (n-b) * f(b+1, k-1)$ $\endgroup$
    – LaVuna47
    Commented May 16 at 18:49
  • $\begingroup$ The answer is $ n * f(1, k) $ $\endgroup$
    – LaVuna47
    Commented May 16 at 18:54
  • 1
    $\begingroup$ @LaVuna47 Yes and if $k$ is very large and $n$ small, we can use fast matrix exponentiation (although if the exact answer is needed, the rational numbers will get very large denominatory, and if an approx suffices, then we're already very close to the steady state so just use that). But out of curiosity, I went and found the "closed formula". Well, the coefficients are still solved recursively in $n$ steps. $\endgroup$
    – ploosu2
    Commented May 17 at 8:56

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .