7
$\begingroup$

Suppose that there is a certain collected works of plays that is N symbols long in the following sense: a "symbol" is one of the 26 letters of the alphabet, a line break, period, space, or a colon; in other words there are 30 possible symbols.

If "a monkey" randomly "types" 1 of these 30 symbols at a rate of one per second, how long will it take M monkeys working at this rate, on average, for one of them to randomly write this specific N symbol long collected works?

For clarity let me state that I am assuming each monkey ceaselessly types random symbols at this rate, and unless a monkey immediately types the right things, the collected works will be preceded by gibberish.

$\endgroup$
1

2 Answers 2

9
$\begingroup$

Unfortunately, an exact answer will depend on the specific sequence of $N$ symbols. To see why, take a simple example in which you only have two possible symbols, A and B, with $N = 2$, and only one monkey. Compare the expected times until the monkey types the sequence AA vs. the sequence BA. Unless the monkey types AA at the very beginning (with probability $\frac{1}{4}$), the first time AA appears the BA sequence must have already occurred. So the latter sequence will have the shorter expected time.

There are lots of these sorts of counterintuitive results about the likelihood of seeing a certain sequence before another certain sequence and about the expected time until a sequence is first seen. For more information, see MathWorld's entry on Coin Tossing, or, for even more information, the article "Penney Ante" that appeared in the UMAP Journal.

Now, if $N$ is large and the symbols are assumed to be evenly distributed in the target sequence, maybe you can avoid the problems with the two-symbol and unevenly distributed example I gave here. Or, if you force each monkey to stop after typing exactly $N$ symbols, check to see if they have typed the target sequence, and then have them start over again from scratch if they have not, then you can definitely avoid these problems. (For more on the latter, see this article "Infinite Monkey Business".)

$\endgroup$
5
$\begingroup$

We will estimate the probability for a "generic" string. The number of occurrences of the string in any given monkey's output is roughly distributed Poisson with $\lambda = 30^{-N}$. The time until the first event happens is thus roughly distributed exponentially with rate $\lambda = 30^{-N}$. The minimum of $M$ such processes is also distributed exponentially with rate $\lambda = M/30^N$. Thus the expected time is roughly $30^N/M$.

The same estimate can be obtained if we calculate the expected number of appearances. The expected number of appearances in any given monkey's stream for the first $t+M-1$ characters is $t/30^N$. For $M$ monkeys, it is $tM/30^N$. This is $1$ for $t = 30^N/M$, and this gives a rough estimate for the actual expectation.

In fact, assuming that the string "doesn't overlap with itself", we can get an exact expression for the expectation (depending only on $N$ and $M$) using Theorem 1.1 in "String overlaps, pattern matching, and nontransitive games" (Guibas and Odlyzko '81), which gives a generating function for the probability that a given monkey is not done after $t$ steps.

The paper also gives an expression for "non-generic" strings and for multiple strings, but the collected works are not going to overlap themselves; even if they do, it will probably have only a slight effect on the probabilities.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .