1
$\begingroup$

This question comes from Stack Overflow. I feel that we need more of a mathematical breakthrough, so I forward the question here.

I also found a similar problem that seems to be a special case of this problem.

In short, I want to shuffle a known string. This string will have any number of repeated letters. However, I must not allow the same character to appear n consecutive times in the shuffled string. For example, for the string "aaaabbbcc" and n=2, "aabaabbcc" is a correct permutation, but "aaabbbcca" is not because a letter appears more than twice in a row.

The question is: How to design this shuffling algorithm to ensure that each valid string has the same probability of occurrence?

A naive idea is to review the shuffled results and reshuffle them if they don't meet the requirements. However, for some extreme cases, performance may be extremely low. For example, "aaaabbbbbbbbbb" and n=2. In this case, only "bbabbabbabbabbabb" is the correct output, and all other randomly generated sequences are discarded. In this way, the function will continue to loop until the unique correct string is generated with a very small probability.

Another idea is to generate all possible permutations, then eliminate those that are wrong and draw randomly from the remaining correct permutations. But the complexity of this algorithm will be factorial, and the number of possible permutations will soon exceed the upper limit of 64-bit unsigned integer.

The final idea is to make a mathematical modification of Fisher-Yates: start with a random character, and then multiply the probability of the previous character being drawn by an attenuation coefficient each time the next character is drawn. For example, if the previous character has been consecutively drawn for n times, the attenuation coefficient should become 0 to ensure that the next character cannot be the same as the previous one. However, I do not know how to calculate this attenuation coefficient exactly to ensure that all the correct permutations are generated with equal probability.

$\endgroup$
9
  • $\begingroup$ This seems hard. Already it's not so easy to count the number of valid shuffles (although it is doable), and designing an algorithm that generates them with equal probability seems strictly harder to me. $\endgroup$ Commented Sep 20, 2023 at 3:12
  • 1
    $\begingroup$ How about this: Suppose we have already picked characters for positions 1,..,i-1. Then to pick the character for the i-th position, consider each of the unique remaining characters $a_1,...,a_k$ in the "pool" of characters that haven't been used yet (which has repetitions). Count the number of appropriate strings that we could make using that pool if we used that character for the i-th character (pretty sure we can come up with an expression for this). Pick character $a_j$ with probability $count(a_j)/(count(a_1)+...+count(a_k))$. $\endgroup$
    – Scott Hahn
    Commented Sep 20, 2023 at 3:34
  • $\begingroup$ "Another idea is to generate all possible permutations, then eliminate those that are wrong and draw randomly from the remaining correct permutations. But the complexity of this algorithm will be factorial, and the number of possible permutations will soon exceed the upper limit of 64-bit unsigned integer." Easily remedied by alternative programming method. Java (for example) has the BigInteger class, and I am certain that Python and C (for examples) have similar facilities, for just such a purpose as this. ...see next comment $\endgroup$ Commented Sep 20, 2023 at 4:43
  • $\begingroup$ $~2^{64} \approx 10^{22},~$ and the BigInteger class can routinely handle integers as big as $~10^{100}~$ on a vanilla home pc (say with 16gb of memory, and as many other processes temporarily shut down as is feasible). Further, although unverified, I suspect that the BigInteger class on a home pc can actually handle integers as large as $~10^{10000},~$ and I suspect that this applies to C and Python, as well. $\endgroup$ Commented Sep 20, 2023 at 4:46
  • $\begingroup$ @ScottHahn Your proposal has already had some discussion on StackOverflow. Unfortunately, we have not been able to find an efficient way to calculate the "number of appropriate strings." $\endgroup$ Commented Sep 20, 2023 at 9:02

1 Answer 1

1
$\begingroup$

Well, although I am not very well understand your question, I can help you to count the number of valid strings obeying the given rule with the given number of words.

I will use Smirnov words, you can find about this method in Analytic Combinatorics books(for example in this book)

Now, I assume that you learned this method, or just interested in its algorithm, so:

Lets assume that it is given $m$ $a's$, $n$ $b's$ and $k$ $c's$, we want to count the number of strings such that there can be at most

  • $x$ consecutive $a's$

  • $y$ consecutive $b's$

  • $z$ consecutive $c's$.

So, the number of valid strings is the coefficient of $[a^mb^nc^k]$ in the expansion of $$\bigg[1-\bigg(\frac{a+a^2+...+a^x}{1+a+a^2+...+a^x}+\frac{b+b^2+...+b^y}{1+b+b^2+...+b^y}+\frac{c+c^2+...+c^z}{1+c+c^2+...+c^z}\bigg)\bigg]^{-1}$$

I will use your example "aaaabbbcc" and repetation is at most $2$ for each letters to illustrate my method. If we have $4a's$,$3b's$,$2c's$, and we do not want any more than two consecutive same letter, then find the coefficient of $[a^4b^3c^2]$ in the expansion of $$\bigg[1-\bigg(\frac{a+a^2}{1+a+a^2}+\frac{b+b^2}{1+b+b^2}+\frac{c+c^2}{1+c+c^2}\bigg)\bigg]^{-1}$$

$\endgroup$
5
  • $\begingroup$ This is crazily cool. Let me think about it. $\endgroup$ Commented Oct 5, 2023 at 0:46
  • $\begingroup$ The expansion seems to be a fraction but not a polynomial. How can I determine the coefficient? $\endgroup$ Commented Oct 5, 2023 at 6:54
  • $\begingroup$ @埃博拉酱 when you expand it using softwares such as sage , it turns out to be polynomial $\endgroup$ Commented Oct 5, 2023 at 11:19
  • $\begingroup$ It can't be? According to the example fraction you gave, there are only quadratic terms in it. If you divide a quadratic term, even if you can divide it into a polynomial, you will only have quadratic terms at most. Where does a quartic like a4 come from? $\endgroup$ Commented Oct 6, 2023 at 3:45
  • $\begingroup$ I did it with the MATLAB symbolic math toolbox, and it can only simplify to a fraction at most, and it has only quadratic terms at most, as I expected. Are you sure you're not remembering anything wrong? $\endgroup$ Commented Oct 6, 2023 at 3:49

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .