
I am having a hard time figuring out how to use Markov chains to address a sorting method. It is for a PE (Project Euler) problem, i.e. one done for fun, if anyone's curious. I do not want spoilers, just some guidance towards the right method.

I understand what Markov chains are and when they are to be used, but I don't understand how to set them up here. Basically, I am trying to figure out how to use Markov chains to determine the expected number of swaps in a sort we'll call b-sort.

b-sort first sees if the list (which is randomly jumbled from the start) is sorted (0 moves), and if not, it chooses two different elements at random and swaps them. Repeat until sorted.

For instance, for a list of length 4 (4! possible sequences), the average of the expected number of swaps for every possible permutation of the initial list is 24.75. I am trying to figure out how this value is achieved.
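For concreteness, here is a quick Monte Carlo sketch (Python; the function name `bsort_swaps` is just illustrative) of b-sort as described above, which lands near the 24.75 figure:

```python
import random
from itertools import permutations

def bsort_swaps(lst):
    """Count the swaps b-sort makes: while unsorted, swap two random distinct positions."""
    lst = list(lst)
    swaps = 0
    while lst != sorted(lst):
        i, j = random.sample(range(len(lst)), 2)
        lst[i], lst[j] = lst[j], lst[i]
        swaps += 1
    return swaps

random.seed(1)
perms = list(permutations(range(4)))   # all 4! possible starting lists
trials = 20000
avg = sum(bsort_swaps(random.choice(perms)) for _ in range(trials)) / trials
print(avg)   # hovers around 24.75
```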

The actual problem itself doesn't use b-sort exactly, but I figure that if I understand how chains are used here, I can better tackle the problem. Thanks!

  • Even though you're right that Byron's link under the previous incarnation of the question didn't immediately answer it, it may be worthwhile having the link. This is one of the reasons why it's not a good idea to delete a question instead of editing it: you lose the comments already made.
    – joriki
    Commented Mar 6, 2012 at 17:35

2 Answers


I don't see how to do this for a general number of elements, but here's how I'd do it for four elements:

There are five conjugacy classes, represented by $()$, $(12)$, $(123)$, $(12)(34)$ and $(1234)$. For each conjugacy class, you can determine the probabilities of ending up in the other conjugacy classes by applying a random transposition. I don't know if there's a systematic way of doing this, but it's easy enough to do by hand for four elements. This gives you a system of linear equations for the expected number of steps starting at each of the conjugacy classes: for each conjugacy class, the expected number is $1$ plus the average of the expected numbers of the neighbouring conjugacy classes reached by random transposition, except that for $()$ the expected number is $0$. I won't write down the system since you didn't want any spoilers, but solving it yields $x_0=0$, $x_1=23$, $x_2=26\frac14$, $x_3=27$ and $x_4=27\frac12$ for the above conjugacy classes in that order; together with the numbers $1,6,8,3,6$ of permutations in each class, that yields the desired average of $24\frac34$.
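If you'd rather not do the case analysis by hand, the class-to-class transition counts can be enumerated mechanically (conjugacy classes of $S_n$ correspond to cycle types, and the counts don't depend on which representative you pick). A brute-force Python sketch:

```python
from itertools import permutations, combinations
from collections import Counter

n = 4

def cycle_type(p):
    """Cycle type of a permutation given in one-line notation, as a sorted tuple."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

# One representative per conjugacy class (= per cycle type).
reps = {}
for p in permutations(range(n)):
    reps.setdefault(cycle_type(p), p)

# For each class, count where each of the n(n-1)/2 transpositions leads.
for ct, p in sorted(reps.items()):
    dest = Counter()
    for i, j in combinations(range(n), 2):
        q = list(p)
        q[i], q[j] = q[j], q[i]
        dest[cycle_type(tuple(q))] += 1
    print(ct, dict(dest))
```

For the class of $(12)$ (cycle type $(2,1,1)$) this reproduces the counts used below: one transposition leads to the identity, one to cycle type $(2,2)$, and four to $(3,1)$.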

[Edit as requested:]

For $(12)$, one transposition leads to $()$, one leads to $(12)(34)$ and the remaining four lead to the class $(123)$. Since each of these six transpositions is applied with probability $1/6$, the corresponding linear equation for the expected numbers of steps is

$$x_1=1+\frac16(x_0+4x_2+x_3)\;.$$

Doing the same thing for the remaining three non-identity classes, multiplying through by $6$ and using $x_0=0$ yields the system

$$\pmatrix{6&-4&-1&0\\-3&6&0&-3\\-2&0&6&-4\\0&-4&-2&6}\pmatrix{x_1\\x_2\\x_3\\x_4}=\pmatrix{6\\6\\6\\6}\;,$$

for which Wolfram|Alpha produces the solution given above.
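The same system can of course be handed to any linear solver instead of Wolfram|Alpha; a NumPy sketch:

```python
import numpy as np

# The system above, rows corresponding to the equations for x1, x2, x3, x4.
A = np.array([[ 6, -4, -1,  0],
              [-3,  6,  0, -3],
              [-2,  0,  6, -4],
              [ 0, -4, -2,  6]], dtype=float)
x = np.linalg.solve(A, np.full(4, 6.0))
# x comes out as approximately [23, 26.25, 27, 27.5]

# Weighted average over the class sizes 6, 8, 3, 6 (the identity contributes 0):
sizes = np.array([6, 8, 3, 6])
print((sizes @ x) / 24)   # ~ 24.75
```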

[Edit:]

Here are some references you might want to explore on this:

Group representations in probability and statistics, Chapter 3: Random Walks on Groups (Diaconis): This shows that in a certain sense shuffling by random transpositions asymptotically takes $n\log n$ shuffles to shuffle well.

Random Walks on Finite Groups, Section 9.2: Random Transposition on the Symmetric Group (Saloff-Coste): Has lots of references, including the one above and others by Diaconis.

Eigenvalues of Graphs (Lovász): Has some basic theory about eigenvalues of graphs, including a section on the significance of the eigenvalue gap for random walks on graphs.

Here's a not quite rigorous argument that the expected number of shuffles will asymptotically go as $n!$: As shown in the above references, the difference between the distribution produced by the shuffling and the uniform distribution decays exponentially after $n\log n$ shuffles. We start out with a uniform distribution, and in each step we remove the probability at the identity from the process. If we didn't disturb the uniformity of the distribution by removing probability only at the identity instead of uniformly, we'd have a $1/n!$ chance in each step to reach the identity, and the expected number of steps in that case would be $n!-1$. Since it takes only $n\log n$ steps to spread out the disturbance introduced, and the sum of all disturbances introduced over time is the sum of all probability removed at the identity over time, which is $1$, the total effect of the disturbance on the expected number of steps is at most of the order of $n\log n\ll n!$.
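This can be checked numerically for small $n$ by computing the exact expectation from the absorbing chain on all $n!$ permutations and comparing it with $n!-1$ (a sketch; for $n$ much beyond $6$ the $n!\times n!$ solve becomes impractical):

```python
import numpy as np
from itertools import permutations, combinations
from math import factorial

def expected_swaps(n):
    """Exact expected number of random-transposition swaps to sort,
    averaged over a uniformly random starting permutation of length n."""
    target = tuple(range(n))
    trans = [p for p in permutations(range(n)) if p != target]  # transient states
    idx = {p: i for i, p in enumerate(trans)}
    pairs = list(combinations(range(n), 2))
    A = np.eye(len(trans))
    for p in trans:
        for i, j in pairs:
            q = list(p)
            q[i], q[j] = q[j], q[i]
            q = tuple(q)
            if q != target:
                A[idx[p], idx[q]] -= 1 / len(pairs)
    t = np.linalg.solve(A, np.ones(len(trans)))
    return t.sum() / factorial(n)   # the sorted start contributes 0

for n in range(3, 7):
    e = expected_swaps(n)
    print(n, e, e / (factorial(n) - 1))   # e.g. n = 4 gives 24.75
```

The last column should approach $1$ as $n$ grows, in line with the heuristic above.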

And here's how you could use generating functions to relate the expected number of steps starting from a random permutation to a property of the random walk treated in the question Byron linked to in a comment, which starts at the identity.

For any event that may occur at step $k$ with probability $p_k$, the corresponding (ordinary) generating function is $\sum_kp_kx^k$. Let $f$ denote the generating function for arriving at the identity for the first time starting from a random permutation, let $g$ denote the generating function for arriving at the identity (not necessarily for the first time) starting from a random permutation, and let $h$ denote the generating function for returning to the identity for the first time starting at the identity.

We can readily determine $g$, since in this case the initial uniform distribution is preserved and the probability of arriving at the identity is therefore a constant $1/n!$, so that

$$g(x)=\sum_k\frac1{n!}x^k=\frac1{n!}\frac1{1-x}\;.$$

Now to arrive at the identity from a random permutation ($g$), we must arrive at the identity from a random permutation for the first time ($f$) and then return to the identity ($h$) an arbitrary number of times. The generating functions therefore satisfy

$$g(x)=f(x)\left(1+h(x)+h(x)^2+\dotso\right)=\frac{f(x)}{1-h(x)}\;.$$

Substituting $g(x)$ and solving for $f(x)$ yields

$$f(x)=\frac1{n!}\frac{1-h(x)}{1-x}\;.$$

To check this result, we can calculate the total probability of arriving at the identity at some time, starting from a random permutation, which is

$$\lim_{x\to1}f(x)=\frac1{n!}\lim_{x\to1}\frac{1-h(x)}{1-x}=\frac1{n!}h'(1)$$

by l'Hôpital. But $h'(1)$ is the expected number of steps to return to the identity starting from the identity, and this is $n!$ according to the question Byron linked to, so the total probability is $1$ as it should be.

Similarly, we can calculate the expected number of steps to arrive at the identity for the first time, starting from a random permutation, as

$$ \begin{eqnarray} \lim_{x\to1}f'(x) &=& \frac1{n!}\lim_{x\to1}\frac{1-h(x)-h'(x)(1-x)}{(1-x)^2} \\ &=& \frac1{n!}\lim_{x\to1}\frac{-h''(x)(1-x)}{-2(1-x)} \\ &=& \frac1{n!}\frac{h''(1)}2\;, \end{eqnarray} $$

again by l'Hôpital. Thus the desired expected number of steps can be expressed as an expectation value for the random walk starting from the origin, namely the expectation value of $k(k-1)/(2n!)$ with $k$ the time of first return to the identity.
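As a sanity check on this formula, here is a small numerical experiment for $n=3$ (Python), where the first-return distribution $h$ is easy to iterate on the conjugacy classes:

```python
import numpy as np

# Random-transposition walk on S_3, lumped into conjugacy classes.
# While waiting to return to the identity, the walk lives on two classes:
# transpositions (T, 3 elements) and 3-cycles (C, 2 elements); from the
# identity, the first step always lands in T.
Q = np.array([[0.0, 2/3],   # from T: to identity w.p. 1/3 (dropped), to C w.p. 2/3
              [1.0, 0.0]])  # from C: to T w.p. 1
hit = np.array([1/3, 0.0])  # per-step probability of returning to the identity

p = np.array([1.0, 0.0])    # class distribution after step 1
E_T = E_TT1 = 0.0
for k in range(2, 500):     # the tail decays geometrically; 500 terms is plenty
    h_k = p @ hit           # P(first return at step k)
    E_T += k * h_k
    E_TT1 += k * (k - 1) * h_k
    p = p @ Q

n_fact = 6                   # |S_3|
print(E_T)                   # should be n! = 6
print(E_TT1 / (2 * n_fact))  # the formula h''(1)/(2 n!)

# Direct check: the class equations x_T = 1 + (2/3) x_C, x_C = 1 + x_T give
# x_T = 5, x_C = 6, so the average expected first-hit time over all 6
# starting permutations is (0 + 3*5 + 2*6)/6 = 4.5.
A = np.array([[1.0, -2/3], [-1.0, 1.0]])
xs = np.linalg.solve(A, np.ones(2))
avg_direct = (3 * xs[0] + 2 * xs[1]) / 6
print(avg_direct)            # should match the value above
```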

All this is not specific to permutations; if we replace $n!$ by the number of states, we can apply this to any random walk on a graph. For instance, for the random walk on a directed $n$-cycle, since the time of first return to the identity is $n$ with probability $1$, the expected number of steps to reach the origin starting from a random vertex comes out correctly as $n(n-1)/(2n)=(n-1)/2$.

  • How do you know there are five conjugacy classes for this, and how are they derived? It would also be helpful to see an example of a couple of equations, just so I can translate what you said into mathematics I can examine and understand.
    – John Smith
    Commented Mar 6, 2012 at 18:46
  • @John: OK, I added the system of equations and a derivation for one of them.
    – joriki
    Commented Mar 7, 2012 at 14:30
  • @John: I added some more things that may or may not be of interest to you.
    – joriki
    Commented Mar 7, 2012 at 19:57
  • Wow, I really appreciate that, thanks! The "real" problem is a variant of b-sort where, instead of selecting two items to swap, you select three and shuffle them randomly (6! possibilities). I suspect it's pretty similar, with the exception that going from one conjugacy class to another involves different probabilities.
    – John Smith
    Commented Mar 7, 2012 at 21:45
  • @John: Yes; with this variant, you'd get a slightly higher mixing rate, so the distribution is driven towards the uniform distribution slightly more quickly, so we might expect the result to be even closer to $n!-1$ than in the case of transpositions. Certainly this is the case for $n=3$, where the distribution is uniform at all times. By the way, instead of "$6!$ possibilities", did you mean "$3!=6$ possibilities"?
    – joriki
    Commented Mar 7, 2012 at 22:05

You could let the states of your Markov chain correspond to the $n!$ permutations of $[1,2,\ldots,n]$. The state $[1,2,\ldots,n]$ is absorbing. For every other state $s$ and each pair $1 \le i < j \le n$, the transition matrix has an entry of $2/(n^2-n)$ in the row corresponding to $s$ and the column corresponding to the state obtained from $s$ by swapping the elements in positions $i$ and $j$.
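A sketch of exactly this construction in Python/NumPy (for $n=4$), including the expected-absorption-time computation that produces the $24.75$ average:

```python
import numpy as np
from itertools import permutations, combinations

n = 4
perms = list(permutations(range(n)))
idx = {p: k for k, p in enumerate(perms)}
target = tuple(range(n))
pairs = list(combinations(range(n), 2))

# Full transition matrix: the sorted state is absorbing; every other state
# moves to each swap-neighbour with probability 2/(n^2 - n) = 1/len(pairs).
P = np.zeros((len(perms), len(perms)))
P[idx[target], idx[target]] = 1.0
for p in perms:
    if p == target:
        continue
    for i, j in pairs:
        q = list(p)
        q[i], q[j] = q[j], q[i]
        P[idx[p], idx[tuple(q)]] += 1 / len(pairs)

# Expected absorption times: drop the absorbing row/column and solve (I - Q) t = 1.
mask = np.arange(len(perms)) != idx[target]
Q = P[np.ix_(mask, mask)]
t = np.linalg.solve(np.eye(len(Q)) - Q, np.ones(len(Q)))
print(t.sum() / len(perms))  # average over all starts (sorted start = 0): 24.75
```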

  • So I imagine this would be akin to an N by N matrix where N = n! (in this case, 4!), or a 24x24 matrix. Each location in the matrix delineates the transition from one state (via column) to another (via row). All positions where the row differs from the column in more than two positions are automatically 0, since you can't switch more than two. I don't understand the 2/(n^2-n) thing, though. Am I on the right track so far? (Also, what does "absorbing" mean here?)
    – John Smith
    Commented Mar 6, 2012 at 18:01
  • You don't need to treat all permutations separately; you can have one state for each conjugacy class.
    – joriki
    Commented Mar 6, 2012 at 18:06
