17
$\begingroup$

I'm looking for algorithms to generate randomized instances of Latin squares.

I found only one paper:

M. T. Jacobson and P. Matthews, Generating uniformly distributed random Latin squares, J. Combinatorial Design 4 (1996), 405-437

Is there any other method known which generates truly random instances not just isomorphic instances of other instances?

$\endgroup$
4
  • 1
    $\begingroup$ You might look at some of the work of Y. Chen at U of Illinois, which is related, but slightly different. He treats two-dimensional zero-one tables with marginal sum constraints in Y. Chen, P. Diaconis, S. Holmes and J. S. Liu, (2005). Sequential Monte Carlo Methods for Statistical Analysis of Tables. J. Amer. Stat. Assoc., 100, 109-120. To do what you would want would require a generalization to three-way tables. His recent work looks like it might cover this. $\endgroup$
    – cardinal
    Commented Sep 9, 2011 at 16:54
  • $\begingroup$ I have found the algorithms in the aforementioned paper to be less than satisfactory for tables of any real size, but perhaps it works better when all of the marginal sum constraints are 1 instead of something more arbitrary. $\endgroup$
    – cardinal
    Commented Sep 9, 2011 at 16:55
  • $\begingroup$ I should probably just generate $n$ permuations of $\{ 1..n \}$ (e.g. Fisher-Yates-Algorithm) and see if they match up to a $n \times n$-Latin Square. I only need LS up to $n = 12$, so that could work. $\endgroup$
    – user14947
    Commented Sep 10, 2011 at 16:18
  • 5
    $\begingroup$ That won't work, I don't think. Let $L(n)$ be the number of LS of size $n$. Then it's known that $L(n) \leq \prod_{i=1}^n (i!)^{n/i} =: U(n)$. Your scheme will has $S(n) = (n!)^n$ possible squares. Hence, the probability that your random square is a LS is $\leq U(n) / S(n) \leq e^{-89}$. So, if I've done the calculation correctly (and it's possible I haven't), generating 1 billion per second, it would take you on average about $10^{22}$ years before you saw your first latin square. $\endgroup$
    – cardinal
    Commented Sep 10, 2011 at 16:56

3 Answers 3

9
$\begingroup$

In the combinatorics community, the Jacobson and Matthews approach is widely considered the best approach to obtain random Latin squares (approximately) uniformly at random from the set of all Latin squares. A practical non-MCMC approach that samples uniformly would be extremely well-received (but seems beyond current techniques).

Other uniform sampling methods are:

  • Generating Latin squares row-by-row by appending random permutations and restarting whenever their is a clash gives the uniform distribution. [Or equivalently, uniformly sampling from the set of row-Latin squares, then restarting if there is a clash.]
  • Generating a list of all Latin squares, and picking one at random. Storage requirements could be reduced by (a) storing only a list of normalised Latin squares [i.e. first row in order] then randomly permuting the columns after sampling, and (b) storing only the differences between subsequent Latin squares (i.e. Latin trades) [although, this makes the algorithm more complicated].

In either case, for not particularly large n [maybe n>5], these approaches are either impractically slow, or requires impractically large storage space. However, for some applications, we don't need to sample $n \times n$ Latin squares for $n>5$, in which case, this is not a problem.

Moreover, for statistical applications, sampling from all possible Latin squares is often not necessary, in which case we just apply a random isotopism to any given Latin square (i.e. we pick a Latin square, then permute its rows, columns and symbols randomly).

Any attempt to sample via extending a Latin rectangle (or partial Latin square) to a Latin square without restarting from scratch after a clash occurs will almost certainly result in a non-uniform distribution. [I suppose theoretically you could add weights to your intermediate choices.] Different Latin rectangles admit a different number of completions, so if we don't restart from scratch, we will favour Latin squares that have Latin rectangles that admit fewer completions (i.e. there's less competition for those Latin squares).

This non-uniformity might seem like a subtle difference, but consider a $(n-2) \times n$ Latin rectangle. The number of completions is always a power of 2. If the number of completions is 1 (which can happen: take a cyclic group's Cayley table and delete the last two rows), then its completion is guaranteed to be generated from that point on. If the number of completions is $2^{n/2}$ (which can also happen: take the elementary abelian 2-group's Cayley table and delete the last two rows), then the probability of it being generated from that point on could be $2^{-n/2}$ (depending on how things are implemented). So, the difference in probabilities can be at least exponential in n.

Even if you don't care too much about the uniform distribution, the Jacobson and Matthews is still reasonable: it is quite fast and simple to implement (there's also implementations for GAP ("loops") and SAGE available, and probably others I'm unaware of).

$\endgroup$
3
  • $\begingroup$ Ok, I will just implement the Jacobson and Matthews approach. Any opinions on the number of iterations needed to obtain a near uniform distribution? The GAP implementation uses $n^3$. $\endgroup$
    – user14947
    Commented Sep 26, 2011 at 1:29
  • $\begingroup$ It's always hard to say with these things, but I find the GAP version quite reasonable for what I do. $\endgroup$ Commented Sep 26, 2011 at 22:17
  • $\begingroup$ Hi Douglas, do you know if Jacobson's method is fast mixing? I did a quick google scholar but could find any thing. $\endgroup$ Commented Jul 4, 2016 at 21:34
2
$\begingroup$

Any partial $n\times n$ Latin square with the first $r$ rows filled in ($0 \le r < n$) can be extended to a partial Latin square with the first $r+1$ rows filled in (and thus to a complete $n\times n$ Latin square). Do you know how to do this? If so, it's fairly easy to randomise the choices that have to be made. If not, let me know, and I'll try to remember how I did it 17 years ago.

Edit Suppose we have an $n \times n$ Latin square $A$ that is partially filled in: the first $r-1$ rows have been filled in, and the first $c-1$ columns in row $r$. We want to extend the Latin square to column $c$ in row $r$, i.e. we want to fill in $A_{rc}$.

If there exist any values that satisfy the Latin square constraints for $A_{rc}$, choose one of them at random. Otherwise we have to backtrack, as follows:

  1. Construct a directed graph (the 'replacement graph') which specifies which elements in row $r$ may be replaced by which elements. Its vertices are:

    • a start vertex $S$, representing $A_{rc}$
    • an end vertex $E$, representing values unused in the current row
    • for each element $A_{ri}$ in the current row with $i<c$, a vertex $V_i$
  2. Add edges as follows:

    • for each vertex $V_i$, and for each value $v$ unused in column $i$, add an edge from $V_i$ to $V_j$ if $A_{rj} = v$ for some $j$, else add an edge from $V_i$ to $E$.
    • for each value $v$ unused in column $c$, add an edge from $S$ to $V_j$ if $A_{rj} = v$.
  3. Select a path $P$ from $S$ to $E$ at random.

  4. Replace the values in row $r$ as follows:

    • for the first edge in $P$, from $S$ to $V_i$: set $A_{rc} = A_{ri}$
    • for internal edges in $P$, from $V_i$ to $V_j$: set $A_{ri} = A_{rj}$
    • for the last edge in $P$, from $V_i$ to $E$: set $A_{ri}=$any unused value, chosen at random

This algorithm runs quickly, and you won't have any trouble generating $12 \times 12$ Latin squares with it. The problem remains that it probably doesn't generate all Latin squares with equal probability. In particular, it's not clear how to select the path $P$ at random.

Perhaps I should hand-run this algorithm on my counterexample, to show you how it works. We have the partial $4\times4$ Latin square

0 1 2 3
1 3 0 2
3 0 1

The replacement graph has vertices $S$, $E$, $V_1$, $V_2$, $V_3$. Its edges are:

  • $V_1 \rightarrow E$ (because $2$ is unused)
  • $V_2 \rightarrow E$ (because $2$ is unused)
  • $V_3 \rightarrow V_1$ (because $A_{r1}=3$)
  • $S \rightarrow V_1$ (because $A_{r1}=3$)
  • $S \rightarrow V_2$ (because $A_{r2}=0$)

Pick the path $P = S \rightarrow V_2 \rightarrow V_0 \rightarrow E$. Then we set:

  • $A_{3,4} = A_{3,2}$
  • $A_{3,2} = A_{3,0}$
  • $A_{3,0} = $ the only unused value, $2$

And we get:

0 1 2 3
1 3 0 2
2 0 3 1
$\endgroup$
2
  • 1
    $\begingroup$ Yes, you are stating Marhall Halls's Embedding Theorem, of course this is trivial. But I doubt you can generate uniformely distributed Latin Square this way: "We could generate random permutations of the symbols to fill a square a row at a time. Each permutation would be restricted to choices that would not cause column conflicts with the already-filled rows. (It is a consequence of P. Hall’s marriage theorem that such choices always exist; see, e.g., [ll].) However, we have no general way of weighting these choices appropriately in order to achieve the uniform distribution on Latin squares. " $\endgroup$
    – user14947
    Commented Sep 9, 2011 at 17:31
  • 1
    $\begingroup$ from M. T. Jacobson and P. Matthews, Generating uniformly distributed random Latin squares, J. Combinatorial Design 4 (1996), 405-437 $\endgroup$
    – user14947
    Commented Sep 9, 2011 at 17:32
1
$\begingroup$

There is another random walk solution which involves tracking over the space of all lexicographical orders of Latin squares of the specified order. This method appears to provide a reasonably good uniform distribution.

The standard lexicographic order is left to right, top to bottom, but in the context of some other work, I asked the question: "Could other lexicographical orders provide different results?" For example one based on right to left, bottom to top.

It then doesn't take long to realise that the sequence of cells determining a lexicographic order can be any permutation of the cells of the table. For example: the order of cells determining the 'standard' lexicographical order for a 3x3 Latin square is {(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)} and the contents of these cells in that order determines the key for lexicographic ordering. If we reverse this sequence the key is then determined right to left bottom to top. As another example, The sequence {(1,1),(1,2),(1,3),(3,1),(3,2),(3,3),(2,1),(2,2),(2,3)} defines a lexicographic order where the key is determine by taking the first row left to right, then the last row, and finally the middle row.

However, any permutation of the elements of the above sequence could be used to define a lexicographic key. Such alternative sequences can be obtained by a random shuffles of the above standard sequence. So, for order 3 Latin squares, 9! keys can be defined and this determines the number of different alternative lexicographic orders. Obviously extremely high numbers exist for higher order Latin squares.

I already had a GAP program which, given a Latin square returned the next Latin square in standard lexicographic order. It turned out to be quite a straightforward task to modify this program to provide the next Latin square for any lexicographic order as defined by an alternative sequence.

This gives us a 'track' of Latin squares in an alternative lexicographic order. Every Latin square of the given order occurs somewhere along the track. Clearly such a track exists for each alternative lexicographical order.

After we have moved from Latin square A to Latin square B on one such defined track there is no reason why we can't switch to another track for the next move. With GAP this switching can be achieved by applying the GAP Shuffle function to the sequence defining the current lexicographic order.

Repeating this some number of times produces the random walk over the set of all alternative lexicographic orders.

The question then remains of how many iterations are required to achieve output which has a reasonably uniform distribution. This clearly depends on the order of the Latin squares and more research is required to determine. I have tried 100 and obtained quite good results for smaller (3-6) order Latin squares.

Note: there are a couple of parameters to the random walk. The first is the number of times switching is to occur. The other is the number of Latin squares to be passed through after one switch before the next switch is made. It is not clear at the moment whether the second of these has much impact on the result or whether it is better to use processing time making switches.

Processing seems to shows good uniform distribution for Latin squares of orders 3 - 5 and a good indication that this is the case for orders up to 12.

It would be interesting if uniformity could be theoretically proved. I might ask a math stack question about that!

Above order 12 performance becomes a problem. The program can get bogged down in the back tracking involved in moving from one Latin square to the next. To alleviate this to some extent one can implement a modification to identify when this is happening and then back track to try switching to another track.

Experiment has shown that specifically designed sequences (e.g. one which "spirals in" say) provide lexicographic tracks with a range of different performance characteristics. Selecting sequences at random from a set of better performing sequences rather than from the set of all sequences may another way of addressing the performance issue although it is not clear what biases might be introduced affecting uniformity.

In summary this approach provides a workable solution for Latin squares of order 3 up to around 12 (and maybe higher depending on the hardware used).

$\endgroup$

You must log in to answer this question.