There is another random walk solution which involves tracking over the space of all lexicographical orders of Latin squares of the specified order. This method appears to provide a reasonably good uniform distribution.
The standard lexicographic order is left to right, top to bottom, but in the context of some other work, I asked the question: "Could other lexicographical orders provide different results?" For example one based on right to left, bottom to top.
It then doesn't take long to realise that the sequence of cells determining a lexicographic order can be any permutation of the cells of the table.
For example: the order of cells determining the 'standard' lexicographical order for a 3x3 Latin square is {(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)} and the contents of these cells in that order determines the key for lexicographic ordering. If we reverse this sequence the key is then determined right to left bottom to top. As another example, The sequence {(1,1),(1,2),(1,3),(3,1),(3,2),(3,3),(2,1),(2,2),(2,3)} defines a lexicographic order where the key is determine by taking the first row left to right, then the last row, and finally the middle row.
However, any permutation of the elements of the above sequence could be used to define a lexicographic key. Such alternative sequences can be obtained by a random shuffles of the above standard sequence. So, for order 3 Latin squares, 9! keys can be defined and this determines the number of different alternative lexicographic orders. Obviously extremely high numbers exist for higher order Latin squares.
I already had a GAP program which, given a Latin square returned the next Latin square in standard lexicographic order. It turned out to be quite a straightforward task to modify this program to provide the next Latin square for any lexicographic order as defined by an alternative sequence.
This gives us a 'track' of Latin squares in an alternative lexicographic order. Every Latin square of the given order occurs somewhere along the track. Clearly such a track exists for each alternative lexicographical order.
After we have moved from Latin square A to Latin square B on one such defined track there is no reason why we can't switch to another track for the next move. With GAP this switching can be achieved by applying the GAP Shuffle function to the sequence defining the current lexicographic order.
Repeating this some number of times produces the random walk over the set of all alternative lexicographic orders.
The question then remains of how many iterations are required to achieve output which has a reasonably uniform distribution. This clearly depends on the order of the Latin squares and more research is required to determine. I have tried 100 and obtained quite good results for smaller (3-6) order Latin squares.
Note: there are a couple of parameters to the random walk. The first is the number of times switching is to occur. The other is the number of Latin squares to be passed through after one switch before the next switch is made. It is not clear at the moment whether the second of these has much impact on the result or whether it is better to use processing time making switches.
Processing seems to shows good uniform distribution for Latin squares of orders 3 - 5 and a good indication that this is the case for orders up to 12.
It would be interesting if uniformity could be theoretically proved. I might ask a math stack question about that!
Above order 12 performance becomes a problem. The program can get bogged down in the back tracking involved in moving from one Latin square to the next. To alleviate this to some extent one can implement a modification to identify when this is happening and then back track to try switching to another track.
Experiment has shown that specifically designed sequences (e.g. one which "spirals in" say) provide lexicographic tracks with a range of different performance characteristics. Selecting sequences at random from a set of better performing sequences rather than from the set of all sequences may another way of addressing the performance issue although it is not clear what biases might be introduced affecting uniformity.
In summary this approach provides a workable solution for Latin squares of order 3 up to around 12 (and maybe higher depending on the hardware used).