43
\$\begingroup\$

Write an algorithm or program that can encode and decode a chess board. The goal is to make the smallest representation of a chessboard that could be used (once decoded) to determine all movement possibilities for a player on that turn.

The encoding must be able to show:

  • Whose turn it is.
  • Whether the player can castle on each side.
  • Whether the player can perform an en-passant, and if so, which of their pawns?
  • Positions of all pieces.

Important note on castling: If white moves their king one turn, and then moves it back the next, it must be clear that they cannot castle on either side after that. Same would go if they moved their left or right rook. Although the board is visually in the same state as it was two turns ago, the game state has changed. More info here: http://en.wikipedia.org/wiki/Chess#Castling

Important note on en-passant: This is also a turn-sensitive move. Read the rules for more info. http://en.wikipedia.org/wiki/Chess#En_passant

Determine input and output as needed. Major props to whoever can compress it the most!

Your score is determined worst-case scenario - maximum possible size in bits. Make sure you show how you calculated it that number and what you accounted for. Shoot for the smallest worst-case!

\$\endgroup\$
21
  • \$\begingroup\$ What do you mean by "bitwise"? \$\endgroup\$ Commented Jan 25, 2014 at 23:16
  • \$\begingroup\$ Is this smallest code or most-compressed? Most-compressed is more interesting. \$\endgroup\$
    – Justin
    Commented Jan 26, 2014 at 0:15
  • \$\begingroup\$ Sorry for not clarifying. Bitwise in the sense that you can only compress it if you start representing it as bits, which will require some bitwise operation. Poor use on my part. Also most-compressed, not smallest code. \$\endgroup\$
    – sbseltzer
    Commented Jan 26, 2014 at 1:59
  • 2
    \$\begingroup\$ @GeekWithALife Yes, it's possible to have 18 queens on the board at the same time. Follow this link and click the playback button for an example. \$\endgroup\$
    – r3mainer
    Commented Jan 26, 2014 at 9:19
  • 1
    \$\begingroup\$ @AJMansfield, that might be useful for boards with about 28 men, but I calculate that 117 bits is plenty for boards with all 32 men, which is about 50 bits less than the target. The complication is that once you go below 32 men, promotion can give a player more bishops. \$\endgroup\$ Commented Jan 28, 2014 at 7:39

17 Answers 17

27
\$\begingroup\$

Min: 12 bits
Max:
Avg:

Had and thought last night that I could possibly make it even smaller.

x   Colour to play next (0 -> Black, 1-> White)
1   Only King left?

00000 Position of White King (0 -> A1 ... 63 -> H8)
00000 Position of Black King

01 00000 11111  WK:A1, BK:H2 (Black to play)
11 00000 11111  WK:A1, BK:H2 (White to play)

The result is an impressive size of 12 bits!

So what about K +1 other type of piece?

x
 0
   0
     000  +Pawn
     001  +Rook   
     010  +Knight
     011  +Bishop
     100  +Queen

There 2 possible arrangement of the sub tree.

   /\      /\
  +  K    K  +

Both result in the same bit sizes for all pieces. So it make no difference to which we use, I'll choose the first one.

x
 0
  0
   000
      1011001110000000000000000000000000000000000000000000000000000000000000
(+ 000) En-Passant (if >= 2 pawn & pawn in en-passant positions)
(+ 00 ) Castlings  (if >= 1 rook)
Min: 75 bit
Max: 109 bits

So on King +2 other piece types

x
 0
  1
   PRBNQ
   00011  +N +Q
   00101  +B +Q
   00110  +B +N
   01001  +R +Q
   01010  +R +N
   01100  +R +B
   10001  +P +Q
   10010  +P +N
   10100  +P +B
   11000  +P +R

There are 5 possible sub trees (I'll use 1 and 2 to indicate which of the pieces.)

   /\          /\       /\         /\          /\
  /  \        /  \     /  \       /  \        /  \
 K   /\      /\   2   /\   \     1   /\      /\   \
    1  2    K  1     K  2   1       K  2    1  2   K

So we'll require 3 bits to encode which sub tree to use.

x
 0
  1
   PRBNQ
         000  Sub Tree used

Min:= 11 = Header 
       6 = 2 * 3
       4 = 1 * 4
       4 = 1 * 4
      60 = 60    Empty
      --
      85 bits

Max:=  11 = Header
        4 =  2 * 4 Kings
       48 = 16 * 3 Pawns
       12 =  4 * 3 Rook
       42 = 42 * 1 Empty
        3 =  1 * 3 En-Passant
        2 =  1 * 2 Castlings
      ---
      122 bits

Still doing the analysis for more pieces

+3 Other

x
 0
  1
   PRBNQ
         0000  Sub Tree used (of 14 possible)

+4 Other

x
 0
  1
   PRBNQ
         000000  Sub Tree used (of 42 possible)

+5 Other

x
 0
  1
   PRBNQ
         0000000  Sub Tree used (of 132 possible)
 (+000)
 (+00)

Max: 208?


Is it possible to encode all these sub-trees into 9bits?

If we total up all of the possible sub-trees we get 392 possible sub-trees.

 1  0
 2  2
 3  5
 4  14
 5  42
 6  132
    ---
    392  <= 2^9

Using Freq ID

Since there 164603 unique piece frequencies.

Log2( 164603) = 17.3286110452
             ~ 18 bits

0
 0000 0000 0000 0000 00  Freq ID

(+000) (+00) Castling

Max:= 204 bits


rev 3

Min: 82 Max: 199 Avg: 160

Finally got around doing some analysis to find the maximum bit size. With optimal huffman encoding for each of the unique piece frequencies.

               0   Player
              00  Castling
               0  En-Passant Possible
            ?000  En-Passant column (include if En-Passant Possible = 1
  0000 0000 0000  Tree Encoding ID
[Board Encoding]  Between 66 .. 180 bits 

Note this is the worst possible size, which the En-Passant column bits if the number of pawns is greater than one. Irrespective of those pawns colours and position there is the potential for some boards to be 3 bits smaller.

Also there are only 144 different sizes(Worst case) for the size of the board.


75 - 216 bits (v2) v1 The minimum size is 98 bits (12.25 bytes), only the two kings on the board.

Maximum size is only 216 bits (27 bytes.) In the unlikly:-

  9 x Queens
  1 x King
  2 x Rooks
  2 x Knights
  2 x Bishops
on each side.

On average the size will be around 157 bits (19.625 bytes).

Pieces

To encode the board I'm using a binary tree encoding scheme. An empty square is the most frequent from any where between 32 and 62 appearances. Next are the pawns, then Rooks, Knights, Bishops and the least frequent are the Queen and King.

0 - left node
1 - righ node

     /\
    e  \    e:= Empty Square
      B/\W  B:= Black ; W:= White
      /  \
     /    \
    /      \
   /\      /\
  p  \    p  \  p:= Pawn
     /\      /\
    /  \    /  \
   /\  /\  /\  /\
  r  b n \ r b n \  r:= Rook; b:= Bishop; n:= Knight
         /\      /\ 
        q  k    q  k  q:= Queen ; k:= King

The starting board can be encode in just 166 bits (20.75 bytes)

  A     B     C      D      E     F     G     H
-----+-----+-----+------+------+-----+-----+------+
10100 10101 10110 101110 101111 10110 10101 10100 | 8 
  100   100   100    100    100   100   100   100 | 7
    0     0     0      0      0     0     0     0 | 6
    0     0     0      0      0     0     0     0 | 5
    0     0     0      0      0     0     0     0 | 4
    0     0     0      0      0     0     0     0 | 3
  110   110   110    110    110   110   110   110 | 2
11100 11101 11110 111110 111111 11110 11101 11100 | 1

To indicate whom's move it takes only single bit

0-> Black , 1-> White

Castling can be encoded in 4 bits.

 B  W
LR LR
00 00

So I've use 171 bits (21.375 bytes)

En-Passe can be encoded into just 16 bits (2 bytes)

So in total thats 187 bit (23.375 bytes).

Layout

  bits    Encodes
 0 -  15  En-Passe
16 -  19  Castling
      20  Move 
21 -  23  Unused
24 -> ..  Board

Not written any code yet.

Notice that 3 of the bits that are unused. So the max is 213bits.


Possible Improvements

1) Reduced the header block form 24 to 8 bits (with @Peter Taylor suggestion)

0 - 2 En-Passant
    3 Move
4 - 7 Castling
8 ... Board Pieces 

2) Variable length header

A small 4 bit fixed header

0 0 0 0
| | | |
| | | +-> En-Passant Block Present?
| | | 
| | +---> Pawns on board?
| |
| +-----> Castling still possible?
|                
+-------> Who's move? 0-Black 
                      1-White

Next block of additional bits (If castling is still possible)

00 00
|| ||
|| |+-> White Castle Right
|| +--> White Castle Left
||
|+----> Black Castle Right
+-----> Black Castle Left

Next block of additional bits (if pawns are present)

000--> En-Passant column Position

So now I have a variable length header 4 - 11 bits


3) Use a different encoding scheme depending on what pieces are left on the board.

By changing the tree encoding depending on what pieces are on the board and there frequency.

One possible encoding for an end game state (No Pawns)

        /\            
       e /\           
  Black /  \ White
       /    \
      /      \
     /        \       
    /\        /\
   /  \      /  \     
  /   /\    /   /\
 /\  / /\  /\  / /\   
r b n q k  r b n q k

Which is about ~4 bits per piece.

Which type of pieces are present on the board?

RBNQK Permutation
11111 (11111)

Permutation is Variable length 0-5 bits. If only one type of piece is left then don't include it.

Which permutation of those pieces to use for the tree? This is the factorial of the number of pieces in the above example it is 5 pieces so 120 possible permutations that can be encoded.

 #    !  bit 
 6  720  10  (If pawn included)
 5  120   6
 4   24   5
 3    6   3
 2    2   1  Don't include as of equal size.
 1    1   0  Don't include as its not needed.

Remember that there is addition bits for the empty squares and color.


Examples

Let's give an example of only QK left

RBNKQ
00011

  /\
 s  \
    /\
  B/  \W
  /\  /\
q  k q  k

101 100  0 x 60 110 111 ==> 60 + (2 x 6) = 60 + 12 = 72 bits for the board

0000 00011 Header ==> 9 bits

81 bits total


Let's give and example of only kings left

 RBNQK
 00001 

  /\
 s  k
   / \
  B   W

 10 0 0 0 0 0 0 0   K... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 11  .... ...k

Put all together

 header  4   0 0 0 0
 pieces  5   0 0 0 0 1
 perm    0   - - - - - -
  board 66   10 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 11

So I calculate the smallest encoding for board at 75bits (9 bits 3 bit)

Still yet to calculate how this coding scheme affects the maximum size.


Improvement 4

Reduce the number of bits for castling to just 2 bits. Just castling for the player who's turn it is.

 0 Castling possible (from header block)
 LR 
 00

Thinking about it, it maybe better just to include the 2 bits inside the header block.

\$\endgroup\$
17
  • 2
    \$\begingroup\$ You don't need 16 bits for en passant. At most one pawn moved last turn, so four bits suffice (e.g. 1111 for "no en passant possible" or the column as a binary number otherwise). \$\endgroup\$ Commented Jan 26, 2014 at 15:24
  • \$\begingroup\$ Why do pawns get a better position in a tree? In the common case they're most common, but in extreme cases R/B/N can appear 10 times. \$\endgroup\$
    – ugoren
    Commented Jan 26, 2014 at 21:16
  • \$\begingroup\$ @PeterTaylor, actually 3 bits are enough for en passant. If you count only columns with a black 5th rank pawn (assuming white move), 8 becomes invalid. \$\endgroup\$
    – ugoren
    Commented Jan 26, 2014 at 21:26
  • 1
    \$\begingroup\$ Note that your worst case is actually not possible. Promotion is impossible without captures (at least one capture per 2 promotions is needed). \$\endgroup\$
    – ugoren
    Commented Jan 27, 2014 at 9:02
  • 3
    \$\begingroup\$ Note to encode 64 positions (for white or black king) you need 6 bits (2**6 = 64). \$\endgroup\$ Commented Sep 23, 2014 at 22:04
22
\$\begingroup\$

192 bits (worst case)

Here's a very simple storage scheme that should cope with arbitrary pawn promotions, and never requires more than 64 + 4 × 32 = 192 bits:

  • The first 64 bits store a bitboard that tells where the pieces are (but not what they are). That is, we store one bit for each square of the chessboard (starting at square a1, then b1, c1, etc. up to square h8) such that a vacant square is represented by 0 and an occupied square by 1.

  • Next, for each of the squares marked as occupied on the bitboard, we store a 4-bit nibble encoding the piece on that square. The first of the four bits encodes the color of the piece (0 = white, 1 = black), while the remaining three bits encode the type of the piece:

    +-----+-----+-----+-----+
    |Black|   Piece Type    |
    +-----+-----+-----+-----+
       4     3     2     1    Bits
    

    Piece Type

    0 = (normal) pawn
    1 = (normal) rook
    2 = knight
    3 = bishop
    4 = queen
    5 = king (of player to move next)
    6 = king (of the other player)

    Note the two encodings for the king, used to determine which player's turn it is to move. (In fact, since there are always two kings on the board, allowing for four combinations of the codes 5 and 6, we could rather easily encode a second bit of information here.)

    Black Type Description
    +----+----+--------------------------------+
    |  0 | 5  | White King; White to move next |
    +----+----+--------------------------------+
    |  0 | 6  | White King                     |
    +----+----+--------------------------------+
    |  1 | 5  | Black King; Black to move next |
    +----+----+--------------------------------+
    |  1 | 6  | Black King                     |
    +----+----+--------------------------------+
    

    To encode the extra information needed for the en passant and castling rules, we introduce one additional piece type, which denotes either a pawn or a rook depending on the row it appears on:

    7 (on rows 1 and 8) = a rook that has never moved, and whose king has also never moved, and which is therefore eligible for castling
    7 (on rows 4 and 5) = a pawn that has just advanced two squares, and may therefore be captured en passant

Putting it all together:

     Hex Description
    +---+---------------------------------------------+
    | 0 | White Pawn (normal)                         |
    | 1 | White Rook (has moved)                      |
    | 2 | White Knight                                |
    | 3 | White Bishop                                |
    | 4 | White Queen                                 |
    | 5 | White King; White to move next              |
    | 6 | White King                                  |
    | 7 | White Rook (pre castle) / Pawn (en Passant) |
    | 8 | Black Pawn (normal)                         |
    | 9 | Black Rook (has moved)                      |
    | A | Black Knight                                |
    | B | Black Bishop                                |
    | C | Black Queen                                 |
    | D | Black King; Black to move next              |
    | E | Black King                                  |
    | F | Black Rook (pre castle) / Pawn (en Passant) |
    +---+---------------------------------------------+

The total number of bits required to encode the state of the board is therefore 64 + 4 × # of pieces on board. Since there can never be more than 32 pieces on the board, the maximum length of this encoding is 192 bits.

For example, using the encoding described above, the initial state of the board would be encoded as (whitespace inserted for readability):

1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111
0111 0010 0011 0100 0101 0011 0010 0111 0000 0000 0000 0000 0000 0000 0000 0000
1000 1000 1000 1000 1000 1000 1000 1000 1111 1010 1011 1100 1110 1011 1010 1111

or, in hexadecimal:

FFFF 0000 0000 FFFF 7234 5327 0000 0000 8888 8888 FABC EBAF
\$\endgroup\$
11
  • 2
    \$\begingroup\$ "pawn that has just advanced two squares" and "rook that has never moved" could share the same state slot since they are mutually exclusive depending on the row they are on. The extra free state could be used to encode "king of colour whose turn it is"; you should be able to get rid of the dangling bit that way. \$\endgroup\$
    – FireFly
    Commented Jan 26, 2014 at 23:40
  • \$\begingroup\$ You can also arguably save a bit by only storing 63 bits for the bitboard, and inferring the last bit from the number of men encoded. (It's not clear to me whether this is cheating or not, because it requires external encoding of the length of the bit sequence). And if you ditch the states 6 and 7 for the men you need to encode up to 6^32, which takes 82.7 bits; rounding up to 83 it's saving 13 bits over using states 6 and 7, and you can encode that same information in only 8 bits. \$\endgroup\$ Commented Jan 26, 2014 at 23:46
  • \$\begingroup\$ Thanks, @FireFly! I've implemented your suggestion. (Peter Taylor's suggestion is even more efficient, of course, but I haven't used it so far because I kind of like the simple binary encoding of the current scheme. You could always submit it as a separate entry...) \$\endgroup\$ Commented Jan 27, 2014 at 0:33
  • \$\begingroup\$ Okay -- this is mildly complicated. But hear me out. If you just take the 1 bit hit on a turn indicator, you could replace king with turn with a piece I call pawn1. Change pawn to pawn0. Now, whenever there is a (not en passant vulnerable) pawn -- you also gain one bit of information on the next piece (either a 0 for pawn0 or a 1 for pawn1). Because there are 16 pawns, you gain 16 bits less 1 for the turn indicator. Whenever there is a pawn that is vulnerable to en passant you have to add one bit back in after it. But as en passant must happen immediately, your minimum gains are 14 bits. \$\endgroup\$ Commented Aug 8, 2016 at 18:52
  • \$\begingroup\$ You could also do a similar thing with something like bishops. Suppose that you have a bishop not in its a 'special zone' (the 10 spots in its corners and the center row on its side) is marked as special. Because you know its location -- you know its a bishop. Now you have two bishops and could give each of them a 0 or 1 on the next piece. This gives up to another 4 bits -- but the worst case has bishops all over the special zones, and no gain. \$\endgroup\$ Commented Aug 8, 2016 at 18:55
16
\$\begingroup\$

160 bits worst case

After posting my previous answer 22 bytes, I began to wonder if we could get down to 21 bytes. However when I saw Peter Taylor's amazing 166 bytes I thought "Hang on, it looks like five 32-bit words could be possible!"

So after quite a lot of thought, I came up with this: 159.91936391 bytes (a pretty tight fit!) This level of compression would need a fairly complicated program but I have thought about how to make it run in reasonable time.

This is going to be a long post, so please bear with me, I will post what I can today and add a few bits of code soon.

So, here's how to do it:

En Passant and castling encoded by illegal positions (0 bits)

En Passant

As mentioned in other answers, there are a maximum of 5 possible squares on which a pawn vulnerable to en passant can stand. These are the squares next to the pawns of the player whose turn it is.

To encode this, the pawn vulnerable to en passant is exchanged with whatever is on one of the squares in the first or last row. This may be either a man or an empty square. This produces an illegal position, as pawns cannot be on these rows. The decoder must return the pawn to its correct position, extracting the en passant information.

In order for this to not interfere with the castling encoding, it is important that the squares on which the kings stand at the start of the game are not disturbed, and that the en passant encoding procedure does not place the kings next to each other, which would be an illegal king position. To satisfy the second of these points, the encoder has two choices as to which square it exchanges the en passant pawn. First choice square for each of the up to 5 pawns are A8,B8,C8,G8,H8. Second choice: A1,B1,C1,G1,H1.

Castling

A king who is allowed to castle is by definition, still on his initial square. With the white king on his initial square, there are a total of 63 squares where the black king can stand, 58 of which are legal (he is not allowed to move right next to the white king because he would put himself in check.) If the white king is allowed to castle, he is either allowed to castle with his left rook, his right rook, or both. There are thus 3x58=174 possibilities where the white king can castle, another 174 where the black king can castle, and a further 3x3=9 where both can castle, a total of 357.

There are 420 illegal arrangements of the two kings where they are on adjacent squares: 3x4=12 when the white king is in the corner, 5x24=120 when he is on the edge, and 8x36=288 when he is on another square. Therefore there are easily enough illegal positions to encode all possible castling possibilities.

If at least one king is allowed to castle, encoder would look up the castling data and the positional data of those kings not allowed to castle in a table (or alternatively, use an algorithm which I won't specify here) and produce an illegal position of the two kings. It would then exchange the kings with whatever happened to be on these squares.

It is important that this is encoded after and decoded before the en passant, otherwise there are some potential interferences.

Comparison

So, so far I have used no bits! Looking at Peter's answer, I still have the following to encode:

Whose turn is it?                                   1.000 bits
Which squares are occupied by men of which colour? 91.552 bits 
Subtotal                                          *92.552 bits* 
For the two colours, which men and which order?   *68.613 bits* 
GRAND TOTAL                                       161.165 bits

This is for the worst case of 29 men (see Peter's answer.) Below I will show how I make a marginal improvement (at least for the case of 29 men) on both of the points marked in **.

Which squares are occupied / whose turn is it?

The easy way to encode which squares are occupied is with a 64 bit grid. This also tells us how many squares are occupied. However it is somewhat wasteful because it is not possible for there to be more than 32 squares occupied. My solution is to use 1's to encode for the occupied squares when it is White's turn and 0's to encode for occupied squares when it is Black's turn. Now all combinations are used and there is no waste.

Thus we save on a bit for storing the turn: less than 32 1's, it is white's turn, more than 32 1's, it is black's turn. The only ambiguous case is when all the men are on the board and there are 32 1's and 32 0's. Therefore an extra bit is needed for this case only. As there can be no promotions until a capture has occurred, this extra bit does not affect the worst case overall (which occurs with 3 men captured and 29 remaining.)

Colour of the men occupying the squares

We know from the above how many men there are. The following extract of Pascal's triangle tells how many possibilities there are for the different distributions of black and white. For example, for 3 men, the possibilities are: 3 black men (1 permutation) 2 black, 1 white, (3 permutations), 1 black, 2 white (3 permutations), 3 white (1 permutation.) The total is 23=8. In general, for the lower numbers of men there are 2n possibilities. However the all black and all white possibilities are illegal (at least the king of each side must be on the board) so the actual number of legal permutations is 2n-2 (ignore the 1's on the Pascals triangle.)

For more than 16 total men, there is an additional constraint in that there can be no more than 16 men of each colour on the board. Therefore when all 32 men are on the board there must be 16 of each and the total number possibilities is 601080390 which is quite a bit less than 232.

1   1    1    1      1     1      1       1       1        1        1         1         1         1          1          1          1 
1   2    3    4     5      6      7       8       9       10       11        12        13        14         15         16         17
1   3    6   10    15     21     28      36      45       55       66        78        91       105        120        136        153
1   4   10   20    35     56     84     120     165      220      286       364       455       560        680        816        969
1   5   15   35    70    126    210     330     495      715     1001      1365      1820      2380       3060       3876       4845
1   6   21   56   126    252    462     792    1287     2002     3003      4368      6188      8568      11628      15504      20349
1   7   28   84   210    462    924    1716    3003     5005     8008     12376     18564     27132      38760      54264      74613
1   8   36  120   330    792   1716    3432    6435    11440    19448     31824     50388     77520     116280     170544     245157
1   9   45  165   495   1287   3003    6435   12870    24310    43758     75582    125970    203490     319770     490314     735471
1  10   55  220   715   2002   5005   11440   24310    48620    92378    167960    293930    497420     817190    1307504    2042975
1  11   66  286  1001   3003   8008   19448   43758    92378   184756    352716    646646   1144066    1961256    3268760    5311735
1  12   78  364  1365   4368  12376   31824   75582   167960   352716    705432   1352078   2496144    4457400    7726160   13037895
1  13   91  455  1820   6188  18564   50388  125970   293930   646646   1352078   2704156   5200300    9657700   17383860   30421755
1  14  105  560  2380   8568  27132   77520  203490   497420  1144066   2496144   5200300  10400600   20058300   37442160   67863915
1  15  120  680  3060  11628  38760  116280  319770   817190  1961256   4457400   9657700  20058300   40116600   77558760  145422675
1  16  136  816  3876  15504  54264  170544  490314  1307504  3268760   7726160  17383860  37442160   77558760  155117520  300540195
1  17  153  969  4845  20349  74613  245157  735471  2042975  5311735  13037895  30421755  67863915  145422675  300540195  601080390

The number of possibilities can be found by summing the "rows" of this extract of pascals's triangle (by which I mean the NE-SW diagonals of the table , because I rotated the triangle 45 degrees anticlockwise for convenient presentation. The number of bits needed to encode the turn, occupied squares and colour of the men is therefore as follows:

Up to 25 men: slightly less than 64+(number of men)
For more than 25 men:

men permutations  bits required  occupied sq+turn   
    of colours                   (bits required)  total bits
26   55791790     25.7335495      64              89.7335495
27  100960110     26.58921015     64              90.58921015
28  175844430     27.3897244      64              91.3897244
29  290845350     28.115677       64              92.115677   
30  445962870     28.73234836     64              92.73234836
31  601080390     29.16298271     64              93.16298271
32  601080390     29.16298271     65              94.16298271

For the two colours, which men and in which order?

As per previous answers, no pawns can be promoted until a capture has occurred, because each pawn is blocked by a pawn of the opposite colour on the same column. Peter's answer considered (as an upper bound) that every capture could lead to one promotion for the side being captured and two for the side capturing. However we can split this into several cases:

  1. Black pawn captures white pawn: Now the capturing pawn is free to promote as he is now on a different column. His colleague on the same column can also promote. The black pawn on the white pawn's original column can also promote. This is the only case that permits 3 promotions.

  2. Black pawn goes past white pawn on adjacent column and then captures white piece (other than pawn) behind. This enables the capturing pawn and the white pawn who was on the original column to promote. One promotion for each side.

  3. White pawn is captured by piece (other than pawn.) This would normally allow one promotion for Black only. The only exception is when it liberates a blocked up pawn formation which was already caused by several captures moving pawns onto the same column.

So, as a base case, we can consider that each capture permits one promotion each for both sides. In the case that the man captured is a pawn, there may be an additional promotion for the capturing side.

I have written a program similar to Peter's. It is somewhat cruder, but it has an important addition: it can calculate the number of permutations possible when a player starts with less than the normal 8 pawns. Here is some data produced by the program:

Max promotions   0            1            2             3             4              5 
8 PAWNS 
13 men    18725850    146911050    567991710    1373480394    2297173164     2902775304
14 men    36756720    339459120   1555313760    4501448952    9021804792    13325103792
15 men    60810750    660810150   3555401850   12144582450   28834205400    50030580600
16 men    64864800    843242400   5383778400   21810428640   61514893440    1.26476E+11
7 PAWNS                         
13 men    17760600    141003720    546949260    1321302840    2200401060     2761730400
14 men    30270240    287567280   1331890560    3852728880    7641553920    11068817760
15 men    32432400    372972600   2075673600    7209001800   17135118000    29315286000
6PAWNS                          
13 men    14054040    114594480    447026580    1069488420    1739577840     2113185360
14 men    15135120    151351200    718918200    2087805720    4073028960     5697051360                         
5 PAWNS                         
13 men     6486480     55135080    217297080     510630120     794233440      910235040

We can see that for a case like 8 pawns, 15 men, 0 promotions, the number of permutations is only slightly less than for 8 pawns 16 men, 0 promotions. However if we consider a case like 7 pawns, 15 men, 0 promotions (which is the same as considering that the man captured was definitely a pawn) we get about half the number of permutations.

So, For the case when Black has 16 men and white has 15 men, we can consider an upper bound estimate of 2 promotions for Black and one promotion for White:

5383778400 x 660810150 = 3.55766E+18 possibilities

However we can do better if we proceed as follows.

A. Consider one promotion each for Black and White assuming that the man White has lost could be of any type:

843242400 x 660810150 = 5.57223E+17 possibilities

B. Consider the additional possibilities for Black if he has two promotions, multiplied by only those possibilities for White in which he has lost a pawn.

(5383778400-843242400) x 372972600 = 1.6935 E+18 possibilities.

Adding these two together we get 2.25072E+18, which is a smaller number than 3.55766E+18. All the possibilities for up to 3 captures (29 men remaining) are listed below.

(Promotions, Pawns lost) possibilities

BLACK 16 MEN, WHITE 15 MEN. ESTIMATE   3.55766E+18 = 2^61.62563249
(1,0)   843242400 x (1,0)  660810150 = 5.57223E+17
(2,0)  4540536000 x (1,1)  372972600 = 1.6935 E+18
                               TOTAL   2.25072E+18 = 2^60.96509144

                                                        
BLACK 16 MEN, WHITE 14 MEN. ESTIMATE   9.5675 E+19 = 2^66.3747752
(2,0)  5383778400 x (2,0) 1555313760 = 8.37346E+18
(3,0) 16426650240 x (2,1) 1331890560 = 2.18785E+19
(4,0) 39704464800 x (2,2)  718918200 = 2.85443E+19
                               TOTAL   5.87962E+19 = 2^65.67235739

                                                        
BLACK 16 MEN, WHITE 13 MEN. ESTIMATE   2.69447E+20 = 2^67.86856193
(3,0) 21810428640 x (3,0) 1373480394 = 2.99562E+19
(4,0) 39704464800 x (3,1) 1321302840 = 5.24616E+19
(5,0) 64960896000 x (3,2) 1069488420 = 6.94749E+19
(6,0) 69702272640 x (3,3)  510630120 = 3.55921E+19
                               TOTAL   1.87485E+20 = 2^67.34533572
                                                        

BLACK 15 MEN, WHITE 15 MEN. ESTIMATE   1.47491E+20 = 2^66.99918768
(2,0)  3555401850 x (2,0) 3555401850 = 1.26409E+19
(2,1)  2075673600 x (3,0) 8589180600 = 1.78283E+19
(3,0)  8589180600 x (2,1) 2075673600 = 1.78283E+19
(3,1)  5133328200 x (3,1) 5133328200 = 2.63511E+19
                  TOTAL BOTH COLUMNS   7.46486E+19 = 2^66.01674923

                                                        
BLACK 15 MEN, WHITE 14 MEN. ESTIMATE   4.51366E+20 = 2^68.61286007      
(3,0) 12144582450 x (3,0) 4501448952 = 5.46682E+19
(3,1)  7209001800 x (4,0) 4520355840 = 3.25873E+19
(4,0) 16689622950 x (3,1) 3852728880 = 6.43006E+19
(4,1)  9926116200 x (4,1) 3788825040 = 3.76083E+19
(5,0) 21196375200 x (3,2) 2087805720 = 4.42539E+19
(5,1) 12180168000 x (4,2) 1985223240 = 2.41804E+19
                  TOTAL BOTH COLUMNS   2.57599E+20 = 2^67.80368692

So for the worst case of one side with 15 men and the other with 14 men, we need 67.804 bits.

Adding this to the 92.116 bits required to specify which squares and which colour, we get a total of 67.804+92.116=159.92 bits.

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Many thanks to @Einacio for changing my decimal commas to decimal points. I did a lot of my tables on Excel on a Spanish computer, and posting this was a big job, so fixing this was something I left for later. As I say, I haven't finished this post yet, I will add my permutation counting program and some fragments of code about encoding / decoding when I have time. PS. I had no idea so many people were reading this :-) \$\endgroup\$ Commented Feb 12, 2014 at 20:54
  • 1
    \$\begingroup\$ at the end you managed to tak of bytes instead of bits which is the thing you meant, that might cause some cinfusion to the readers \$\endgroup\$
    – masterX244
    Commented Sep 27, 2014 at 13:42
13
\$\begingroup\$

177 bits worst case

This algoritm, while hardly simple, gives a 177 bits worst case (184b=23B in practice), 13b (16b=2B) best case scenario when there's just 2 kings left.

Bit     Description
  1     Turn (0=white 1=black)
  2-  7 White king position (2-4=letter, 5-7=number)
  8- 13 Black king position (8-10=letter, 11-13=number)
 14- 75 Which squares contain pieces (skipping the 2 king squares, so only 62)
        Ordered a1-h1,a2-h2,(...)
 76-105 Which color owns the square with their piece (0=white, 1=black)
        If there's LESS than 30 pieces (apart from kings), this area is
        smaller
106-end Square data

Square data has the following system:
Every square gets assigned a number which determines piece. Number is:
0 Queen
1 Rook
2 Bishop
3 Knight
4 Pawn OR allowed-castle rook depending on square
5 Pawn subject to potential enpassant

The first bits (max 13) is the potential enpassant slots from A-H, determined
from data of 1 + 14-105 for which of the squares has a piece, and which color
owns the piece and whose turn it is. For example, if turn is White (bit 1 is
0), all pieces on row 5 which is Black owned (determined from 14-105 metadata)
and has at least 1 adjacant (on the same row) square owned by White, is
explained in A-H order. A base 6 number is used which is converted to binary
for the storage. On reading, it's converted and read A-H according to the
numbers above (4 is obviously pawn in this case).
The second amount of bits takes care of the 1st and 8th row (not corners!)
in b1-g1,b8-g8. These only take up 2 bits since 4 or 5 is never needed
(pawn on 1st or 8th is invalid).
The third amount of bits takes care of the rest of the board, in the following
order: a1,h1,a2-h2,a3-h3,a4-h4,a5-h5,a6-h6,a7-h7,a8,h8 (skipping the
"enpassant" slots), in base 5 (since piece ID 0-4 are the only used) converted
to binary.

Best case: 13 bits (bit 1 for turn, bit 2-12 for kings)
Worst case: 177 bits
* 32 pieces with kings
* 5 viable enpassant pawns
* No pieces at 1st or 8th row (except if kings+rooks are at initial posions
whether or not they can castle)
In this case, the space as following:
  1   bit   turn
+ 12  bits  king positions
+ 62  bits  which squares have pieces
+ 30  bits  color of pieces
+ 13  bits  enpassant area
+ 0   bits  initial rows area
+ 59  bits  the rest of the area
= 177 bits  total

Potential optimizations but not really worth it IMO:
* Decrease average by make corners 2 bits as well if kings aren't at e1/e8
* Alter reading order to read b1-g1,b8-g8 last - decreases worst case to
  176 bits if the "which squares have pieces" area is cut off if 30 existing
  pieces has been defined already. Would actually save 8 bits on file but meh
\$\endgroup\$
4
  • \$\begingroup\$ Very nice. You can make this even more efficient by replacing bits 14-105 (92 bits) with an encoding based on multinomial coefficients. sum_{i=0}^{15} sum_{j=0}^{15} 62! / (i! j! (62-i-j)!) < 2^87.45. \$\endgroup\$ Commented Jan 27, 2014 at 9:32
  • \$\begingroup\$ The only thing I would change is to create a rather more simplified version for the enpassant area. For example: if you encode the 30 pieces in base 5, and there are a maximum of 5 enpassant positions, then you can have 5^31 < 2^72. The same as if you split them in enpassant (13) and non-enpassant(59), but without the extra complexity. \$\endgroup\$ Commented Jan 28, 2014 at 8:46
  • \$\begingroup\$ Doing that would actually use 1 extra bit. The reason is that there can (worst case) be 5 enpassant possibilities, but I still need to declare the possibility for "no enpassant", i.e. a 6th state. The 1 extra bit would in this case go to declaring enpassant possible or not (and with this approach I could as well use an even simpler approach with encoding the 30 pieces skipping the enpassant block, and use 3 bits seperately for enpassant check, which would also lead to +1 bit use). The following 5th row would enable 5 potential enpassants (White's turn): BWBBWBBW \$\endgroup\$
    – FIQ
    Commented Jan 28, 2014 at 9:29
  • \$\begingroup\$ yes, you're right. \$\endgroup\$ Commented Jan 28, 2014 at 10:42
7
\$\begingroup\$

166 bits

  • 1 bit: whose turn is it?
  • 2 bits: which castling options are open? (NB on close reading of the question, it's only necessary to record castling options for the player whose turn it is).
  • lg 6 ~= 2.585 bits: which en passant options are open? (See my other answer)
  • lg sum_{i=1}^{16} sum_{j=1}^{16} 64! / (i! j! (64-i-j)! = lg 3629590441720924477681996172 ~= 91.552 bits: which squares are occupied by men of which colour?
  • At worst lg 451366131803622235200 ~= 68.613 bits to indicate which men and in which order (see below)

Using arithmetic encoding (since at each step we're applying a uniform distribution) we can achieve ceil(3 + 2.585 + 91.552 + 68.613) = 166 bits.

The encoding for the men: given that we know how many men of a given colour there are, we can easily enumerate all possible distributions/multisets of men (e.g. with 5 men we might have one King, one Queen, two Rooks, and a Pawn) and then we can consider all possible permutations of each distribution.

However, we can do even better by taking into account interdependencies. I'm only doing this on a very basic level: how many possible promotions? A pawn can only become "passed" and able to promote in three ways: it captures and so moves into a different column; or its opposing pawn captures and so moves into a different column; or its opposing pawn is captured. Thus a capture for white potentially creates two passed pawns for white and one for black.

We can build up a partial table of the upper bounds on promotions:

(Max white promos, max black promos):

           White men
           16      15      14      13
Black men
       16  (0, 0)  (1, 2)  (2, 4)  (3, 6)
       15  (2, 1)  (3, 3)  (4, 5)  (5, 7)
       14  (4, 2)  (5, 4)  (6, 6)  (7, 8)
       13  (6, 3)  (7, 5)  (8, 7)  (8, 8)

We can also calculate the number of permutations given that a player has N men and no more than P promoted pawns:

Num of permutations (cumulative):
    max promotions: 0              1              2              3              4              5              6              7              8
 1 men              1              1              1              1              1              1              1              1              1
 2 men             10             10             10             10             10             10             10             10             10
 3 men             72             75             75             75             75             75             75             75             75
 4 men            436            496            500            500            500            500            500            500            500
 5 men           2305           3025           3120           3125           3125           3125           3125           3125           3125
 6 men          10746          17106          18606          18744          18750          18750          18750          18750          18750
 7 men          44170          88795         106260         109179         109368         109375         109375         109375         109375
 8 men         159832         415360         575240         619200         624744         624992         625000         625000         625000
 9 men         509841        1721961        2884815        3398769        3504735        3515301        3515616        3515625        3515625
10 men        1447200        6258240       13063080       17697780       19260180       19510320       19530840       19531230       19531240
11 men        3706065       20021265       52183395       85007571      102173181      106786581      107369592      107409918      107410281
12 men        8678340       57101220      183088620      364510476      509818716      570620556      584017632      585352152      585430164
13 men       18725850      146911050      567991710     1373480394     2297173164     2902775304     3107861328     3143928216     3146014014
14 men       36756720      339459120     1555313760     4501448952     9021804792    13325103792    15664512864    16283899632    16360920576
15 men       60810750      660810150     3555401850    12144582450    28834205400    50030580600    66655789200    73588394880    74576231730
16 men       64864800      843242400     5383778400    21810428640    61514893440   126475789440   196178062080   240747386880   253686232800

Combining the two, we can get the number of bits of required to specify both permutations given the numbers of men on both sides:

           White men
           16      15      14      13      <13
Black men
       16  51.902  61.626  66.375  67.868  <=67.009
       15  --      67.000  68.613  67.534  <=65.243
       14  --      --      67.734  65.480  <=63.055
       13  --      --      --      63.102  <=60.676

If it's not in this section of the table, we can just assume that both sides have up to 8 promotions and we're still doing better than the worst case, which is 68.613 bits when one has 14 men and the other has 15 men.

Note that this is still a long way from being a perfect representation, because it allows many many illegal positions.

Code for calculating the permutation table:

import java.util.*;

public class ChessCombinatorics {
    public static void main(String[] args) {
        long[] f = new long[17];
        f[0] = 1;
        for (int i = 1; i < 17; i++) f[i] = i * f[i-1];

        // Indexed by num promotions, then total num men.
        long[][] distribs = new long[9][17];
        long[][] perms = new long[9][17];

        for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
            Map<Integer, Map<String, Long>> numCases = new HashMap<Integer, Map<String, Long>>();
            for (int i = 1; i < 17; i++) numCases.put(i, new HashMap<String, Long>());

            for (int extraQ = 0; extraQ <= promotedPawns; extraQ++) {
                for (int extraR = 0; extraR + extraQ <= promotedPawns; extraR++) {
                    for (int extraN = 0; extraN + extraR + extraQ <= promotedPawns; extraN++) {
                        int extraB = promotedPawns - extraN - extraR - extraQ;
                        int unpromotedPawns = 8 - promotedPawns;

                        // Promoted pawns should only count towards their new type if the existing ones are alive.
                        // Otherwise we double-count some cases.
                        int minQ, maxQ, minR, maxR, minN, maxN, minB, maxB;
                        if (extraQ == 0) {minQ = 0; maxQ = 1;} else {minQ = maxQ = 1 + extraQ;}
                        if (extraR == 0) {minR = 0; maxR = 2;} else {minR = maxR = 2 + extraR;}
                        if (extraN == 0) {minN = 0; maxN = 2;} else {minN = maxN = 2 + extraN;}
                        if (extraB == 0) {minB = 0; maxB = 2;} else {minB = maxB = 2 + extraB;}

                        for (int numQ = minQ; numQ <= maxQ; numQ++) {
                            for (int numR = minR; numR <= maxR; numR++) {
                                for (int numN = minN; numN <= maxN; numN++) {
                                    for (int numB = minB; numB <= maxB; numB++) {
                                        for (int numP = 0; numP <= unpromotedPawns; numP++) {
                                            // The number of possibilities at these values is (numK + numQ + numR + numN + numB + numP)! / (numK! numQ! numR! numN! numB! numP!)
                                            numCases.get(1+numQ+numR+numN+numB+numP).put(numQ+","+numR+","+numN+","+numB+","+numP, f[1 + numQ + numR + numN + numB + numP] / f[numQ] / f[numR] / f[numN] / f[numB] / f[numP]);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            for (int numMen = 1; numMen < 17; numMen++) {
                distribs[promotedPawns][numMen] = numCases.get(numMen).size();
                if (distribs[promotedPawns][numMen] > 0) {
                    for (Long l : numCases.get(numMen).values()) perms[promotedPawns][numMen] += l;
                }
            }
        }

        System.out.println("Num of permutations (cumulative):");
        System.out.println("    max promotions: 0              1              2              3              4              5              6              7              8");
        for (int numMen = 1; numMen < 17; numMen++) {
            System.out.print(String.format("%2d men", numMen));
            long cumul = 0;
            for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
                cumul += perms[promotedPawns][numMen];
                System.out.print(String.format("%15d", cumul));
            }
            System.out.println();
        }

        System.out.println("Entropy of permutations:");
        System.out.println("    max promotions: 0              1              2              3              4              5              6              7              8");
        for (int numMen = 1; numMen < 17; numMen++) {
            System.out.print(String.format("%2d men", numMen));
            long cumul = 0;
            for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
                cumul += perms[promotedPawns][numMen];
                System.out.print(String.format("  %6.3f", Math.log(cumul) / Math.log(2)));
            }
            System.out.println();
        }

    }
}
\$\endgroup\$
10
  • \$\begingroup\$ How do you deduce the positions of the kings? You use 15 men in your computation and no special bits for king positions. \$\endgroup\$ Commented Jan 28, 2014 at 21:43
  • \$\begingroup\$ @AlinStoian, oops. I had a < rather than <= in the output loop of my program. Thanks for pointing it out. I could still recover the previous score by special-casing all 32 men being on the board, but I won't do that right now. \$\endgroup\$ Commented Jan 28, 2014 at 22:42
  • \$\begingroup\$ Interesting data! Theoretical worst case with 3 men captured \$\endgroup\$ Commented Jan 31, 2014 at 15:40
  • \$\begingroup\$ @steveverrill, what I would really like to do is encode the pawn positions and number of promotions in one "block" and then the piece positions and values. However, there are at least 2^38 pawn positions without taking into account promotions, and enumerating them efficiently has so far escaped me. \$\endgroup\$ Commented Jan 31, 2014 at 15:45
  • \$\begingroup\$ @petertaylor If you only have 16 pawns on the board, restricted to 48 squares, you have 48!/32!/8!/8!= 29019905518636890 possibilities. Slightly more than 2^54! Of these, some are illegal, you can't have all the pawns of one colour on one side of the board. \$\endgroup\$ Commented Jan 31, 2014 at 16:11
5
\$\begingroup\$

178 bits (174 at a pinch!) worst case

Hi, just getting back into coding, which I haven't really done since college. I saw this site and thought this looked interesting. I did a little theroretical check and it appears at least 146 bits are needed for a perfect algorithm, probably quite a few more (I will explain in the comments when I have a moment.)

Anyway, this is the way I structure the data. The basic concept comes in at 178 bits, but with some jiggery pokery it can brought down to 174 (that's 21 3/4 bytes). 175 is slight easier to program, is more human readable, and still within 22 bytes.

A) Position of both kings: 6 bits each for white and black 12 BITS

B) Of the remaining 62 squares, which are occupied? A matrix of 62 BITS

C) Whose turn is it? 1 BIT

TOTAL SO FAR: 75 BITS

D) En Passant. If it is white’s turn to move, up to 5 black pawns may look like they could be captured En Passant. The black pawn has to be on row 5 (from bottom to top starting at zero), and have a white pawn next to it. One situation with maximum number of possible captures looks like this:

B W B B W B B W

If there were 6 black pawns on row 5, white would only have 2 squares to stand on and could only threaten 4 black pawns, so it is not possible to have more than 5 black pawns apparently under threat from En passant at the same time. So we need a number 1 to 5 indicating which of the (up to 5) pawns on row 5 that has a hostile (in this case white) pawn next to it was advanced 2 squares on the last turn (or, zero if no pawn in this situation was moved in this way on the last turn.)

E) Of the up to 30 occupied squares (not including kings) what do they contain?

There are 10 possibilities, each represented by a decimal number.

The least significant bit represents the colour.

Hence even numbers are white, odd numbers are black.

White/Black

Pawn 0/1 (or Rook that is allowed to castle*)

Knight 2/3

Bishop 4/5

Rook 6/7

Queen 8/9

*A rook that is allowed to castle (and therefore has never been moved from the first or last row) is represented by 0 or 1 instead of 6 or 7. It cannot be confused with a pawn, because pawns cannot be found on the first or last row.

This gives a decimal number of up to 30 digits, which we can multiply by 6 and then add the code for En passant. The resulting number will fit into 103 bits, which when added to the 75 mentioned above is 103+75=178 bits. Actually, if we simply multiply by 10 instead of 6, it makes no difference to the number of bits used, and decoding is easier.

This is just 2 bits more than 22 bytes. However we can push it down to 174 bits, as explained below.

If no piece has been captured, then it is impossible for a pawn to be promoted.

The proof is as follows. Imagine white is obsessed with promoting his pawn on (for example) column E, from the very start of the game. There is a black pawn opposite this pawn that is in the way. Therefore to promote this pawn, one of the following must happen:

1) The black pawn is captured.

2) The black pawn captures another piece and therefore moves out of the way.

3) the white pawn captures a pawn on an adjacent column such as column D.

4) the white pawn passes (or is passed by) a black pawn on an adjacent column and then captures a piece on that same adjacent column, causing the white pawn to change column.

Case 4 is the most interesting, because it is not just the white pawn that started on column E that now has a clear path to promotion. The black pawn on column that remains on column E can also promote. Therefore a single capture can clear the way for one pawn of each colour to promote.

Anyway, the fact that no pawn can promote until a piece is captured means that we do not have to store the 30th piece. We can work it out by elimination (or by subtraction, because the complete set of piece codes at the start of the game always adds up to the same amount =80.) One minor point is that we must ensure that the squares where the rooks stand at the start of the game are amongst the first scanned (because if they were last, we would not know if the rook could castle or not.) This is easily done by scanning row 0 and then rows 7 to 1: For r= 8 to 1 scan row [r mod 8].

So, the matrix of bits in (B) will tell us how many pieces there are (excluding kings.) If there are a full 30, ignore the last piece when encoding, the decoder will work out what it was. We now have an up to 29 digit decimal number, which we multiply by 6 and add to the En Passant code. The resulting number will just squeeze into 99 bits, giving a total of 99+75=174 bits.

As an example Here’s an actual position. White has just made his first move (advanced king’s pawn) and it is Black`s turn.

rnbqkbnr
pppppppp


    P

PPPP PPP
RNBQKBNR

A) Position of kings (White / Black in octal, 12 bits): 03 73 = 000011 111011

B) Which squares are occupied? Start with row zero (bottom row) then all other rows from top to bottom, skipping the kings:

1111 111

1111 111
11111111
00000000
00000000
00001000
00000000
11110111 

C) Black’s turn: Turn Bit = 1

D) En Passant. There is no white pawn next to next to a black pawn, therefore there is no pawn that can be taken en passant (even though this pawn did advance last move) so D=0. If, instead of considering only pawns which have a hostile pawn beside them, we consider all pawns that do not have friendly pieces beside them on both sides, then D would be 1, as there is one such pawn in this situation, and this particular pawn was indeed moved in the last turn.

E) Again, bottom row first, then all other rows from top to bottom, skipping the kings, and with the four uncastled rooks referred to as 0 or 1 (numbers normally reserved for pawns.)

RNBQ BNR =   0248 420
rnbq bnr =   1359 531
pppppppp =   11111111
PPPPPPPP = (0)0000000

The 30th digit (in brackets) can be discarded.

Though it is not very apparent here, the pawn that White has advanced is actually at one end of the list of pawns, because we scan row by row.

Our data now looks like this, with 29 codes for the content of squares, plus the En Passant code:

 (0 discarded) 0000000 11111111 1359531 0248420 (0 en passant)

It is best to scan from right to left when decoding and from left to right (reverse order) when encoding. This means that when there are fewer pieces we will have a smaller number, while retaining maximum consistency (i.e we want the blank space / zeroes to be leading, not trailing, to enable compression of sparsely occupied boards.) When we have only 2 kings on the board, we will have the 75 bits mentioned above, plus 3 bits to store en passant data = 78 bits in the best case. Each additional piece comes in slightly under 3.5 bits (2 pieces can be stored in 7 bits, because 100<128.)

There is a practical problem in that a 99 bit integer is too big to fit in a 64 bit integer variable, which means many programming languages do not provide support for it (you can't simply convert a string representation of a 29-30 digit number into an integer.) As an easy way of encoding for 22 bytes, we can break a 30 digit number (29 piece codes + en passant code) into two 15 digit numbers each of which will fit in 50 bits each (total 100 bits plus the 75 mentioned above makes 175 bits worst case.)

For maximum compression, as stated above, 29 decimal digits plus the En Passant code (6 possible values), will just about fit into 99 bits (for a total of 174 bits) but without support from the language for integers of this size it is complicated to program. It may be easier to separate out the 29 colour bits and work with the piece-type codes (5 possibilities) and En passant code (6 possibilities) separately from the colours (70 bits, almost fits into a 64 bit variable.)

\$\endgroup\$
1
  • \$\begingroup\$ Nice trick with the last man. \$\endgroup\$ Commented Jan 28, 2014 at 8:37
4
\$\begingroup\$

256 242 bits

Here's a basic compression algorithm that can probably be improved on because it doesn't exclude certain illegal positions from being represented.

The board starts with 5 bits of header info, as follows:

0 1 1 1 1
---------
1 2 3 4 5

1: Turn (black = 1, white = 0)
2: Black can castle queen-side
3: Black can castle king-side
4: White can castle queen-side
5: White can castle king-side

Then, a string of 12 bits representing the positions of the kings.

0 0 0 1 0 0 1 1 1 1 0 0
-----------------------
0 0 0 0 0 0 0 0 0 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2

01 - 03: white king's rank
04 - 06: white king's file
07 - 09: white king's rank
10 - 12: white king's file

Then, a huge 64-digit number in base 11, which is then multiplied by 9 to add another digit at the end representing the state of en-passant. Each digit in base 11 represents a square on the board, with the following possible values:

0: empty

1: white pawn
2: white knight
3: white bishop
4: white rook
5: white queen

For the black equivalent of each white piece, add 5.

And the digits in base 9:

0: no en-passant possible
1 - 8: en-passant on rank 1 - 8

1164 × 9 is about 2224.57, which requires 225 bits to encode. Plus the 17 header bits at the top, is 242 bits total.


Thanks to ugoren for the improvements.

\$\endgroup\$
5
  • \$\begingroup\$ This is very similar to ace's algorithm, but it uses board positions instead of pieces in a predetermined order, which both excludes it from having the pawn promotion problem and allows the data to be chopped off a bit. \$\endgroup\$
    – Joe Z.
    Commented Jan 26, 2014 at 2:48
  • \$\begingroup\$ You can save en-passant as one number between 0 and 8 (on which column can the current player capture en-passant?). 13^64 * 9 is 239.99, saving 11 bits. Save more by encoding king positions separately. \$\endgroup\$
    – ugoren
    Commented Jan 26, 2014 at 13:15
  • \$\begingroup\$ According to Doorknob of Snow who commented on my post, this kind of answer is "not an answer". Just saying. FYI his comment was posted before I added the C code to my answer. \$\endgroup\$
    – user12205
    Commented Jan 26, 2014 at 16:03
  • \$\begingroup\$ @ugoren: I forgot about that. You're right, I forgot that only one pawn can be en-passant at the same time. \$\endgroup\$
    – Joe Z.
    Commented Jan 26, 2014 at 18:34
  • \$\begingroup\$ @ace: Your answer isn't invalid because it doesn't include encoding and decoding code; it's invalid because it doesn't account for the pawn-promotion case (in which case your predefined order of pieces doesn't do anything). The problem, at its core, asks for a data-encoding scheme. The program is just something to interface with that. \$\endgroup\$
    – Joe Z.
    Commented Jan 26, 2014 at 18:59
4
\$\begingroup\$

Here is a full solution, actual worst case 181 bits

The focus here is a simple program you can easily understand

Input is FEN, here is opening position, it has six fields (5 & 6 ignored):

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

First field (piece placement) is parsed

perl -pe 's/\d/"_"x$&/ge;s/\s.*//;s|/||g'

To produce:

rnbqkbnrpppppppp________________________________PPPPPPPPRNBQKBNR

Field one: encode the location of kings (12 bits):

printf("%b",index('k',$_))
printf("%b",index('K',$_))

Field two: encode pieces (up to 5 bits per piece):

s/_/0/g     Blank
s/P/100/g   From here, as normal chess meaning
s/p/101/g
s/Q/11000/g
s/q/11001/g
s/R/11010/g
s/r/11011/g
s/B/11100/g
s/b/11101/g
s/N/11110/g
s/n/11111/g
s/K//
s/k//

Field three: active color (1 bit)

s/w/0/
s/b/1/

Field four: castling availability (4 bits)

m/K/?1:0
m/k/?1:0
m/Q/?1:0
m/q/?1:0
  • FIQ shows this field could be removed altogether.

Field five: en passant (zero or 3 bits)

printf("%b",ord($1)-ord("a")) unless m/-/
// The EP's rank is 3 or 6 based on active color, only need to encode file

Naive worst case 200 bits

  • Two kings placements - 12 bits
  • Board
    • QRRBBNN QQQQQQQQ - 75 bits
    • qrrbbnn qqqqqqqq - 75 bits
    • Empty squares - 30 bits
  • Active color - 1 bit
  • Castling - 4 bits
  • En Passant - 3 bits

Actual worst case

Each player cannot promote all pawns without capturing other pieces. Here is the entropy effect of piece capture:

  • PpR (3+3+5 = 11 bits) => Qq_ (5+5+1 = 11 bits)
  • PPpp (3+3+3+3 = 12 bits) => QQq_ (5+5+5+1 = 16 bits)

So actually the worst case board is:

  • QRRBBNN QQQQQQQQ - 75 bits
  • qrrbbnn qqqq - 55 bits
  • Empty squares - 34 bits

Worst case is to promote all pieces rather than leave pawn for en passant.

TOTAL ACTUAL WORST CASE WITH SHOWN CODE 12+75+55+34+1+4 = 181 bits

FIQ shows two improvements to this simple scheme, but they are harder to code:

  • Remove bit 2 from piece encoding on rows 1 and 8 since pawns can't go there (up to 16 bit savings)
  • Use pawns to encode castable rooks (4 bit savings)

The only remaining code not shown in this answer (for brevity) is: breaking the input FEN in fields (split /\s/) and variable assignment.

\$\endgroup\$
1
  • \$\begingroup\$ @KrzysztofSzewczyk, yes that is noted above at PPpp => QQq_ \$\endgroup\$ Commented Oct 15, 2019 at 19:11
3
\$\begingroup\$

? bits

(≥ 217 worst-case, 17 best-case, 179 for initial board)


Encoding description

Extra metadata consists of whose turn it is (one bit) and castling (four bits, i.e. may white castle on kings' side? on queens' side? and similarly for black).

For the board position itself, we encode it as a set of active pieces. Well, actually, we make sure to enumerate the pieces in a particular order for reasons that I'll explain in a bit. For each piece we store its colour (one bit), its kind (three bits to encompass 6 kinds, plus one extra kind for "pawn that may be taken by en passant"), as well as its position.

Here's the interesting part: to encode the position of a piece, instead of storing it as a coordinate, we store its relative distance from the last piece when enumerating the pieces in left-to-right, top-to-down order (i.e. A8, B8, ..., G1, H1). In addition, we store the distance as a variable-length number, using 1 to mean that this piece is right next to the previous, 0xx for skipping 1-3 pieces, 000xxx for skipping 4-10 pieces, 000000xxxx for 11-25, 0000000000xxxxx for 26-56 and finally 000000000000000xxx for 57-62.

Examples

I made a gist of some positions coded with this encoding, and annotated the one for the initial position with some comments.

I don't know how to analyse the worst-case bit size, but going by the examples in the gist, I believe that the algorithm should be somewhat efficient at least.


Implementation of decoder

Below is a quick-and-dirty decoder for this encoding (taking as input the binary data encoded in text, as in the above annotated example, and skipping over things that aren't '0' or '1'). Produces a unicode chess board to stdout.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

char buf[1024];
int wi = 0, ri = 0;

int read_n(int n) {
  int res = 0;
  for (int i = 0; i < n; i++) {
    res = res << 1 | (buf[ri++] == '1');
  }
  return res;
}

int read_varnum() {
  int v, c = 0;

  for (int i = 1; i <= 5; i++) {
    v = read_n(i);
    if (v != 0) return c + v;
    c += (1 << i) - 1;
  }

  assert(false); /* Shouldn't happen */
}

char *piece_to_str(int piece, int color) {       /* ↓ pawn that may be taken with en passant */
  char *pieces[] = { "♙", "♘", "♗", "♖", "♕", "♔", "♙",
                     "♟", "♞", "♝", "♜", "♛", "♚", "♟" };
  return pieces[color * 7 + piece];
}

int main(void) {
  int ch;
  while (ch = getchar(), ch != EOF) {
    if (ch == '0' || ch == '1') buf[wi++] = ch;
  }

  int board[64];
  memset(board, -1, 64 * sizeof(int));

  /* Read metadata */
  int is_white = read_n(1);
  int castling = read_n(4);

  /* Read the board state */
  int bi = -1;
  while (ri != wi) {
    int color = read_n(1);
    int kind  = read_n(3);
    int delta = read_varnum();
    board[bi + delta] = color << 8 | kind;
    bi += delta;
  }

  /* Print metadata */
  printf("  Current turn: %s's turn to play.\n", is_white? "white" : "black");
  printf("  Castling: White may castle? %s %s\n",
         castling & 0x8? "left" : "", castling & 0x4? "right" : "");
  printf("            Black may castle? %s %s\n",
         castling & 0x2? "left" : "", castling & 0x1? "right" : "");
  printf("\n");

  /* Print the board out */
  printf("+");
  for (int x = 0; x < 8; x++) printf("--");
  printf("-+\n");

  for (int y = 0; y < 8; y++) {
    printf("|");
    for (int x = 0; x < 8; x++) {
      int piece = board[y*8 + x],
          color = piece >> 8,
          kind  = piece & 0xFF;

      if (piece == -1) printf("  ");
      else printf(" %s", piece_to_str(kind, color));
    }
    printf(" |\n");
  }

  printf("+");
  for (int x = 0; x < 8; x++) printf("--");
  printf("-+\n");

  return 0;
}
\$\endgroup\$
3
  • \$\begingroup\$ I have no idea what your worst-case bit count is, but I suspect it's considerably more than 179 bits. For example, how would your algorithm handle this layout? (It is valid, by the way; I made it using the editor at chess.com) \$\endgroup\$
    – r3mainer
    Commented Jan 26, 2014 at 21:16
  • \$\begingroup\$ @squeamishossifrage that one seems to require 217 bits: gist.github.com/FireyFly/8639791 (thanks for the test-case, I wanted to try it with a trickier one!) \$\endgroup\$
    – FireFly
    Commented Jan 26, 2014 at 21:37
  • \$\begingroup\$ Hey, that's not bad. Keep going! \$\endgroup\$
    – r3mainer
    Commented Jan 26, 2014 at 21:41
2
\$\begingroup\$

Max: 184 bits, Min: 75 bits

I was inspired by @AdamSpeight's Huffman coding for pieces to try crafting my own scheme. I doubt this will win, but it does have calculable limits.

This scheme treats the chess pieces as though there are 11 different types of pieces. I approximately followed the Huffman coding algorithm to group these classes by their frequencies and real types to generate the following encodings:

 Piece Class                | Frequency Per Team | Encoding
============================+====================+==========
 Pawn, normal               | 0 - 8              | 0
 Pawn, jumped two last turn | 0 - 1              | 1111
 Knight                     | 0 - 2              | 1000
 Bishop                     | 0 - 2              | 1001
 Queen-side rook, unmoved   | 0 - 1              | 10100
 Queen-side rook, moved     | 0 - 1              | 10101
 King-side rook, unmoved    | 0 - 1              | 10110
 King-side rook, moved      | 0 - 1              | 10111
 Queen                      | 0 - 1              | 1110
 King, unmoved              | 0 - 1              | 1100
 King, moved                | 0 - 1              | 1101

Each piece's code can be preceded by two bits to show to which team it belongs (10 for White, 11 for Black). 0 can be used to encode empty spaces. These ideas allow us to build a square-by-square encoding for the whole board using whatever procedure we want. I will assume the iteration order a1, b1, c1, ... f8, g8, h8. This means that just listing these codes as shown above encodes all the information except whose turn it is. A very simple chess engine can use the "classes" for the pawns, rooks, and kings to determine whether castling and en passant are legal. Furthermore, this scheme easily handles pawn promotions. If a player promotes a pawn to a rook, either the king or queen-side codes may be used so long as the "moved" variant is chosen.

Excepting pathological positions that I do not think could ever happen, the worst-case scenario for this encoding is when all pieces are still on the board and the previous player moved a pawn forward two spaces. As an example, the board below encodes 1. e4:

1101010010100010100110111010110010100110100010101101001001001100100100100000000000000101111000000000000000000011011011011011011011011011101001110001110011111101111001110011110001110110
===========================================================================
                              Black's move
1110100 111000 111001 111110 111100 111001 111000 1110110 | r n b q k b n r
    110    110    110    110    110    110    110     110 | p p p p p p p p
      0      0      0      0      0      0      0       0 | . . . . . . . .
      0      0      0      0      0      0      0       0 | . . . . . . . .
      0      0      0      0 101111      0      0       0 | . . . . P . . .
      0      0      0      0      0      0      0       0 | . . . . . . . .
    100    100    100    110      0    100    100     100 | P P P P . P P P
1010100 101000 101001 101110 101100 101001 101000 1010110 | R N B Q K B N R

This means that the worst-case encoding has 184 bits: 1 for indicating the player, 32 for the empty spaces, 45 for the unmoved pawns, 6 for the two-space-jumping pawn, 24 for the knights, 24 for the bishops, 28 for the rooks, 12 for the queens, and 12 for the kings.

As pieces moved about and are captured, the size of the encoding will drop. The best case scenario is represented by two kings alone on the board: 1 bit for indicating the player + 62 bits for indicating the empty squares + 12 bits for indicating the kings gives a minimum length of 75 bits.

I will come back and edit this response with some code that demonstrates this encoding in action later today or tomorrow. For now, I am curious to see what other people think of this encoding.

\$\endgroup\$
3
  • 1
    \$\begingroup\$ You can save bits on the rooks. You can determine queen- and king-side based on position, and you don't need to know which side a moved-rook came from. so you just need one bit of information on the rook - moved or unmoved. \$\endgroup\$ Commented Jul 8, 2015 at 15:35
  • \$\begingroup\$ ... And actually, storing that data on a piece level is easier to encode/decode, but if you just stored a marker at the start, you may save bits overall worst case (e.g., the marker 11001 means B'S MOVE W CAN KSC W CANT QSC B CANT KSC B CAN QSC). That is 4 bits instead of the 5 bits per side in your encoding, or 3 bits per side if you eliminate the side marker on the rooks. (KSC = King's-side castle. QSC = Queen's-side castle) \$\endgroup\$ Commented Jul 8, 2015 at 15:40
  • \$\begingroup\$ Also, due to promotions, it is quite possible to have up to 9 Queens or 10 Knights (or any other non-regal, non-pawn piece) \$\endgroup\$ Commented Jul 8, 2015 at 15:43
2
\$\begingroup\$

152.6 bits

= log N where N = 8726713169886222032347729969256422370854716254 is the size of a ranking of chess positions including all legal ones. This is the best we can do since N is the best known upper bound on the number of legal positions. See https://github.com/tromp/ChessPositionRanking for more details.

\$\endgroup\$
1
\$\begingroup\$

184 bits = 23 bytes worst case, and not too complicated:

A. Which squares occupied: 64 bits = 8 bytes B. Which colors for the <=32 occupied squares: 32 bits = 4 bytes And now using only the <=30 squares occupied by non-kings: B. use PNBRQ encoded in radix 5, for 5^30 possibilities; and 32*31*2*4*4*9 for king positions, mover-color, white & black castling rights, en passant square (among 8 possibilities plus none, a 9th); this number 5^30 * 32 * 31 * 2 * 4*4*9 = 266075134277343750000000000 = 2^87.782 fits in 88 bits for a total of 64+32+88=184 bits for the whole encoding.

This can be reduced, e.g. 32*31*2*4*4*9=285696 can be reduced to 2*(32*31+31*3+31*3+3*3)*6=14244, by using fact at most 6 en passant victim-candidates (including none), and encoding castling rights and king positions inside same set using fact castling rights only matter when king on initial square. This saves 4 bits.

I have reason to believe that it is not possible to achieve 18 bytes, i.e. the total number of legal chess positions exceeds 2^144.

Your 160-bit scheme is very ingenious but I think it needs to be given as encode/decode programs before I'll be completely confident about it.

\$\endgroup\$
0
1
\$\begingroup\$

This is a discussion topic in Chess circles.

Here is one very simple proof with 164 bits https://groups.google.com/forum/#!topic/rec.games.chess.computer/vmvI0ePH2kI 155 is shown here http://homepages.cwi.nl/~tromp/chess/chess.html

Over simplified strategy:

  • Limit pawns to the place where pawns may be found
  • Limit armies to consider original pieces and possible pawn promotions
  • Think hard about promotions and situations where promotions are not possible
\$\endgroup\$
3
  • 2
    \$\begingroup\$ Link-only answers are not good answers. You should provide the complete answer within your post, not just some links to the answer. And if your answer is not your own work you should probably make your post a community wiki. \$\endgroup\$
    – user12205
    Commented Apr 19, 2014 at 18:56
  • \$\begingroup\$ Post is original. Adding details to answer. \$\endgroup\$ Commented Apr 23, 2014 at 15:38
  • 1
    \$\begingroup\$ This is an example of why you include the content in your answer. The 2nd link is dead. Is this it? tromp.github.io/chess/chess.html \$\endgroup\$
    – mbomb007
    Commented Oct 4, 2016 at 19:54
1
\$\begingroup\$

171 bits worst case:

I've combined a couple of ideas, as well as some of my own thoughts.

We are going to start with a 64 bit board. Each bit represents an occupied space on the board. They fill along rows. So the start looks like: 1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111

Now, each piece will be represented by 4 bits. 1st bit: color (0=white, 1=black) 2nd-4th bits: One of 8 types.

0=king, 1=queen, 2=bishop0, 3=knight, 4=rook, 5=pawn0, 6=pawn1, 7=bishop1

At the end we will include a bit designating the turn. 0=white, 1=black.

4bits*32 pieces=128 bits and I've already got 64+1 from turn and board. That gives a total of 128+64+1=193 I haven't even started with en passant or castling. Well over my limit with nothing -- not even turns. This is where the tricks start.

Okay -- you see those types above? Bishop0 and Bishop1? Pawn0 and pawn1? Those types are so designated, because they tell us a bit to insert after this piece. So Bishop0 means that after it, there will be a 0 -- i.e that the next piece is white. Bishop1 tells us the next piece is black, and the pawn0 and pawn1 work the same. (If this piece is the last piece being enumerated, then it tells us about the next turn instead).

My worst case involves all the pieces on the board, so with 16 pawns and 4 bishops, this saves me 20 bits. I'm down to 173.

Okay. For another bit in my worst case -- once there are 16 of a color encoded, we stop encoding color -- as we know it going forwards. My worst case now involves a white piece making it to the far corner with no captures. There, I save only one bit. 172.

I'm going to now change the order in which I name the pieces. We will name them along columns starting at the outside moving in. The board named at the beginning will stay the same, but when I place pieces on it, i start from whites bottom right, and go up that column. Then I jump to black's bottom right, and go up that column. I jump to white's bottom right unknown cell, and so forth -- this means my worst case is again the start. My reason has to do with my en passant trick, the next two bits I lose, and castling.

Now, for a pawn to be promoted (as has been discussed at length) a piece must be captured. Thus, when we know there are 32 pieces, we only need to denote 31 of them. The last piece is uniquely identified. As it turns out, for me, this only saves 2 bits -- because my last piece might be a pawn/bishop (which normally costs me 3 bits because I save one on the next piece) who's color is determined already and so was only 2 bits. I'm down to 170 bits.

When pawns get promoted, they simply change their type. For each piece that goes off the board I rid myself of (minimum) 3 bits, and two pawn promotions cost me 2 bits, so I'm declining (slowly) in promotions.

Castling. For castling to happen, neither king nor relevant rook may have moved. Thus, when a rook is capable of castling both it and the king will be in their original places. So, rooks capable of castling will be listed in the same place as kings of that color. As this is illegal (two kings or three kings of the same color on the board) and only can happen in three possible setups for each color (left, right, both rooks are listed as kings) -- decoding is utterly possible. So, for no bits we have encoded castling.

En Passant Here we will also use illegal positions. Only one pawn can be in danger of en passant at a time. It must be in its fourth row. The pawn that is vulnerable will be 'flipped' back to its home row -- switched with whatever is there. As that pawn is in its own first row -- an illegal position, the situation will be obvious to the decoder -- it will reverse the positions, and mark the pawn as vulnerable to en passant.

We needed to muck about with the order because the last piece has to be 'uniquely identified'. In a standard order, we wouldn't be able to tell if the rook in the back corner could castle -- that's not known. When we work from the outside in, we guarantee that wherever that rook is 'listed' be it in its own corner, or in the middle of the board because it was switched with an en passant vulnerable pawn, there will be a piece listed after it -- so we are told the rook's type. We know there will be a piece after it because, for a pawn to be vulnerable there must be a pawn to its inside (which will crucially be encoded later as per the instructions above).

Oh, and to help make sure that the worst case involves all pieces on the board, once we have less than 32 pieces, we can use the 64 bit board to identify turn as well as occupied positions by using 0's to represent pieces when its white's turn and 1's when it is blacks turn.

So we got en passant and castling for free. We picked up the last piece for free, though it took some finagling with order to make that play nice with the en passant and castling rules. We ditched 20 bits on the standard pieces. I believe the worst case here involves a midde white pawn moved forward and a black piece in between it and its queen while all pieces are on the board. Quick double check: first piece is captured -- call it a pawn, 3 bits off the board in the pawn, 3 bits on the board in the form of a last piece, one bit off in the turn marker disappearing. Promote two pawns 2 bits back on the board. Ah damn, I'm at 171.

EDIT I've added code for a (working?) decoder -- in R -- below. It takes vectors of booleans as input -- (sorry -- I'm not capable of coding well in anything that would let me actually use the bits) I've also included the start position.

separate = function(vec){
    #Using a boolean vector (sorry R doesn't handle bits well and this will build quickest)
    board = matrix(vec[1:64],8,8,byrow=T)
    npieces = min(sum(board),64-sum(board))
    n = length(vec)
    a = vec[65:n]
    counter = 0
    pieces = list()
    white = 0
    Letters=c(letters,LETTERS)
    for (i in 1:npieces){
        col = ifelse(a[1],"black",{white=white+1;"white"})
        typ = a[2:4]
        a=a[-(1:4)]
        num = 4*typ[1] + 2*typ[2] + typ[3]
        type = switch(letters[num+1],a="king",b="queen",c="knight",d="rook",e="bishop0",f="bishop1",g="pawn0",h="pawn1")
        if (num > 3) {
            if(num%%2){
                a = c(T,a)
            } else {
                a = c(F,a)
            }
            type = substr(type,1,nchar(type)-1)
        }
        pieces[[Letters[i]]] = list(color=col,type=type)
        if (length(pieces)==31&&npieces==32) {
            col = ifelse(16-white,{white=white+1;"white"},"black")
            type = "TBD"
            pieces[[Letters[i+1]]] = list(color=col,type=type)
            break
        }
    }

    if (npieces==32) {
        f=function(y){sum(sapply(pieces,function(x)x$type==y))}
        if (f("pawn")<16) {pieces[[32]]$type="pawn"}
        if (f("bishop")<4) {pieces[[32]]$type="bishop"}
        if (f("knight")<4) {pieces[[32]]$type="knight"}
        if (f("queen")<2)  {pieces[[32]]$type="queen"}
        if (f("king")<2)   {pieces[[32]]$type="king"}
        if (f("rook")<(6-f("king"))) {pieces[[32]]$type="rook"}
    }
    return(list(board,pieces,turn=ifelse(a[length(a)],"black","white")))
}


fillboard = function(out) {
    board = out[[1]]
    pieces = out[[2]]
    turn = out[[3]]
    lpieces = lapply(pieces,function(x) paste(substr(x$color,1,1),x$type))
    game = matrix("     ",8,8)
    #Start with corners.
    a = c(1,57,8,64)
    #Then kings
    b = c(25,32)
    #Then rooks in en passant
    c = c(4,60,5,61)
    #Then kings in en passant
    d = 28:29
    exceptions = list(a,b,c,d)
    for (places in exceptions) {
        c= which(board[places])
        if (length(c)) {
            repl = lpieces[1:length(c)]
            game[places[c]] = unlist(repl)
            board[places] = F
            lpieces = lpieces[-(1:length(c))]
        }
    }
    #Loop through rows.
    for (i in c(1:4,8:5)) {
        j = which(board[i,])
        if (length(j)) {
            repl = lpieces[1:length(j)]
            game[i,j] = unlist(repl)
            board[i,j] = F
            lpieces = lpieces[-(1:length(j))]
        }
    }
    return(matrix(unlist(game),8,8,F))
}

swapillegal = function(matr) {
    mat = matr
    if (any(mat[8,]=="b pawn")) {
        j = which(mat[8,]=="b pawn")
        mat[8,j] = mat[5,j]
        mat[5,j] = "b pawn-e"
    }
    if (any(mat[1,]=="w pawn")) {
        j = which(mat[1,]=="w pawn")
        mat[1,j] = mat[4,j]
        mat[4,j] = "w pawn-e"
    }
    
    if (sum(mat[8,]=="b king") > 1) {
        j = which(mat[8,-4]=="b king")
        j[j==7] = 8
        mat[8,j] = "b rook-c"
    }
    if (sum(mat[1,]=="w king") >1) {
        j = which(mat[1,-4]=="w king")
        j[j==7] = 8
        mat[1,j] = "w rook-c"
    }
    return(mat)
}

decode = function(vec) {
    a = separate(vec)
    b = fillboard(a)
    c = swapillegal(b)
    list(board=c,turn=a[[3]])
}


startboard = c(rep(T,16),rep(F,32),rep(T,16))
#Re-ordering -- first spots will be the corners. Then kings. then en passant positions of those spots
pieces = c(F,F,T,T,F,F,T,T,T,F,T,T,T,F,T,T,F,F,F,F,T,F,F,F,F,F,T,F,F,T,F,F,F,F,T,F,T,F,F,F,T,F,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,T,F,T,F,T,T,F,T,F,F,T,T,T,F,T,F,T,F,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F)
########## w rook -w rook -B rook -B rook -W king -B king -w kni  -w bi 0 -w Q  -w Bi 0 -w kni-w p0   - 2   -   3 -  4  - 5   -  6  -  7  -w p1 -b kni-b bi1  -b q  -b bi1  -b kni-b p1   -2    - 3   - 4   - 5   - 6   - b p0- implicit b p0.
#After the kings come all the prior en passant positions. which are empty at start. Then we start at whites bottom right, and move across rows to middle. Then we go to blacks bottom left, and fill across to the middle.
#Changing start to properly encode rooks for castling
newpieces= c(F,F,F,F,F,F,F,F,T,F,F,F,T,F,F,F ,pieces[-(1:16)])
test2 = decode(c(startboard,newpieces))

This code builds 4 functions. One that does some basic separation of the pieces and the board structure, as well as figuring out piece type and any 'implicit' pieces. The next function fills in the board structure with those pieces in a slightly bizzarre (and different from my initial algorithm's) order [explained in code comments]. The next function takes the filled in board and detects any illegal positions -- it then fixes them and renames pawns that are vulnerable to en passant "x pawn-e" and any rooks that can castle "x rook-c". The final function is a wrapper that runs those functions in order and gives an output which is the current board as well as the turn.

I've also included the encoding of the start position (though to see it you will have to call c(startboard,newpieces) and the code calls the wrapper function on that position.

\$\endgroup\$
2
  • \$\begingroup\$ This is interesting. I'd love to see a working implementation as a proof of concept. \$\endgroup\$
    – user45941
    Commented Aug 9, 2016 at 8:59
  • \$\begingroup\$ Implementation is usually a few steps beyond proof of concept, but I've added a decoder. It is in R and thus uses Booleans rather than bits (sorry -- didn't want to use too much time). But I believe it should be proof of concept. \$\endgroup\$ Commented Aug 9, 2016 at 19:41
1
\$\begingroup\$

Huffman Pawn Swap

These are the stats for this encoding:

Initial: 129 bits
Min: 75 bits
Max: 165 bits
Average: ....? 100-140 bits?

Most approaches described here are pretty similar, some try to push the limit and this might result in slower encoding/decoding, I've tried to keep my method as simple as possible.

This originated after discussions with a colleague about creating a huge database with positions (that link to games and next positions).

We came up with the following method:

Encode who's turn it is

Use 1 bit to encode if white or black is to move

List all positions/squares in order

Next we list all the positions and which piece occupies it, using a rather simple Huffman-like coding:

0XXXXX Empty square (32 times, 1 bit, total: 32 bits)
1CXXXX Black or white color bit
1C0XXX Pawn (16 pieces, 3 bits, total: 48 bits)
1C100X Bishop (4 pieces, 5 bits, total: 20 bits)
1C101X Rook (4 pieces, 5 bits, total: 20 bits)
1C110X Knight (4 pieces, 5 bits, total: 20 bits)
1C1110 King (2 pieces, 6 bits, total: 12 bits)
1C1111 Queen (2 pieces, 6 bits, total: 12 bits)

This gives a maximum of 32+48+20+20+20+12+12 = 164 bits for all the pieces and positions.

En passant and castling

Like described in other answers, you can encode en-passant and castling by creating illegal positions using the encoding mentioned above. This is done by swapping/placing pawns on places they shouldn't be.

For en-passant you can swap that last moved pawn with the piece (or empty spot) from the fourth row to the first row. This is a place the pawn can never reach during a game, so swapping back and marking as en-passant is a nice trick.

A final trick we do, which saves a lot of bits, is to replace all pieces on your first row that haven't moved yet with pawns of the opposite color. This is useful because it saves a lot of bits, and also automatically encodes castling options.

Average size

It's hard to give an average without just running it against a lot of matches. Which I haven't done (yet).

The initial position is encoded in 129 bits (all pawns) and this number grows a little once pieces start to leave their initial position, but it never grows to more than 165 bits. Finally with just two kings on the board, it can go down to a minimum of 75 bits.

Easy decoding

Decoding is relatively easy, read one bit to see whos turn it is. Next enumerate all the positions and get a full chessboard returned.

If the board has an own pawn on the first row, flip with the fourth row and mark that pawn as en-passant.

Next look at all the opponent pawns, if they are on your first row, replace them with the original chess pieces and notice your rooks haven't moved yet.

\$\endgroup\$
1
  • \$\begingroup\$ How about promotions causing more queens? Queens increase encoding length vs pawns. On the other hand, it is impossible to promote without taking at least a piece off the board. On the other other hand, a single pawn-takes-pawn allows for three promotions. \$\endgroup\$
    – tomosius
    Commented Nov 9, 2023 at 6:57
0
\$\begingroup\$

229 / 226 bits

This turns out not to be very successful, but might save other people going down the same path.

The simple version:

  • 1 bit for whose turn it is
  • 4 bits for the four castling possibilities
  • 3 bits for the en passant possibilities. This has more depth that I at first understood. En passant must be done by a pawn moving from the same rank (row) as the pawn which is captured. Case analysis indicates that once we know how many pawns of the colour which last moved have advanced exactly two squares, there will be at most 6 en passant cases (including the case that there is no pawn vulnerable to en passant). The worse case is 5 pawns (potentially all vulnerable: e.g. PpPPpPPp has five vulnerable Ps). With 6 pawns there are at most 2 enemy pawns in the same rank, each of which can threaten at most 2 pawns en passant. Therefore we need ceil(lg 6) = 3 bits here.

Then the board. The board has 64 squares, so a square index can be encoded in 6 bits. We list the men by rank, alternating colours, starting with the king.

  • 6 bits: position of white king. (Guaranteed to be on the board).
  • 6 bits: position of black king. (Guaranteed to be on the board. In the worst case that the white king is in a corner, there are 60 possible places he could be; in the best case that the white isn't on an edge, there are 55).
  • 6 bits: position of white queen. If there is no white queen, repeat the white king's position as a signal.
  • For each additional white queen, a 1 bit followed by 6 bits for the position.
  • A 0 bit.
  • Ditto for black queen(s).
  • Similar process for rooks, bishops, knights, and pawns, although we can skip the pawns for a colour if we already have 16 men of that colour accounted for.
  • Delete the final 0 bit.

This costs a certain 12 bits for the kings, and 2*7*5-1 = 69 bits for the other men. In the worst case that all 32 men are on the board, it costs 7 bits for each man other than the kings, for a total of 12 + 7*30 - 1 = 221 bits. So with the initial 8 bits for global state we have 229 bits.


The advanced version:

By using arithmetic coding we can operate with lg num_possibilities rather than ceil(lg num_possibilities) and just take one ceil at the end.

  • 1 bit for whose turn it is
  • 4 bits for the four castling possibilities
  • lg 6 bits for the en passant possibilities.
  • 6 bits for the white king
  • lg 60 bits (worst case) for the black king
  • lg 63 bits (because I don't want to complicate it to the level of looking at checks) for the white queen, using the white king's position if there is none
  • For each additional white queen, a 1 bit followed by lg 62, lg 61, etc. bits for her position.
  • A 0 bit.
  • lg 63 bits (or fewer, if there were any white queens) for the black queen.
  • etc.

The nth man who's actually present has 66-n possible values. If a type is absent for a colour, we spent 66-#men so far bits in recording that (plus one bit for the separator). The extreme cases are:

  1. All men present, including at least one unpromoted pawn from each side. We spend 5+lg 6 on global state, 6+lg 60 on the kings, 29 on separator bits, and SUM_{n=32}^{63} lg n bits on positions. Grand total: ceil(225.82) bits. Disappointing.
  2. Only unpromoted pawns left. We spend 5+lg 6 on global state, 6+lg 60 on the kings, 29 on separator bits, 8*lg 63 saying that there are no other pieces, and SUM_{n=48}^{63} lg n on positions of the pawns. Grand total: ceil(188.94) bits. Better - by saving the second rook, knight, and bishop for each side we've pulled ahead a bit.

So the worst case seems likely to be 226 bits, for a measly saving of 3.

We can definitely do better in the average case by encoding pawns before pieces, since they're restricted to 48 squares rather than the full 64. However, in the worst case that all men are on the board and all pawns have been promoted I think this would end up costing 2 bits more because we would need a "no pawns" flag rather than being able to count the men.

\$\endgroup\$
-3
\$\begingroup\$

Min: 0 bits

Max: 1734 243 bits (4.335 4.401 bits/board amortized)

Expected: 351 177 bits (4.376 4.430 bits/board amortized)

Since I can determine the input and output however I want I decided to go with encoding the history of the game up until this point. One advantage is that the additional information of who's turn it is, en-passant, and who has the ability to castle where can be derived and not encoded.

Attempt 1:

Naively I thought that I can encode each move in 12 bits, 4 triplets of the form (start x, start y, end x, end y) where each is 3 bits.

We would assume the starting position and move pieces from there with white going first. The board is arranged such that ( 0 , 0 ) is white's lower left corner.

For example the game:

  e4    e5
 Nf3    f6
Nxe5  fxe5
...    ...

Would be encoded as:

100001 100010 100110 100100
110000 101010 101110 101101
101010 100100 101101 100100
...

This leads to an encoding of 12 m bits where m is the number of moves made

On one hand this could get really big, on the other hand you can consider each move to be it's own game so each encoding really encodes m "chess boards". If you amortized this you get that each "chess board" is 12 bits. But I feel this is a bit cheating...

Attempt 2:

I realized that each move in the previous attempt encodes many illegal moves. So I decided to only encode legal moves. We enumerate possible moves as follows, number each square such that ( 0 , 0 ) → 0 , ( 1 , 0 ) → 1 , ( x , y ) → x + 8 y . Iterate through the tiles and check if a piece is there and if it can move. If so add the positions it can go to to a list. Choose the list index that is the move you want to make. Add that number to the running total of moves weighted by 1 plus the number of possible moves.

Example as above: From the starting position the first piece that can move is the knight on square 1, it can move to square 16 or 18, so add those to the list [(1,16),(1,18)]. Next is the knight on square 6, add it's moves. Overall we get:

[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 12 , 28 ) , we encode this as 13 in base 20 since there are 20 possible moves.

So now we get the game number g0 = 13

Next we do the same for black except we number the tiles in reverse (to make it easier, not required) to get the list of moves:

[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 11 , 27 ) , we encode this as 11 in base 20 since there are 20 possible moves.

So now we get the game number g1 = ( 11 ⋅ 20 ) + 13 = 233

Next we get the following list of moves for white:

[(1,16),(1,18),(3,12),(3,21),(3,30),(3,39),(4,12),(5,12),(5,19),(5,26),(5,33),(5,40),(6,12),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27)(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 6 , 21 ) , we encode this as 13 in base 29 since there are 29 possible moves.

So now we get the game number g2 = ( ( 13 ⋅ 20 ) + 11 ) 20 + 13 = 5433

Next we get the following list of moves for black: [(1,11),(1,16),(1,18),(2,11),(2,20),(2,29),(2,38),(2,47),(3,11),(4,11),(4,18),(4,25),(4,32),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move $(10, 18)$ ( 10 , 18 )

So now we get the game number g3 = ( ( ( 19 ⋅ 29 + 13 ) 20 ) + 11 ) 20 + 13 = 225833

And continue this process for all the remaining moves. You can think of g as the function g ( x , y , z ) = x y + z . Thus g0 = g ( 1 , 1 , 13 ) , g1 = g ( g ( 1 , 1 , 11 ) , 20 , 13 ) , g2 = g ( g ( g ( 1 , 1 , 13 ) , 20 , 11 ) , 20 , 13 ) , g3 = g ( g ( g ( g ( 1 , 1 , 19 ) , 29 , 13 ) , 20 , 11 ) , 20 , 13 )

To decode a game number g0, we start at the initial position and enumerate all possible moves. Then we compute g1 = g0 // l, m0 = g0 % l, where l is the number of possible moves, '//' is the integer division operator and '%' is the modulus operator. It should hold that g0 = g1 + m0. Next we make the move m0 and repeat.

From the example above if g0 = 225833 then g1 = 225833 // 20 = 11291 and m0 = 225833 % 20= 13. Next g2 = 11291 // 20 = 564 and m1 = 11291 % 20 = 11. Then g3 = 11291 // 20 = 564 and m2 = 11291 % 20 = 11. Therefore g4 = 564 // 29 = 19 and_m_3 = 564 % 29 = 13. Finally g5= 19 // 29 = 0 and m4 = 19 % 29 = 19.

So how many bits are used to encode a game this way?

For simplicity, let's say there are always 20 moves each turn and for the worst case scenario we always pick the biggest one, 19. The number we will get is 19 ⋅ 20m

+ 19 ⋅ 20m-1 + 19 ⋅ 20m-2 + ⋯ + 19 ⋅ 20 + 19 = 20m+1 − 1 where _m is the number of moves. To encode 20m+1 − 1 we need about log2 ( 20m+1 ) bits which is about ( m + 1 ) ∗ log2 ( 20 ) = 4.3219 ∗ ( m + 1 )

On average m = 80 (40 moves per player) so this would take 351 bits to encode. If we were recording many games we would need a universal encoding since we don't know how many bits each number will need

Worst case when m = 400 (200 moves per player) so this would take 1734 bits to encode.

Note that the position we want to encode must be given to us via the shortest path to get there by following the rules. For example the game theorized here doesn't need m = 11741 to encode the final position. Instead we run a Breadth-First search to find the shortest path to that position and encode that instead. I don't know how deep we would need to go to enumerate all chess positions, but I suspect that 400 is an overestimate.

Quick calculation:

There are 12 unique pieces or the square can be empty so to position them on a chessboard is 1364. This is a vast overestimate since it includes many invalid positions. When we are m moves into the game we have created about 20m positions. So we are looking for when 20m = 1364. Log both sides to get m = 64 * log20(13) = 54.797. This shows that we should be able to get to any position in 55 moves.

Now that I calculated the worst case to be m = 55 not m = 400 I will edit my results. To encode a position where m = 55 takes 243 bits. I'm going to also say that the average case is around m = 40 which takes 177 bits to encode.

If we use the amortization argument from earlier we are encoding 400 "chess boards" in 1734 bits so we get that each "chess board" takes up 4.335 bits in the worst case.

Note that g = 0 denotes a valid game, the one where the piece on the lowest square moves to the lowest square it can.

Additional Notes:

If you want to refer to a specific position in the game you may need to encode the index. This can be added either manually e.g concatenate the index to the game, or add an additional "end" move as the last possible move each turn. This can now account for players conceding, or 2 in a row to denote the players agreed to a draw. This is only necessary if the game did not end in a checkmate or stalemate based on the position, in this case it is implied. In this case it brings the number of bits needed on average to 356 and in the worst case 1762.

\$\endgroup\$
7
  • 2
    \$\begingroup\$ Your “amortization” argument does not work. The goal is to encode a board given by the user. You do not get to divide the cost of encoding this board among the 400 auxiliary boards you might happen to generate along the way. (This would only be okay if those 400 boards were allowed to be independently chosen by the user, and not constrained by the requirement to form a chess game.) Also, a chess game can theoretically take many thousands of moves, and the OP is clear about being interested in the worst case. \$\endgroup\$ Commented Jul 16, 2017 at 18:32
  • \$\begingroup\$ @AndersKaseorg: Very true. It really depends on the objective. If you are trying to store an entire game with all of the other algorithms it would take m*c bytes where m is the number of moves and c is the size of their output. So if you are trying to store an entire 80 move game using the 160 bit solution it would take 12800 bits while mine would only take 351. For the sake of the competition I admit that you are correct, but I thought it was a nice property to point out since it is very common to want to store entire games and not just boards. \$\endgroup\$
    – edggy
    Commented Jul 16, 2017 at 18:44
  • \$\begingroup\$ The objective is unambiguous. “The goal is to make the smallest representation of a chessboard that could be used (once decoded) to determine all movement possibilities for a player on that turn. … Your score is determined worst-case scenario - maximum possible size in bits.” \$\endgroup\$ Commented Jul 16, 2017 at 18:49
  • \$\begingroup\$ @AndersKaseorg: I never claimed it was ambiguous. I just decided to take a different approach. One thing to note is in order to find the smallest board using my algorithm I need the smallest path to get to this position. For example in the 11741 turn game that you linked to get to the same board position I don't need to follow that path if all we care about is the board. So in order to encode the linked game I just find the shortest game which is left with 2 kings on those squares which may only be 200 turns or less. This can be done with a depth-first search. \$\endgroup\$
    – edggy
    Commented Jul 16, 2017 at 19:01
  • \$\begingroup\$ Using a shorter equivalent game is fine, if you can actually prove that every position is reachable in 200 turns or less (right now this looks like just a guess). You’ll also need to account for chess positions with more than 100 legal moves, unless you can prove that they don’t arise within your construction. Dividing your result by the number of moves in the game is still not allowed by the rules of this challenge. \$\endgroup\$ Commented Jul 16, 2017 at 19:56

Not the answer you're looking for? Browse other questions tagged or ask your own question.