2
$\begingroup$

Bacon's cipher is a classic steganographic method, but to decode it, you need to know which symbol is A, which symbol is B and which direction to read them in. (That said, you could just try all the possibilities, and all but one would likely come out as gibberish.) But what if you wanted a code that worked unambiguously without needing to identify which is the A symbol and which is the B symbol?

How long would a fixed length code need to be to encode a 26 character alphabet if -

  • codewords can't collide when the symbols are inverted? (That is, if ABAAB is a codeword, BABBA isn't. Alternatively, ABAAB and BABBA could encode the same character.)
  • codewords can't collide when read backwards or when the symbols are inverted? (That is, if ABAAB is a codeword, none of BAABA, BABBA or ABBAB are.)
  • codewords can't collide if placed on an n-by-n grid and the grid is transposed or the symbols are inverted?
  • codewords can't collide if placed on an n-by-n grid and read sideways, backwards, transposed, inverted, etc.?

For an example for the second two, assume ABAAB and ABBBB are a codewords, then they could be put in to a 5-by-5 grid:

ABAAB
ABBBB
ABAAB
ABBBB
ABBBB

This grid could either be read left-to-right then top-to-bottom or top-to-bottom then left-to-right in the third question (and the symbols might or might not be inverted). The codewords must be such that every combination of valid codewords in one direction will create at least one invalid codeword in the other direction so that someone with the codebook can definitely tell how it should be read. For example, perhaps ABABB isn't a valid codeword. (If this holds true for an n-by-n grid, then it should also hold true for an xn-by-yn grid.)

The same is true for the fourth question, but the correct direction could be any of the eight possibilities. The alternate interpretation where colliding codewords represent the same character doesn't really make sense for these questions.

I believe that if it were required that no valid codewords appeared in the other directions, both the last two questions would be impossible, but it'd be more interesting if I were wrong about that.

(Note: I don't have a definite answer to the last two of these.)

$\endgroup$
2
  • $\begingroup$ Can you give examples of what a collision would look like in the last two cases, like you did for the first two? $\endgroup$ Commented Apr 29, 2015 at 3:18
  • $\begingroup$ I'm not sure I can give a succinct example for the second two, but I'll try. $\endgroup$ Commented Apr 29, 2015 at 3:23

3 Answers 3

3
$\begingroup$

First case: you need a length of $6$. Second case: You need a length of $7$.

For the first case, suppose we want a length $n$ code. There are $2^n$ strings of $n$ symbols, but these can be divided into complementary pairs where the two strings in each pair must represent the same codeword. This means there are at most $2^n/2=2^{n-1}$ codewords possible. Since $2^{5-1}<26$, but $2^{6-1}>26$, a code of length $6$ is necessary and sufficient.

When we throw in reversibility, we divide the strings of length $2^n$ into three classes: palindromes (equal to their reverse), anti-palindromes (which are equal to the complement of their reverse), and boring strings which fall into neither class.

  • Palindromes: a palindrome is specified by the what the first $\lceil n/2\rceil$ symbols of a word, so there are $2^{\lceil n/2\rceil}$ palindromes. However, these can only represent $2^{\lceil n/2\rceil}/2$ codewords, since complements must represent the same word.

  • Anti-palindromes: When $n$ is odd, there aren't any anti-palindromes (the middle symbol of a string will always differ from that of its complement reverse). When $n$ is even, there are $2^{n/2}$ anti-palindromes, making $2^{n/2}/2$ codewords.

  • Boring words: Associated with every boring word are $3$ other codewords with which it can't be distinguished (reverse, complement, complement-reverse). To find the number of codewords these can make, you count the number of boring words, ($2^n-$ # palindromes $-$ # anti-palindomres), then divide by $4$.

When $n=6$, the number of codewords possible is $$ \frac{2^3}2+\frac{2^3}2+\frac{2^6-2^3-2^3}4=20<26 $$ so length $6$ is not enough. When $n=7$, $$ \frac{2^4}2+\frac{2^7-2^4}4=36>26 $$ so length $7$ is enough.

$\endgroup$
2
  • $\begingroup$ These answers match mine for the first two questions. $\endgroup$ Commented Apr 29, 2015 at 19:52
  • $\begingroup$ It's been quite some time with no activity (including my own) so I'll go ahead and accept this as the correct answer. $\endgroup$ Commented Jun 5, 2015 at 20:55
2
$\begingroup$

Your first two questions are answered in real life by bar-coding algorithms. These are codes that are (usually) specifically designed so that they are still uniquely readable when the symbols are inverted or backwards. In particular, Code 39 uses 12 bars for each alphanumeric character.

Your second two questions are answered by QR codes, much to the same effect. However, in this case a QR code uses three fixed-size reference anchors that make it really easy to detect (by design, I might add, since QR codes are meant to be interpretable from a 2D raster image). I'm sure it's not difficult to design an algorithm that doesn't have those visibility anchors that has the same resistance to reflection and rotation, though.

$\endgroup$
2
  • $\begingroup$ It's an interesting connection, but neither fully answers the question. $\endgroup$ Commented Apr 29, 2015 at 19:01
  • 1
    $\begingroup$ True. I just thought it would be nice to give some context as to how these problems are actually applicable in certain fields. $\endgroup$
    – user88
    Commented Apr 29, 2015 at 19:04
1
$\begingroup$

Here are examples of the shortest solutions arranged lexicographically.

Resistant to inversion:

A AAAAAA, B AAAAAB, C AAAABA, D AAAABB, E AAABAA, F AAABAB, G AAABBA, H AAABBB, I AABAAA, J AABAAB, K AABABA, L AABABB, M AABBAA, N AABBAB, O AABBBA, P AABBBB, Q ABAAAA, R ABAAAB, S ABAABA, T ABAABB, U ABABAA, V ABABAB, W ABABBA, X ABABBB, Y ABBAAA, Z ABBAAB

Resistant to inversion and reversal:

A AAAAAAA, B AAAAAAB, C AAAAABA, D AAAAABB, E AAAABAA, F AAAABAB, G AAAABBA, H AAAABBB, I AAABAAA, J AAABAAB, K AAABABA, L AAABABB, M AAABBAA, N AAABBAB, O AAABBBA, P AABAAAB, Q AABAABA, R AABAABB, S AABABAA, T AABABAB, U AABABBA, V AABBAAB, W AABBABA, X AABBBAA, Y AABBBAB, Z AABBBBA

Resistant to transposition and inversion:

A AAAAAB, B AAAABA, C AAAABB, D AAABAA, E AAABAB, F AAABBA, G AAABBB, H AABAAA, I AABAAB, J AABABA, K AABABB, L AABBAA, M AABBAB, N AABBBA, O AABBBB, P ABAAAA, Q ABAAAB, R ABAABA, S ABAABB, T ABABAA, U ABABAB, V ABABBA, W ABABBB, X ABBAAA, Y ABBAAB, Z ABBABA

This is the shortest code possible as Mike Earnest already proved 6 is the shortest length of inversion resistant code. It is trivially resistant to transposition as all codewords start with the same letter, but AAAAAA and BBBBBB aren't valid codewords.

Resistant to transposition, inversion and reversal:

A AAAAAAB, B AAAAABA, C AAAAABB, D AAAABAA, E AAAABAB, F AAAABBA, G AAAABBB, H AAABAAA, I AAABAAB, J AAABABA, K AAABABB, L AAABBAA, M AAABBAB, N AAABBBA, O AABAAAB, P AABAABA, Q AABAABB, R AABABAA, S AABABAB, T AABABBA, U AABBAAB, V AABBABA, W AABBBAA, X AABBBAB, Y AABBBBA, Z ABAAAAB

Similar to the previous section, Mike Earnest already proved 7 is the shortest length of inversion resistant, reversal resistant code. It is also trivially transposition resistant as each word starts with an 'A'.

No valid codewords appearing in the transposed direction

This is indeed impossible. We know that we need at least two codewords (in fact we need 26 codewords), and that all codewords must differ in at least one position. Consequently we can use the two codewords to get 'A's and 'B's in that position to spell out anything we like in the transposed direction.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.