20
$\begingroup$

Here is a question I posted on rec.puzzles back in 1998. So it's about twenty years on, and people seem to have stopped thinking about it. I'll refresh your memory by posting it again:

I wrote a routine to 're-arrange' data for comms purposes (not for compression, and not for encryption, but I'm not saying why just now). It struck me that it becomes completely unreadable, and I wondered how difficult it would be to decipher it. Any code-breakers out there?

Here is a brief description of how the 'encoding' works, but to read it you'll have to break the 'code'. Note that my routine produces 8-bit bytes, avoiding several bytes such as 'R', nul, xon, xoff etc for certain reasons (free hint). What follows is a hex dump of the encoded data stream. Another free hint: The 'encoded' data file is slightly larger than the original.

I call the coding scheme ESFE code (3rd free hint), and I have no idea whether it is standard practice - I have never seen anything like it in text books.

If someone gave me this to decode, I wouldn't get it in a bajillion years - but that would be the case with just about any code you gave me.

49 1D 26 70  FF 29 CD 5A  89 C6 18 16  0E 85 B5 8B
FC 71 F2 3E  5F FA 03 CC  3E 3B E6 F5  ED 38 6B C0
31 03 D4 41  16 2E B2 FE  67 17 1D 63  22 25 F0 14
14 47 76 8F  4D DB 0F 63  20 FC 35 E5  EB A8 39 21
34 C4 48 8B  DF B6 E6 68  E6 EC EA 42  F3 0E 25 65
9C FA 2D B1  A1 B2 7E 9E  30 54 B6 41  5E 5F 12 ED
7F 88 73 6E  BF DA 4F B4  33 44 C1 1A  EC F8 CD 1C
63 FC 27 FE  FB AA C0 5A  77 30 26 79  C1 B6 92 C7
2C 6C B0 6E  A4 72 4B A6  66 BE 3B 06  9D 3D 59 7A
F4 A7 CA F2  45 D8 3D 36  2F C1 70 ED  02 F5 A9 9D
30 19 3E E2  4C D8 36 69  7C 25 2B 87  27 FF F6 2C
89 5D 43 64  83 81 2A 9E  2F 30 CF 43  01 B8 27 D5
AE 46 48 4D  62 61 53 B4  03 5B 75 60  28 37 F5 B4
9D 8B 69 D0  6D 24 0F 96  A4 12 34 F7  D0 AE A3 5D
74 B4 30 FE  66 9F 39 7C  41 C1 12 E5  5E 73 D0 4C
49 7C 6B ED  34 7E 6D 64  60 72 89 02  83 A7 6B 08
CF 3A 77 4F  C2 A6 AC 8F  4D 88 1C E1  20 22 C3 F7
80 C1 37 64  21 E7 E7 31  2C 3F 27 FE  96 6C CE EA
E3 F3 FE 69  37 73 0E 3C  23 37 C8 89  6B 5A 9C 6E
64 CA 62 06  49 9C 54 09  59 DC 58 3C  C3 A6 75 07
B6 74 5C 77  AB 0A 63 9A  87 23 6D 8F  4D 81 5E 68
AF 21 8F 8C  E4 B0 39 E0  14 73 56 03  2C 73 16 7B
FC 95 23 24  FD CA 9C DE  66 6A 62 06  60 45 DD 07
71 C1 45 6F  29 88 06 54  EE FA BF 78  73 85 A9 9D
14 FC 1B B5  AF 70 73 67  2D E2 DE F6  45 0E F5 F9
E6 51 15 2B  62 B3 1B D5  01 20 C2 36  83 72 8A 5E
33 F2 A7 3A  E4 96 6B C0  5E A6 CA 4F  83 D7 72 61
DE 4E BF FE  09 7C 21 F6  2D 2A DA 8E  43 75 98 56
AD 5F 2C 2B  3F 2E 39 62  CF 23 3E AF  3C A4 34 CF
2A 1D 22 6D  64 A1 66 04  46 C2 98 05  FC 33 07 F9
30 8A 7E 9C  E7 D6 54 62  06 C2 1C C6  72 D9 73 96
A4 D1 9B FE  43 4B BD 5A  CD CD 86 FB  D6 80 CE 0B
5D 81 56 87  4D 73 15 C2  64 7C 92 38  03 46 BE E8
33 92 E9 69  09 DB C8 C0  21 20 05 9E  5C D7 F5 78
C7 19 1A 36  72 94 C2 CB  32 EB A9 FB  2F 26 6C 5D
73 63 DD A7  43 2E 39 41  07 A3 4A 3D  D9 7C 92 0A
A4 7A C8 C5  A4 BA C9 7A  9C FD 37 78  A4 57 B7 26
73 26 29 DB  B1 CC AD 92  EE C0 1C BD  1C DE EB 03
04 F5 09 F7  D0 AE A3 5D  74 6C CA A1  96 23 CD 5C
A2 A4 2D 85  EF 9D B4 DC  BB AC D8 23  9F FA A5 0A
3C 38 5F 06  65 69 6C 08  5A AD 9B 81  79 F5 92 63
C3 41 67 FD  09 73 37 4A  0E 22 35 F7  CC B7 97 24
2C FC FB 94  3F 5C B4 4E  67 15 18 5D  DC A4 81 C6
19 F2 03 75  74 F5 40 6F  64 4F 99 BF  74 3C 59 AB
31 39 F4 6E  B0 5F F6 81  83 16 15 01  21 94 92 2F
80 BB 63 92  4D DB 0F 63  2D FA 39 05  32 53 CB 5A
26 73 56 69  01 85 81 FC  A8 86 B0 74  47 48 BD BE
BB C4 3A CD  A4 06 59 BA  01 5C 75 4F  E6 85 E8 60
C3 8B 9C 32  22 84 C1 E4  A4 17 4F 3A  4A 54 CC 1C
2D FC 3C F6  CB 03 C6 33  F5 FA 2D BE  0A 85 81 75
43 8C 4D 0F  57 0E A3 6F  33 14 6F 02  63 5F 53 C3
78 5C 3A 2E  40 78 92 99  D0 8B D0 31  22 60 51 2D
80 6E F1 7F  2F F2 1B 49  5F 40 06 D0  31 2E 39 9C
0F 6E DB 09  F4 48 56 B0  51 35 CB B7  42 6E 1C 78
0F DF 6F CB  DD 5C 02 F9  CB 73 B2 30  AA 7B 7F CB
D0 B1 F5 94  09 3F 53 CF  05 B9 67 C1  F8 DB B1 5C
65 63 3B 87  99 59 CB F4  E4 51 35 40  FA B9 E2 29
F5 AD B4 A4  26 BA 81 29  C7 79 20 06  E0 84 AA CF
61 0B 7D 40  F9 88 12 BD  2E C1 0C B8  BF 21 C9 47
0E D3 C8 E2  40 54 36 67  A9 5F C8 08  47 ED AE DE
09 D0 23 45  DB 71 80 12  5B 23 40 1C  8F A2 27 65
51 98 A4 3A  FA 61 53 B4  03 AC CA 9B  84 35 9E C4
9C 15 1B F3  C5 82 19 FF  9E 87 0F B5  B1 70 73 67
65 3F C9 99  BD 88 39 84  C8 2F 3E E5  5E 73 2A 84
AD 31 17 D9  80 FA C0 16  14 C2 20 BF  A0 99 64 B8
4C 40 0B B8  67 0E 3E 60  FB 19 D1 F4  F8 D8 C4 AE
2E 51 0B FE  2F 6B 40 67  7F 1C 8F FB  6A 69 CB 23
7C BF 55 C0  58 FE F9 D5  01 20 CE BF  79 96 9A 65
C6 B5 56 65  FA 73 AA AA  31 DB 0C E1  55 B2 5A 09
B6 B4 CD 04  21 27 4B 3B  76 B8 3A 8F  49 33 C9 1C
79 7A 79 8C  AE 6C AE F2  E1 AE 40 5B  B2 23 24 5A
94 AB FE FA  58 F5 0A 46  09 77 12 22  92 A9 E2 40
61 0B 20 9A  06 0F B0 F3  67 51 19 12  6E 28 69 47
A3 05 FB E7  4A 42 CF 5C  7F 5C 2C 2B  FB AA B0 25
19 D0 F5 8B  70 73 F8 0F  A4 4A A7 B7  82 FA 0C 71
9C 3A 5F 06  A5 7B 07 40  49 FA 8F 4E  25 7D 4C EF
A3 74 02

I'll add some 'progress so far', ie what was posted in September 2000 by Jim Gillogly.

Doing an index of coincidence at various periods shows that it has some kind of periodic behavior at offsets of 7 and 8 bytes -- whatever breaking-up of characters is going on appears to come back into focus then. Here are the periods with highest IC's up to about period 120:

56 0.0226 112 0.0216 28 0.0134 84 0.0118 98 0.0098 14 0.0083 70 0.0079 42 0.0073 16 0.0070 120 0.0069 49 0.0069 8 0.0069 104 0.0068 80 0.0067 40 0.0066

All values with factors of 7 or 8 or both.

This effect is confirmed by a Kasiski analysis. Here are the longest repeated strings:

4 at 52, offset 672: 4ddb0f63 4 at 197, offset 784: 6153b403 6 at 219, offset 392: f7d0aea35d74 3 at 235, offset 784: e55e73 3 at 389, offset 616: 707367 3 at 407, offset 672: d50120

And the factorization of their offsets: 392: 2 2 2 7 7 672: 2 2 2 2 2 3 7 784: 2 2 2 2 7 7 616: 2 2 2 7 11 672: 2 2 2 2 2 3 7

All 7's and 8's. Perhaps the basic operation is sending ASCII with the high-bit trimmed off, but doing something additional when the "special" bytes are encountered on output. This might show the kind of periodicity I'm seeing.

It might be instructive to re-do the IC and Kasiski analysis on a bit level rather than a byte level. I suspect we'd see longer repeated sequences (by which I mean longer than 6*8 bits).

   Jim Gillogly
   Hevensday, 21 Halimath S.R. 2000, 16:35
   12.19.7.9.15, 4 Men 18 Mol, Sixth Lord of Night
$\endgroup$
5
  • 3
    $\begingroup$ +1 just for the fact that you've so patiently waited for a hero to come by and solve this :c) I hope my own "still open" questions won't last 20years before anyone cracks them.... 8-) Happy NewYear to you, anyway! $\endgroup$
    – BmyGuest
    Commented Dec 31, 2017 at 17:42
  • $\begingroup$ Okay, top of my head initial guesses (I haven't yet had a chance to test any of these) - ESFE possibly stands for Error Safe F... Encoding (not sure what the F would be, file? failsafe?). For comms purpose (not encryption or compression), and the output is slightly longer than the input? Quite possibly contains parity bits - 1dim or 2dim. "I have never seen anything like it in text books" Hmmm, perhaps not parity bits - or not just parity. "avoiding several bytes such as 'R', nul, xon, xoff etc for certain reasons" - need to consider which system would interpret these characters specifically. $\endgroup$
    – niemiro
    Commented Dec 31, 2017 at 19:29
  • $\begingroup$ Looks like play-fair matrix encryption. $\endgroup$ Commented Jan 1, 2018 at 14:20
  • $\begingroup$ Computer readable data: hex, bin $\endgroup$ Commented Jan 4, 2018 at 16:08
  • $\begingroup$ OK 'another hint. 'ESFE' stands for Eleven-Seven-Five-Eight. In fact I could have called it EESFE with another Eight on the front. $\endgroup$ Commented Oct 19, 2019 at 1:50

1 Answer 1

7
+100
$\begingroup$

Well, this is certainly most interesting, so let's get going with a Very Partial answer. I'm not going to bother with spoiler tags; if others have had 20 years to solve this, it's pretty much certainly going to take a group effort here.

A basic letter (or in this case, octet) frequency analysis reveals that the octets may as well have been chosen randomly; there are a couple of outliers though. Here's the analysis:

The most common occurrence count is 5, with 47 different octets showing up exactly that many times. There are 12 different octets that show up only once, and 10 octets that show up exactly 10 times. In general, the it looks like the farther away a number is from 5, the less likely it is that an octet occurs exactly that many times, with a sharp cutoff at 11 occurrences. However, the octet "73" occurs as many as 18 times.

enter image description here

The shape of the distribution resembles that of a binomial distribution. This is what you'd expect from randomly selected equiprobable octets, except that

  • the sharp cutoff probably shouldn't be there
  • some outliers are to be expected, but 18 occurrences is a lot.

Therefore, the octet value 73 may be magic somehow. Or maybe we are just unlucky, and it's a red herring.

Given that we know that plaintext is supposed to be a description of the method, hopefully in clear English, we can conclude that this is extremely unlikely to be a simple substitution cipher of any kind, and that the method OP used creates nearly perfectly random-looking output even when given highly patterned data (English) as input.

EDIT (2018-01-24):

Since nobody else has picked this up, here's some data for the next step, which is looking for repeating portions. I converted each hex character individually to its binary form (high bit first: "1"->0001, "A"->1010, etc.) and ran a search for repeated sequences and their starting positions in the binary string. Here are the unchecked raw results, after removing duplicates by hand:

found a repeating binary string of length 24:
110101101000000011001110
at positions:
4064
8489

found a repeating binary string of length 24:
000000100100111111010100
at positions:
7070
9659

found a repeating binary string of length 26:
10111001010101111001110011
at positions:
1878
8150

found a repeating binary string of length 26:
10111000001110011011001110
at positions:
3111
8039

found a repeating binary string of length 29:
11101010100000001001000001100
at positions:
3255
8631

found a repeating binary string of length 35:
01001100001010100111011010000000011
at positions:
1573
7845

checking for repeating bit patterns of length 36
found a repeating binary string of length 36:
010011011101101100001111011000110010
at positions:
416
5792

checking for repeating bit patterns of length 48
found a repeating binary string of length 48:
111101111101000010101110101000110101110101110100
at positions:
1752
4888

Analysing the intervals might give an insight into the "block size" of the encoding. For example, if it turns out that almost every interval is a multiple of some smallish number (say, between 6 and 12, since the hints say that the plain text is just a little shorter than the encoded text), it stands to reason that the repeated encoded strings are a result of repeated plaintext, encoded identically, and that the encoded characters are that many bits wide.

I'll try to find the time to continue this analysis soon. (If any of you want to pick up from this point, please drop a comment, so we can avoid duplicate work. Thanks!)

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.