12
\$\begingroup\$

The golfing language Jelly has a very complex and clever string compression system which I'm not going to go into depth about here. You can find a very good explanation here.

Basically, Jelly's string literals are represented as “...», where ... is a string of characters from Jelly's codepage that is read as a base-250 integer and then decoded into a string which can contain both dictionary words and printable ASCII characters.
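As a rough illustration of that first step (a sketch based on the description above, not Jelly's actual source), the body of a “...» literal is read as a bijective base-250 number, where `code_page` stands for Jelly's 256-character code page:

# sketch: each character's 1-based position in Jelly's code page is a digit
# of a bijective base-250 number, most significant digit first
def literal_value(body, code_page):
    value = 0
    for ch in body:
        value = value * 250 + code_page.index(ch) + 1
    return value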

Your challenge is to find a string that, when decompressed in Jelly, represents itself. For example, if xxx is your string, “xxx» must return xxx to be valid.

You can test ideas here, or just download Jelly and use the sss function.

Scoring

The shortest submission in bytes (not counting “ and ») wins. I'll happily accept a proof of impossibility, although it seems unlikely.

I've tested that there are no 1-or-2-byters, but there might still be a trivial one that is a single word.

Tips, notes and ideas

  • Compressing ASCII characters is actually less effective than just using an ordinary string, so with enough characters, you'll have space to fit an extra word.
  • You'll have to use printable ASCII (+ newlines) because nothing decompresses into characters that aren't those.
  • There might be trivial solutions of 1-2 short words.

The empty string is not a valid solution. The string must be one single string - that is, you can't just escape the string and output it some other way.

\$\endgroup\$
3
  • \$\begingroup\$ You should probably specify that the "string" can't contain extra “ or », so answers don't just escape the string and write a normal quine \$\endgroup\$
    – pxeger
    Commented Sep 12, 2021 at 6:14
  • \$\begingroup\$ I suspect there is no such quine. xxx would have to contain only ascii characters + pilcrow, as non-ascii can't be compressed. As all the words in Jelly's dictionary are longer than the 2-2.5 bytes required to compress them, we can't have the string ever look up a word, it can only decompress as characters. It might be possible to find a string that decompresses to itself that only uses the characters, but I doubt it (I'll take a proper go at proving it doesn't later) \$\endgroup\$ Commented Sep 12, 2021 at 7:42
  • 6
    \$\begingroup\$ I’m also skeptical, but a simple length argument doesn’t suffice; there are strings that decompress to shorter strings, using the 2-letter words in the short dictionary with flags. \$\endgroup\$ Commented Sep 12, 2021 at 8:35

2 Answers

5
\$\begingroup\$

closest so far: 'xC~x9u1' from “xC~x9u+» and 'zC~x9u1' from “xC~x9u1»: 6/7, 85.7%

floor: above 33×250^9
ceiling: below 129×250^34

update from last time: no matches in 8 or 9 digits; my computer is too slow to find any more near misses. 9 digits alone took around 634 CPU hours (~26 days, split across 4 cores), and checking 10 digits the same way would take over 6 CPU years on my computer.

I recently thought of a new strategy for skipping a large percentage of numbers, but it means I skip near misses as well, so any further digit counts I do will have yet another new definition of 'near miss'. I also haven't figured out whether it's guaranteed to hit every match or whether it can skip some, so I'm going to do some more math before I work out how to explain it here.

if you know how to improve the code's performance, I'm still putting new iterations of my code as comments at the bottom of the gist linked in the comments, and I would love any suggestions you have. it's still in rust right now, but I have no idea what I'm doing when it comes to optimizations. there were quite a few near misses in the 7 digit range, so I'm rather disappointed that there were none in 8 or 9; if you spot a bug in my algorithm, that'd be cool as well.

no other changes since last time:

no solution yet, but some preliminary results which may be helpful to others.

in the first step of decompression, you divmod by 3 to either select making a character or using one of the dictionaries. if you ignore the dictionaries, this basically reduces to a base conversion problem where one side is base 288 (= 3×96), using � for the digits that would touch a dictionary and ¶ for a newline:

 ��!��"��#��$��%��&��'��(��)��*��+��,��-��.��/��0��1��2��3��4��5��6��7��8��9��:��;��<��=��>��?��@��A��B��C��D��E��F��G��H��I��J��K��L��M��N��O��P��Q��R��S��T��U��V��W��X��Y��Z��[��\��]��^��_��`��a��b��c��d��e��f��g��h��i��j��k��l��m��n��o��p��q��r��s��t��u��v��w��x��y��z��{��|��}��~��¶��

and the other side is bijective base 250:

¡¢£¤¥¦©¬®µ½¿€ÆÇÐÑ×ØŒÞßæçðıȷñ÷øœþ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¶°¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ƁƇƊƑƓƘⱮƝƤƬƲȤɓƈɗƒɠɦƙɱɲƥʠɼʂƭʋȥẠḄḌẸḤỊḲḶṂṆỌṚṢṬỤṾẈỴẒȦḂĊḊĖḞĠḢİĿṀṄȮṖṘṠṪẆẊẎŻạḅḍẹḥịḳḷṃṇọṛṣṭ§Äẉỵẓȧḃċḋėḟġḣŀṁṅȯṗṙṡṫẇẋẏż

since the symbols have different values in each base, I think it's just a matter of coincidence for the same number to be represented with the same symbols in both.

what this means is that until one of them out-bases the other, there may be a number that matches somewhere. (after 250^40 [roughly 8.27e95 or 2^318.7], base250 will always take more digits and you'd have to use the dictionaries to pad the output; but past ~129×250^34 [roughly 4.37e83 or 2^277.8] every base250 number the same length as the corresponding base288 number will start with a digit greater than ¶, so it will never be ascii-only anymore). what I'm calling a 'near miss' in this first section is when a run of n-1 consecutive digits is the same in both bases. these were found with slow python code; the later 'near miss' qualification is different, so I'll leave these results in the answer.
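putting both sides of that correspondence into code, here's a small python sketch (character branch only, no dictionaries; `code_page` stands for Jelly's 256-character code page, written out in full in the other answer):

def chars_only_decompress(value, code_page):
    out = ''
    while value:
        value, mode = divmod(value, 3)
        if mode != 0:
            return None                      # would need a dictionary word
        value, code = divmod(value, 96)
        out += code_page[code + 32]          # printable ascii + ¶
    return out

def compressed_value(body, code_page):
    value = 0
    for ch in body:                          # bijective base 250
        value = value * 250 + code_page.index(ch) + 1
    return value

# e.g. '/' has compressed value 48 = 3*16, which decompresses to '0',
# but '0' itself compresses to 49, so “/» is only a near miss (see below)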

near misses at one digit: [“/» = '0'(2 off) and “2» = '1'(1 off)]
near misses at two digits: [“-Ñ» = '¶-'(111 off) jumps to “/_» = ' .'(303 off)]

'"*'   “S*»    8,793
'),'   “[,»   10,545
'0.'   “c.»   12,297
'70'   “k0»   14,049
'>2'   “s2»   15,801
'E4'   “{4»   17,553
'c='   “"=»   25,062
'j?'   “*?»   26,814
'qA'   “2A»   28,566
'xC'   “:C»   30,318
'¶E'   “BE»   32,070
also spotted “y©» = 'yC' which I thought was funny since the c is just circled

near misses at three digits: [“+B>» = '¶B+'(65 off) jumps to “+D⁽» = ' C+'(349 off)]

'/D,'   “eD,»   3,017,295
';J/'   “*J/»   3,768,798
';Jb'   “;J/»   3,768,849
'G2 '   “EG2»   4,512,783
'G2#'   “FG2»   4,512,786
'G2&'   “GG2»   4,512,789
'G2)'   “HG2»   4,512,792
'G2,'   “IG2»   4,512,795
'G2/'   “JG2»   4,512,798
'G22'   “KG2»   4,512,801
'G25'   “LG2»   4,512,804
'G28'   “MG2»   4,512,807
'G2;'   “NG2»   4,512,810
'G2>'   “OG2»   4,512,813
'G2A'   “PG2»   4,512,816
'G2D'   “QG2»   4,512,819
'G2G'   “RG2»   4,512,822
'G2J'   “SG2»   4,512,825
'G2M'   “TG2»   4,512,828
'G2P'   “UG2»   4,512,831
'G2S'   “VG2»   4,512,834 [ it's so close to ]
'G2V'   “WG2»   4,512,837 [ being an anagram ]
'G2Y'   “XG2»   4,512,840 [ ...just needs to ]
'G2\'   “YG2»   4,512,843 [ shift a tiny bit ]
'G2_'   “ZG2»   4,512,846
'G2b'   “[G2»   4,512,849
'G2e'   “\G2»   4,512,852
'G2h'   “]G2»   4,512,855
'G2k'   “^G2»   4,512,858
'G2n'   “_G2»   4,512,861
'G2q'   “`G2»   4,512,864
'G2t'   “aG2»   4,512,867
'G2w'   “bG2»   4,512,870
'G2z'   “cG2»   4,512,873
'G2}'   “dG2»   4,512,876
'WW6'   “mW6»   5,522,055
'[Y<'   “[Y7»   5,772,561
'c]9'   “2]9»   6,273,558
'¶j@'   “uj@»   8,026,815
'¶j^'   “¶j@»   8,026,845

near misses at four digits: [“)f⁸ȧ» = '¶¶g)' jumps to “)i4Ḣ» = ' h)']

'-I;*'   “-I;V»     723,390,087
'`G]5'   “`G]K»   1,520,148,576

near misses at five digits: [didn't calculate crossing point]

'^)L75'   “n^)L7»   435,080,769,306
'f>-A8'   “~f>-A»   497,707,074,066

this section is now from much faster rust code; but, as is the nature of exponentials, it only gets a bit farther. every ascii-only, non-dictionary number 8 digits or under has now been checked. after some more optimizations, the 8-digit check only took around 4 hours. there are a couple more optimizations I'm going to implement, but then I'm going to start it on 9 digits, which is projected to finish in only 16 days.
if you want to steal the code and run it on your computer as well, see the gist linked in the comments from @pan. as for how it works: most numbers are either not ascii-jelly or not dictionary-free, meaning that once you've built one of the strings and find a non-ascii character somewhere in the other one, all the work spent building those strings is wasted. so we start on the jelly side, with a for loop that runs through the combinatorial product of n ascii digits. each candidate then goes through an if that checks it decompresses to a dictionary-free string. since we now know all the numbers involved are valid, we construct both vectors and print whenever n-1 digits have the same chars at the same indices. (the more optimized version that finished 8 digits in 4 hours is basically the same, except it also exploits the fact that invalid numbers tend to follow each other, so it jumps to the next valid number instead of trying every single invalid one.)
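to make the structure concrete, here's a rough python sketch of that loop (not the actual rust code, and with none of its optimizations); `code_page` again stands for Jelly's 256-character code page from the other answer:

import itertools

def near_misses(n, code_page):
    ascii_chars = code_page[32:128]                  # ' '..'~' plus ¶
    for cand in itertools.product(ascii_chars, repeat=n):
        # compressed value of the candidate: bijective base 250, first char most significant
        value = 0
        for ch in cand:
            value = value * 250 + code_page.index(ch) + 1
        # decompress, giving up as soon as a step would need a dictionary word
        out, v = '', value
        while v:
            v, mode = divmod(v, 3)
            if mode:
                break
            v, code = divmod(v, 96)
            out += code_page[code + 32]
        else:
            # report candidates that agree with their decompression in >= n-1 positions
            if len(out) == n and sum(a == b for a, b in zip(cand, out)) >= n - 1:
                print(''.join(cand), out, value)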

2 digits
'"*'        “S*»                        8,793
'),'        “[,»                       10,545
'0.'        “c.»                       12,297
'70'        “k0»                       14,049
'>2'        “s2»                       15,801
'E4'        “{4»                       17,553
'c='        “"=»                       25,062
'j?'        “*?»                       26,814
'qA'        “2A»                       28,566
'xC'        “:C»                       30,318
'¶E'        “BE»                       32,070
3 digits
'/D,'       “eD,»                   3,017,295
'/_,'       “/L,»                   3,024,045
';J/'       “*J/»                   3,768,798
';Jb'       “;J/»                   3,768,849
'WB6'       “WQ6»                   5,516,805
'WW6'       “mW6»                   5,522,055
'[Y<'       “[Y7»                   5,772,561
'c]9'       “2]9»                   6,273,558
'¶%@'       “¶V@»                   8,009,565
'¶j@'       “uj@»                   8,026,815
'¶j^'       “¶j@»                   8,026,845
4 digits
'-I;V'      “-I;*»                723,390,087
'7iw,'      “7Rw,»                881,655,045
'`G]K'      “`G]5»              1,520,148,576
5 digits
')L]<('     “)L]"(»           165,271,515,291
6 digits
(none)
7 digits
'"^zHf-%'   “"*zHf-%»   8,638,176,928,324,038
')`K0/-&'   “)`Kh/-&»  10,348,931,331,136,539
'jURA9~/'   “jtRA9~/»  26,237,629,941,156,798
'xC~x9u1'   “xC~x9u+»  29,607,919,863,029,544
'zC~x9u1'   “xC~x9u1»  29,607,919,863,029,550
8 digits
(none)
9 digits
(none)
\$\endgroup\$
2
  • \$\begingroup\$ (rust being workshopped at this stack overflow question) \$\endgroup\$
    – guest4308
    Commented Dec 13, 2023 at 3:56
  • 1
    \$\begingroup\$ Hey, I modified your code a bit in order to parallelize it, hope it helps. \$\endgroup\$
    – pan
    Commented Dec 19, 2023 at 7:51
3
\$\begingroup\$

No solution without the dictionary using at most 9 characters

Using meet-in-the-middle, we can check that there are no solutions without the dictionary using at most 9 characters with the following code:

import itertools
import tqdm


# Jelly's 256-character code page
code_page  = '''¡¢£¤¥¦©¬®µ½¿€ÆÇÐÑ×ØŒÞßæçðıȷñ÷øœþ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~¶'''
code_page += '''°¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ƁƇƊƑƓƘⱮƝƤƬƲȤɓƈɗƒɠɦƙɱɲƥʠɼʂƭʋȥẠḄḌẸḤỊḲḶṂṆỌṚṢṬỤṾẈỴẒȦḂĊḊĖḞĠḢİĿṀṄȮṖṘṠṪẆẊẎŻạḅḍẹḥịḳḷṃṇọṛṣṭ§Äẉỵẓȧḃċḋėḟġḣŀṁṅȯṗṙṡṫẇẋẏż«»‘’“”'''

# the 96 characters a dictionary-free string can contain: printable ASCII + ¶
pos = code_page[32:96+32]

# rev[c]: c's digit value on the decompressed side (base 288 = 3*96)
rev = {a: 3*(code_page.index(a) - 32) for a in pos}
# ver[c]: c's digit value on the compressed side (bijective base 250)
ver = {a: code_page.find(a) + 1 for a in pos}


n = 9
poss = []
for i in range(n):
    spos = []
    for c in pos:
        # difference between c's compressed and decompressed contribution at
        # position i; a self-decompressing string needs these to sum to zero
        # (the decompressed side is base 288 = 3*96)
        spos.append(ver[c] * 250**(n-1-i) - rev[c] * 288**i)
    poss.append(spos)

print(poss)

# meet in the middle: store every partial sum over the first n//2 positions...
v1 = {sum(x) for x in tqdm.tqdm(itertools.product(*poss[:n//2]), total=96**(n//2))}

# ...then look for a second half whose partial sum cancels one of them exactly
for x in tqdm.tqdm(itertools.product(*poss[n//2:]), total=96**((n+1)//2)):
    if -sum(x) in v1:
        print(x, sum(x))
        exit(0)

Projecting from the runtime for 9 characters, running it for 10 characters would either take a lot of memory (128GB might be enough) and a couple of hours, or around 1GB and a few days.
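(For scale: with n = 9 the set side holds 96^4 ≈ 8.5e7 partial sums and the scan side checks 96^5 ≈ 8.2e9 candidates, instead of 96^9 ≈ 7e17 for a plain brute force. For n = 10 the set alone would hold 96^5 ≈ 8.2e9 sums, which is where the memory estimate above comes from.)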

\$\endgroup\$
