17
\$\begingroup\$

This is the reverse of this challenge.

Given an encoded list of codepoints and the characters used to encode it, you need to decompress it to its original string.

For example, given the encoded list [170, 76, 19, 195, 32] and the encoder characters " abcdefghijklmnopqrstuvwxyz", the output should be "the fox".

How, though?

First, we need to go through the encoder characters and find out what bits they correspond to. We do this by getting the binary of each corresponding (1-indexed) index in the encoder string. For example, the encoder string 6923:

6, index 1 => 1
9, index 2 => 10
2, index 3 => 11
3, index 4 => 100

Now we need to pad each with leading zeros until they are all of equal length:

6, index 1 => 001
9, index 2 => 010
2, index 3 => 011
3, index 4 => 100

Now that we have a lookup table, let's decode it. First, let's get the binary of each codepoint. Let's say we have codepoints [137, 147, 16] (and still encoder string 6923):

137 => 10001001
147 => 10010011
16  => 10000

The last one may not be 8 bits long. In this case, pad it with leading zeros until it is:

137 => 10001001
147 => 10010011
16  => 00010000

Now, smash all these bits into one string: 100010011001001100010000. Now, we need to group it into sizes of how long each byte will be in the lookup table. Looking back at our lookup table with 6923 encoder string, we can see that each byte is 3 bits long. This means that we need to split the bits into chunks of 3:

100 010 011 001 001 100 010 000

Now, for each, check its character in the lookup table. If it is all zeros (and therefore not in the lookup table), ignore it.

100 => 3
010 => 9
011 => 2
001 => 6
001 => 6
100 => 3
010 => 9
000 is ignored

Now, joining this together, the string was 3926639, and we're done!

Test cases

Codes: [170, 76, 19, 195, 32], encoder characters: " abcdefghijklmnopqrstuvwxyz" => "the fox"
Codes: [151, 20, 40, 86, 48], encoder characters: "123456789" => "971428563"
Codes: [170, 76, 25, 89, 68, 96, 71, 56, 97, 225, 60, 50, 21, 217, 209, 160, 97, 115, 76, 53, 73, 130, 209, 111, 65, 44, 16], encoder characters: " abcdefghijklmnopqrstuvwxyz" => "the quick brown fox jumps over the lazy dog"
Codes: [108], encoder characters: "abc" => "abc"
Codes: [255], encoder characters: "a" => "aaaaaaaa"
Codes: [85, 170], encoder characters: "ab" => "aaaabbbb"

Rules

  • Second input can be a string, list, or even list of codepoints (same with output). It doesn't matter, I/O is very flexible for this challenge.
  • First input can, again, be a list of the codepoints, or a string containing the represented codepoints (if your language uses an SBCS, you're free to use it (but you can still use UTF-8)). Again, I/O is flexible.
  • Input will always be valid.
  • Encoder characters will always contain less than 256 characters.
  • Neither input will ever be empty.
  • This is , so the shortest answer in bytes for each language wins.
  • Standard I/O rules apply.
  • Default loopholes are forbidden.

Reference implementation in JavaScript

function decode(points, encoderChars) {
  const maxBitCount = Math.ceil(Math.log2(encoderChars.length + 1));
  const charToBit = Object.fromEntries(encoderChars.map((c, i) => [c, (i + 1).toString(2).padStart(maxBitCount, "0")]));
  const charBits =
    points
      .map((x) => x.toString(2).padStart(8, "0"))
      .join("")
      .match(RegExp(`.{1,${maxBitCount}}`, "g")) || [];
  return charBits
    .map((x) => Object.entries(charToBit).find((y) => y[1] === x)?.[0] || "")
    .join("");
}

Attempt This Online! (feel free to steal the test case code for your own JS answer)

\$\endgroup\$

14 Answers 14

7
\$\begingroup\$

Vyxal, 16 bytes

b8∆Zf⁰LbLẇvBꜝ‹İ∑

Try it Online!

b                # Convert each from binary
 8∆Z             # Pad each to length 8
    f            # Flatten
         ẇ       # Split into chunks of length... 
     ⁰LbL        # Length of the encoder string, in binary
          vBꜝ‹   # Convert each from binary, remove 0s and decrement
              İ∑ # Index those into the encoder and sum
\$\endgroup\$
5
  • \$\begingroup\$ Your result sometimes contains a trailing space? Your current test case contains a trailing space for example. And "6932", [137,147,16] contains a trailing space as well. (But not all of them do, because "a", [255] lacks that trailing space.) \$\endgroup\$ Commented May 23, 2022 at 10:01
  • 1
    \$\begingroup\$ @KevinCruijssen Oh oops... that's happening because I forgot to deal with zeroes. \$\endgroup\$
    – emanresu A
    Commented May 23, 2022 at 10:03
  • 1
    \$\begingroup\$ Hmm.. the longest test case seems to fail as well for some reason. \$\endgroup\$ Commented May 23, 2022 at 10:49
  • 1
    \$\begingroup\$ The reason this does not work for test case 3 is that the encoded data does not end with exactly one "byte" of zeros. This, I believe, is a result of the linked (reverse) challenge having the specification "If the last byte is not 8 bits long, add zeros at the end till it is 8 bits long". Thus converting from base 256 loses the information of how many of the implicit zeros are leading vs trailing. \$\endgroup\$ Commented May 23, 2022 at 17:12
  • \$\begingroup\$ Fixed, I think. \$\endgroup\$
    – emanresu A
    Commented May 23, 2022 at 19:56
5
\$\begingroup\$

Uiua, 29 25 bytes (SBCS*)

or 55 bytes using native UTF-8 encoding

⇌⊏-1▽±.⍘⋯↯⊂¯1⧻⋯⧻,⇌↘1⋯⊂255

Try it!

                     ⊂255  # prepend 255 to list of codepoints (to force 8 bits)
                    ⋯      # convert each to bits
                  ↘1       # drop the first one (the 255)
                 ⇌         # and reverse the rows (so concatenation of uiua's big-endian bits will work);
                ,          # now duplicate the second stack item (the encoding characters)
               ⧻           # get its length
             ⧻⋯            # convert to bits and get the number of bits
         ↯⊂¯1              # reshape the codepoint-bit-array to rows of this length
       ⍘⋯                  # and convert each row back to a number from bits (=invert bits);
     ±.                    # get the signs of a copy of this (1 or 0)
    ▽                      # and use it to remove any zeros
 ⊏                         # and use the resulting numbers to index into the encoding characters
  -1                       # (after subtracting 1 to convert to zero-based indexing)
⇌                          # finally, re-reverse (to undo the earlier reversal)

*Uiua natively supports UTF-8 encoding, allowing functions to bound to multi-byte characters such as emojis. However, this isn't essential for programming in Uiua, and its full functionality can be achieved using the union of printable ASCII characters with Uiua's 108 glyphs, which can each be represented using a single byte.
This answer to the challenge provides one way that such a mapping can be implemented, which can be considered as a single-byte character system (SBCS) for Uiua (with the obvious proviso that multi-byte characters other than Uiua's glyphs are not used in the code). Since Uiua's default behaviour is to run any file in the directory with a .ua filename suffix, this can even be used as a runnable implementation by simply prepending &fwa "program.ua" to save the output to the file "program.ua".

\$\endgroup\$
3
  • 1
    \$\begingroup\$ Nice, thanks for making this! \$\endgroup\$
    – chunes
    Commented Oct 11, 2023 at 14:55
  • \$\begingroup\$ Fitting for a challenge about encoding bytes. \$\endgroup\$ Commented Oct 12, 2023 at 12:01
  • \$\begingroup\$ in 2024 the negative 1 in reshape should be infinity, for 24 bytes, and the glyph for inverse changed \$\endgroup\$ Commented Jun 28 at 17:35
2
\$\begingroup\$

J, 35 31 bytes

((]i.0-.~($@]$!.0,@[)&.#:)#\){]

Try it online!

\$\endgroup\$
2
\$\begingroup\$

PARI/GP, 70 bytes

f(p,e)=[e[i]|i<-digits(fromdigits(p,256)>>(8*#p%l=#binary(#e)),2^l),i]

Attempt This Online!

Inputs and outputs lists of characters.

\$\endgroup\$
2
\$\begingroup\$

05AB1E, 17 16 bytes

₁+b€¦JIgbgôC0K<è

Port of @emanresuA's Vyxal answer, so make sure to upvote him/her as well!
-1 byte thanks to @CommandMaster

Outputs as a list of characters.

Try it online or verify all test cases.

Explanation:

₁+               # Add 256 to each integer in the first (implicit) input-list
  b              # Convert them to binary-strings
   €¦            # Remove the leading "1" from each
     J           # Join the list of strings together
      I          # Push the second input-string
       g         # Pop and push its length
        b        # Convert it to binary
         g       # Pop and push its length
          ô      # Split the earlier string into parts of that size
           C     # Convert each inner string from binary to a base-10 integer
            0K   # Remove all 0s
              <  # Decrease each by 1 the 1-based indices 0-based
               è # Use it to index into the second (implicit) input-string
                 # (after which the resulting list is output implicitly)
\$\endgroup\$
1
  • 1
    \$\begingroup\$ -1 by replacing b8jJð0: with ₁+b€¦J \$\endgroup\$ Commented May 27, 2022 at 11:07
2
\$\begingroup\$

MathGolf, 17 16 bytes

@♠+àm╞y\£à£/åç(§

-1 byte implicitly thanks to @CommandMaster, applying the same change as in my 05AB1E answer

I/O as a list of characters.

Try it online.
Minor note: the spaces in the input-lists are replaced with _, because there is a bug in MathGolf when reading a list that contains spaces (it reads them as ", " instead of " ")..

Explanation:

@          # Reverse triple swap, so the stack now contains two encoder lists and
           # the integer-list
 ♠+        # Add 256 to each integer
   à       # Convert all integers in the list to binary-strings
    m╞     # Remove the leading "1" from each
      y    # Join it to a single string
 \         # Swap so an encoder-list is at the top
  £        # Pop and push its length
   à       # Convert it to a binary string
    £      # Pop and push its length again
     /     # Split the earlier string into parts of that size
      å    # Convert each from a binary-string back to an integer
       ç   # Remove all 0s with a falsey filter
        (  # Decrease each to make the 1-based indices 0-based
         § # Index it into the remaining encode-list
           # (after which the entire stack is output implicitly as result)
\$\endgroup\$
2
\$\begingroup\$

Charcoal, 26 25 bytes

⁻⭆⪪⭆θ◧⍘ι !⁸L↨L粧⁺ψη⍘ι !ψ

Attempt This Online! Link is to verbose version of code. Explanation:

    θ                       Input array
   ⭆                        Map over values and join
       ι                    Current value
      ⍘                     Custom base conversion using
         !                  Literal string ` !`
     ◧                      Right-padded to length
          ⁸                 Literal integer `8`
  ⪪                         Split into substrings of length
              η             Encoder characters
             L              Length
            ↨               Converted to base
               ²            Literal integer `2`
           L                Length
 ⭆                          Map over values and join
                  ψ         Predefined variable null
                 ⁺          Concatenated with
                   η        Encoder characters
                §           Indexed by
                     ι      Current value
                    ⍘       Custom base conversion using
                       !    Literal string ` !`
⁻                           Remove occurrences of
                        ψ   Predefined variable null character
                            Implicitly print

This approach turned out to be shorter than a numerical approach based on base-256 conversion and custom string base conversion.

23 bytes using the older version of Charcoal on TIO:

⭆⪪⭆θ◧⍘ι !⁸L↨L粧⁺ψη⍘ι !

Try it online! Link is to verbose version of code. Explanation: The version of Charcoal on TIO special-cases AtIndex to return the empty string instead of the null character, even when indexing into an array. Most of this time that behaviour is incredibly annoying but here it's actually helpful. Unfortunately that version of Charcoal does suffer from a bug in BaseString so it doesn't support zero values in the input, which can happen on rare occasion.

\$\endgroup\$
1
\$\begingroup\$

Python, 139 bytes

lambda L,E:(n:=len(bin(len(E)))-2)and"".join(["",*E][int("".join(bin(i)[2:].rjust(8,"0")for i in L)[i:i+n],2)]for i in range(0,len(L)*8,n))

Attempt This Online!

I suspect it could be shorter, but I'm not well versed in binary manipulation

\$\endgroup\$
1
\$\begingroup\$

Python 3, 112 bytes

lambda s,e,j=''.join:j(map(lambda*a:['',*e][int(j(a),2)],*[iter(j(map('{:08b}'.format,s)))]*len(f'{len(e):b}')))

Try it online!

Basically the inverse of my answer to the other question.

\$\endgroup\$
1
\$\begingroup\$

JavaScript (ES7),  99  95 bytes

Expects (encoder_characters)(stream).

(d,v,i=n=0)=>g=a=>d[k=2**n-1]?g(a,n++):(i>7?0:v=v<<8|a.shift(i+=8),j=v>>(i-=n)&k)?d[j-1]+g(a):a

Try it online!

\$\endgroup\$
1
\$\begingroup\$

Jelly, 17 bytes

BUz0ZUFsLBLƊ}Ḅt0ị

A dyadic Link that accepts the code on the left and the alphabet on the right and yields the plain text.

Try it online!

How?

BUz0ZUFsLBLƊ}Ḅt0ị - Link: code (list of integers in [1,255]), alphabet (list)
B                 - convert code numbers to binary lists
 U                - reverse each
  z0              - transpose with filler zero
    Z             - transpose
     U            - reverse each
      F           - flatten
           Ɗ}     - last three links as a monad - f(alphabet):
        L         -   length of alphabet
         B        -   to binary list
          L       -   length
       s          - split flattened bits into slices of that length
             Ḅ    - convert these from binary
              t0  - trim zeros
                ị - index into the alphabet
\$\endgroup\$
1
\$\begingroup\$

Pip -x, 26 bytes

b@DFBFI(STB_+E8MJa)<>#TB#b

Takes a list and a string as command-line arguments. The string may need to be quoted twice, once for the shell and once for Pip. Or, add the -r flag and supply the list and string (as a double-quoted Pip string literal) on stdin. Attempt This Online!

Explanation

The -x flag evaluates both arguments. Then:

b@DFBFI(STB_+E8MJa)<>#TB#b
                 a          First argument (list of numbers)
               MJ           Map this function and join the results into one string:
           _                 The argument
            +E8              Plus 2 to the 8th power
         TB                  Converted to binary
        S                    Without the first character
                             I.e. convert to binary with leading zeros as necessary
       (          )<>       Group that string into chunks of this size:
                        #b   Length of second argument (encoder string)
                      TB     To binary
                     #       Length of that
     FI                     Filter out results that are falsey (here, equal to 0)
   FB                       Convert each remaining value from binary
  D                         Decrement (because Pip is 0-indexed)
b@                          Use as an index into b

The result is a list of characters, which by default is joined together and autoprinted.

\$\endgroup\$
1
\$\begingroup\$

Ignores the transformation process and just decodes

Burlesque, 45 bytes

id[-sab2L[PpqsvZ]vv{b28'0P[}\mpPco)b2:nzqgv\m

Try it online!

id           # Indices
[-           # Drop first
sa           # Non-destructive length
b2L[Pp       # Store length of longest binary in temp stack
qsvZ]        # Zip with ids and save to global map
vv           # Drop empty list
{b28'0P[}\m  # Map to binary, pad to 8, concat
pPco         # Get from temp stack, chunks of
)b2:nz       # Map from binary, filter for nonzero
qgv\m        # Get each value, concat

Follows the transformation process

Burlesque, 61 bytes

id[-)b2J)L[>]Js1{'0P[}j+]m[qsvZ]vv{b28'0P[}\mg1cof{b2nz}qgv\m

Try it online!

Pretty sure this has some golfing to be done further, but it works.

id      # Get indices of string
[-      # Drop 0 (string ends with null so this is ok)
)b2     # Map to binary
J)L[>]  # Get length of longest
Js1     # Save copy
{
 '0P[   # Pad left with 0s
}
j+]     # Append length of longest
m[      # Apply to each
qsvZ]   # Zip binary with encoder and save to global map
vv      # Drop resulting empty list (encoded string now top of stack)
{
 b2     # To binary
 8'0P[  # Pad to 8 with zeros  
}\m     # Map concat
g1      # Get max length
co      # Chunks of
f{      # Filter
 b2nz   # Decimal value of is non-zero
}
qgv\m   # Get vals from global map and concat
\$\endgroup\$
1
\$\begingroup\$

GolfScript, 54 53 47 46 bytes

~{256|{2base}:b~1>}%[]*\:a,b,/{b.{(a=}and}%""+

Try it online!

This is my second GolfScript submission, so I might be able to improve it. I haven't decided on the best way to write explanations for these so this time I'll leave a commented version:

~      # Evaluate the input; the codepoint list is on top of the stack.
{      # Open a block (like anonymous function) to run on each codepoint:
    256|     # Bitwise or with 256 to set the ninth bit
    {2base}  # A block which calls `2 base` (convert to/from base 2).
    :b       # Assign the block to variable b.
    ~        # and call it on the codepoint.
    1>       # Slice from index 1 to the end, removing the extra 9 bit.
}%     # Close this block and call it on each codepoint (map).
[]*    # Flatten the list by joining it on nothing.
\:a    # Swap and assign the second input,
       # the encoder alphabet, to variable a.
,      # Push the length of a.
b      # Convert it to binary.
,      # The length of the digit list.
/      # Split the codepoint list into chunks of this size.
{      # Open a block to run on each chunk:
    b.       # Convert the chunk from binary to decimal.
    {        # A block taking a 1-based index:
        (a=        # Decrement and 0-based index into a.
    }        # This returns a number, so we have to coerce to string later.
    and      # If the number isn't zero, run the block.
}%     # Run this block on each chunk (map).
""+    # Append an empty string to coerce the list.
\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.