24
\$\begingroup\$

Challenge:

Your challenge is to write an interpreter for Whitespace. Given a string consisting of spaces, tabs, newlines, and potentially other characters, as well as possible inputs for the Whitespace program itself, output the result of the given Whitespace program.

Here is an overview of the Whitespace language and its builtins:

Whitespace is a stack-based language which uses only three characters: spaces (ASCII codepoint 32); tabs (ASCII codepoint 9); and newlines (ASCII codepoint 10); all other characters are ignored.
It only has a couple of basic builtins, which I will go over below. Whitespace has a stack, which can only consist of integers, as well as a heap, which is a map of integers (both the key and value).

From here on out I will be using S for spaces, T for tabs, and N for newlines to make the text more compact.

Stack Manipulation (always starts with leading S):

  • Push a number to the stack: SS, followed by either an S/T for positive/negative respectively, followed by some S and/or T which is the binary representation of the number (S=0; T=1), followed by a trailing newline N. Some examples:
    • SSSTN pushes the number 1; a positive integer with binary 1.
    • SSTTSN pushes the number -2; a negative integer with binary 10.
    • SSSTSTSN pushes the number 10; a positive integer with binary 1010.
    • SSTTTSSTSSN pushes the number -100; a negative integer with binary 1100100.
    • Pushing number 0 is an edge case, since it can be done in multiple ways. Some examples:
      • SSSN: push a positive integer without any binary digits.
      • SSTN: push a negative integer without any binary digits.
      • SSSSN: push a positive integer with binary 0.
      • SSTSSSN: push a negative integer with binary 000.
  • Duplicate the top of the stack: SNS.
  • Copy the 0-based \$n\$th item from the top of the stack to the top of the stack: STS followed by a number similar as mentioned earlier (excluding the leading SS). I.e. let's say the stack currently contains the integers [47,12,0,55], then we could use STSSTSN to copy the 0-based 2nd item (which is the 12 in this case) to the top. So the stack becomes: [47,12,0,55,12].
    • NOTE: This index may not be negative. On TIO this would result in a negative index error, but that same program would push a 0 in the vii5ard interpreter, and it could even be different in yet another interpreter. For the sake of this challenge, you can therefore assume a given copy will never be negative. So the copy will always start with STSS, followed by the binary of the top-to-bottom index, followed by a trailing N.
  • Swap the top two items on the stack: SNT.
  • Discard the top item of the stack: SNN.
  • Discard/slice \$n\$ items from the top of the stack, but keep the top item: STN, followed by a number similar as mentioned earlier (excluding the leading SS). I.e. let's say the stack currently contains the integers [1,2,3,4,5,6,7], then we could use STNSTTN to discard 3 items from the stack (except the top one). So the stack becomes: [1,2,3,7].
    • You can again assume no negative slice values will be used, so this will always start with STNS, followed by the amount to slice, followed by a trailing N.

Arithmetic (always starts with leading TS):

  • Addition; add the top two items on the stack together: TSSS.
  • Subtraction; subtract the top item of the stack from the (top-1)th item of the stack: TSST.
  • Multiplication; multiply the top two items on the stack: TSSN.
  • Integer division; integer divide the (top-1)th item of the stack by the top item of the stack: TSTS. (NOTE: Since Whitespace only has integers, this will always be integer division and never result in decimal values.)
  • Modulo; take the modulo of the (top-1)th item of the stack with the top item of the stack: TSTT.
    • You can assume no negative values will be used for the modulo.

For each of these two argument builtins the same applies: if none or only a single integer is on the stack, this will result in an error. How you implement this error in your interpreter is up to you, as long as the program stops when this occurs. I thought about not allowing it for the sake of this challenge, but decided not to because it's a commonly used strategy to stop a program with an error when printing hardcoded texts, using the approach explained in this Whitespace codegolfing tip.

Heap Access (always starts with leading TT):

  • Pop the top two items of the stack, and store the top item in the (top-1)th address of the heap: TTS. I.e. let's say the stack contains the integers [1,2,3,4,5] and the heap already contains [{2:10}]. When you use this store builtin twice in a row, the stack would contain [1] and the heap will contain [{2:3},{4:5}] (note how the {2:10} has been replaced with the {2:3}).
    • NOTE: Just like with the Arithmetic builtins, if no or a single argument is given, it will cause an error. But for the sake of this challenge you can assume this will never be given for this builtin.
  • Pop the top item of the stack, and push the item corresponding with that heap address to the top of the stack: TTT. I.e. let's say the stack contains the integers [1,4] and the heap contains [{2:3},{4:5}]. If you now use the retrieve builtin once, the stack would become [1,5] (and the heap will remain the same).
    • NOTE: If you use an address that isn't in the heap yet (or the heap is empty), it will push a 0 to the stack instead. But for the sake of this challenge you can also ignore this.

Flow Control (always starts with leading N):

  • Mark a location in the program with a label: NSS, followed by some (optional) S/T which aren't used by other labels/subroutines, followed by an N. I.e. if you're only using a single label in your full program, NSSN would be what to use when code-golfing. If you need two or three labels, you can add NSSSN and/or NSSTN.
    • Although it is possible to have multiple of the same labels in the TIO and vii5args interpreters, it will cause issues, so we assume the input will always only create a label/subroutine once.
    • Also, although NSSN would be a logical first label to use, it's completely valid to use a label NSSTSTSTTTSN instead as only label in the program.
  • Call a subroutine with the given label: NST, followed by some (optional) S/T which aren't used by other labels/subroutines, followed by an N. I.e. NSTTSTSTTTSN would jump to the label TSTSTTTS as subroutine.
  • Jump unconditionally to a label: NSN, followed by some (optional) S/T, followed by an N. I.e. NSNN would jump to the (empty) label N and continue the program flow from there.
  • Pop the top integer, and jump to a label if it is exactly 0: NTS, followed by some (optional) S/T, followed by an N. I.e. if the stack is currently [4,1,0] and we'd use NTSSN, it would jump to the label SN and continue the program flow from there (with stack [4,1]). If instead the stack is currently [4,1] and we'd use the NTSSN, it would jump past it to the next builtin below it (with stack [4]).
  • Pop the top integer, and jump to a label if it is negative: NTT, followed by some (optional) S/T, followed by an N. I.e. if the stack is currently [4,1,-10] and we'd use NTTTN, it would jump to the label TN and continue the program flow from there (with stack [4,1]). If instead the stack is currently [4,1] and we'd use the NTTTN, it would jump past it to the next builtin below it (with stack [4]).
    • Minor note: There is no Jump to label if positive builtin available in Whitespace.
  • End a subroutine, and go back to the caller (a.k.a. return): NTN.
  • End the entire program: NNN (everything after that becomes no-ops).

I/O (always starts with leading TN):

  • Pop the top integer, and print as character with that codepoint to STDOUT: TNSS. I.e. if the stack is currently [10,101] and we'd call the TNSS twice, it will output a lowercase e followed by a newline to STDOUT.
  • Pop the top integer, and print as integer to STDOUT: TNST.
  • Pop the top integer, and read a character from STDIN, for which its codepoint-integer will be stored in the heap with the popped integer as address: TNTS. I.e. if the stack is [0,0], the heap is empty, and STDIN contains the capital letter I, and we'd use TNTS. The stack will become [0] and the heap [{0:73}]. (After which we could use the retrieve builtin TTT to put this input on the stack.)
  • Pop the top integer, and read an integer from STDIN, which will be stored in the heap with the popped integer as address: TNTT.

Challenge rules:

  • You can assume the Whitespace input is always valid with the builtins above. So the compilation phase would always succeed.
  • You can assume executing the Whitespace input will never result in any errors, except when the Arithmetic builtins will only get 0 or 1 stack-arguments instead of the required 2. Although there are loads of other possible errors when executing a Whitespace program, like negative indices, jumps to labels that doesn't exist, read from STDIN when it's empty, etc. For the sake of this challenge you can assume none of those kind of errors will occur, and the input-program is error-free (except for the Arithmetic builtins).
  • You can assume the additional inputs given will be valid as well. So an integer when we want to read an integer or a character / string of multiple characters if we want to read those.
  • The Whitespace program-input can be taken in any reasonable format. Could be a string, list of characters, list of codepoint-integers, etc. Same applies to the other inputs.
  • Any non-whitespace character in the Whitespace-program input is ignored. You can assume the no-op characters will only be printable ASCII. So the Whitespace input will only contain the UTF-8/ASCII characters with the codepoints [9, 10, 32..126].
  • Whitespace numbers can in theory be as large as you want, but for the sake of this challenge you can assume the integers used will be in the range [-9223372036854775808, 9223372036854775807] (signed 64 bit integer).

General rules:

  • This is , so shortest answer in bytes wins.
    Don't let code-golf languages discourage you from posting answers with non-codegolfing languages. Try to come up with an as short as possible answer for 'any' programming language.
  • Standard rules apply for your answer with default I/O rules, so you are allowed to use STDIN/STDOUT, functions/method with the proper parameters and return-type, full programs. Your call.
  • Default Loopholes are forbidden.
  • If possible, please add a link with a test for your code (i.e. TIO).
  • Also, adding an explanation for your answer is highly recommended.

Test cases:

You should copy the actual test cases from the Try it online links to verify your interpreter.

Input(s):
 SSTTSTTTTNSSSTSNSSSTNSTSSTNSSTTSTNSSTTSNSNSSSSTNSTSSTNSSTTSSTTTSNSSTTTSSTTNSSSNSSTTSTNSSTTTNSSSTSNSSTTNSSSTTTNSSSTSNSSTTSSTTTSNSSSTSNSSTTNSSSTTTNSSTTSNSSSTSNSSTTSSTTTSNSSTTTSSTTNSSTTTNSSTTSNSSTTSTNSSTTNSSTTSSTTTSNSSTTNSSSTTTNSSTTSTNSSSNSSTTSTNSSTTSNSNSSSSTNSSTTTTTSNNSSNSSSTTSTTTSNTSSSTNSSNSNN
Builtins used:
 Push num; duplicate; copy; add; create label; jump to label; print as char
Output:
 Pollinium milk; plump pumpkin; lollipop?

Try it online as actual program using raw spaces, tabs, newlines.
See this Whitespace answer of mine for an explanation.

Input(s):
 SSTTSNSNSTSSNTNST
Builtins used:
 Push num; duplicate; multiply; print as int
Output:
 4

Try it online as actual program using spaces, tabs, newlines, and comments.

Input(s):
 SSTTNTNSTNNNTSNTNTSSS
Builtins used:
 Push num; print as int; stop program; (no-ops)
Output:
 -1

Try it online as actual program using raw spaces, tabs, newlines.
See this Whitespace answer of mine for an explanation.

Input(s):
 SSSNSNSSNSTNTTTTTNSSNSNSTNSTSSSTSTSNTNSSSNTSSSTNTSSSSTSSTNSTSSTNSNSSSSTSSNTSTTSNSNTSSNSSSTNTSSTSNSNTSTNSSSTNTSSTNTSSSNSNTSTSSTNTSSTNSNNNSSSNSNNTSTSSTSSTSNSTSSTSNTSTTNTSNNNNNSSTNSNNTSSNNSNNNSSSSNTSSSNSNN
 21
Builtins used:
 Push num; duplicate; swap; copy; discard; add; subtract; multiply; divide; modulo; retrieve; create label; jump to label; jump to label if 0; stop program; read as int; print as char; print as int
Output:
 21
 21
 23
 20
 5
 25
 31
 24
 3
 27
 37
 26

Try it online as actual program using spaces, tabs, newlines, and comments.
See this Whitespace answer of mine for an explanation.

Input(s):
 NSSNSSSNSNSTNTSTTTSNSSSSTSTSNTSSTNTSSNSSSNSNSTNTSTTTTSSTNTSNSSSNTNSTNSSSN
 TThhiiss  iiss  ddoouubblee  ssppeeaakk!!\n
Builtins used:
 Push num; duplicate; subtract; retrieve; create label; jump to label if 0; read as char; read as int
Output:
 0

Try it online as actual program using spaces, tabs, newlines, and comments.
See this Whitespace answer of mine for an explanation.

Inputs(s):
 SSSNSNSTNTTTTTSSSNSNSTNTTTTTTSSSTNST
 -3
 5
Builtins used:
 Push num; duplicate; add; retrieve; read as int; print as int
Output:
 2

Try it online as actual program using spaces, tabs, newlines, and comments.
See this Whitespace answer of mine for an explanation.

Input(s):
 SSSTSSSTTTTTSSSSSSSTTSSSTSTTTTTSSTTSNTNST
Builtins used:
 Push num; print as int
Output:
 4815162342

Try it online as actual program using spaces, tabs, newlines, and comments.
See this Whitespace answer of mine for an explanation.

Input(s):
 SSSTTTNTNTSSSSTSSSSTNSSSTSTSNNSSTTNSSSTNTSSTSNSNTTSNSTSSTNSNTNSNTTNNSSSNSSSTTTNTTTSTNSSSTSTNTNSTSSSTSNNSSTNSNTSNNSNTTNSSSSSTSNTSTSSNSNTSSSNNSNTNNSSSSNNNN
 〹
Builtins used:
 Push num; duplicate; swap; discard; copy; slice; retrieve; create label; jump to label; jump to label if 0; jump to label if negative; exit program; read as char; print as char; print as int
Output:
 12345!!

Try it online as actual program using spaces, tabs, newlines, and comments.

Inputs(s):
 SSSTTNSNSSNSSNSTTSSSSTTTNTTSNSTNTNSSNNNSSTNNSSNTTTNTNSSTN
Builtins used:
 Push num; duplicate; store; retrieve; call subroutine; return; exit program; print as char; (no-ops)
Output:
   (character with unicode value 7)

Try it online as actual program using spaces, tabs, newlines, and comments.

\$\endgroup\$
4
  • \$\begingroup\$ Can the heap have a fixed size? If so, what's an acceptable size for it? \$\endgroup\$
    – DLosc
    Commented Nov 11, 2021 at 1:11
  • \$\begingroup\$ @DLosc Sure. And let's say at least 1000 (or perhaps 1024)? If you have another reasonable alternative feel free to suggest it. I've been codegolfing in Whitespace a few years now and overall I never saved too many stuff in the heap. Although tbf, with codegolfing you usually just overwrite the initial heap value since pushing address 0 is just the shortest, and in most cases you don't need to re-use the heap too much. \$\endgroup\$ Commented Nov 11, 2021 at 7:35
  • \$\begingroup\$ Form feeds are white space. \$\endgroup\$
    – user230118
    Commented Jul 13, 2022 at 20:58
  • \$\begingroup\$ @user230118 So are carriage returns, zero-width spaces, etc. 🤷 I'm not the one who named the language. Either way, those can all be ignored like any other character, except for the three whitespace characters (newline, tab, space) that are mentioned. \$\endgroup\$ Commented Jul 14, 2022 at 6:41

3 Answers 3

21
+500
\$\begingroup\$

Vim, 2366 bytes

:let@z=''
:%s/\S//eg
:%s/ /S/&
:%s/    /T/&
:%s/\n/N/&
$xo"Zyl:if"<C-v><C-r>z"=="SS"||"<C-v><C-r>z"=="STS"||"<C-v><C-r>z"=="STN"||"<C-v><C-r>z"=="NST"||"<C-v><C-r>z"=="NSN"||"<C-v><C-r>z"=="NTS"||"<C-v><C-r>z"=="NTT"<C-v>
norm fN<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="SNS"||"<C-v><C-r>z"=="SNT"||"<C-v><C-r>z"=="SNN"||"<C-v><C-r>z"=="TSSS"||"<C-v><C-r>z"=="TSST"||"<C-v><C-r>z"=="TSSN"||"<C-v><C-r>z"=="TSTS"||"<C-v><C-r>z"=="TSTT"||"<C-v><C-r>z"=="TTS"||"<C-v><C-r>z"=="TTT"||"<C-v><C-r>z"=="NTN"||"<C-v><C-r>z"=="TNSS"||"<C-v><C-r>z"=="TNST"||"<C-v><C-r>z"=="TNTS"||"<C-v><C-r>z"=="TNTT"||"<C-v><C-r>z"=="NNN"<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NSS"<C-v>
norm hhxxrmfN<C-v>
let@z=''<C-v>
en<C-v>
l@t<Esc>^"tC"Zyl:if"<C-v><C-r>z"=="SS"<C-v>
norm @a<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="STS"<C-v>
norm @b<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="STN"<C-v>
norm @c<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="SNS"<C-v>
norm @d<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="SNT"<C-v>
norm @e<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="SNN"<C-v>
norm @f<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TSSS"<C-v>
norm @g<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TSST"<C-v>
norm @h<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TSSN"<C-v>
norm @i<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TSTS"<C-v>
norm @j<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TSTT"<C-v>
norm @k<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TTS"<C-v>
norm @l<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TTT"<C-v>
norm @m<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NST"<C-v>
norm @o<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NSN"<C-v>
norm @p<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NTS"<C-v>
norm @q<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NTT"<C-v>
norm @r<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NTN"<C-v>
norm @s<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TNSS"<C-v>
norm @u<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TNST"<C-v>
norm @v<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TNTS"<C-v>
norm @w<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="TNTT"<C-v>
norm @x<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="m"<C-v>
norm fNl<C-v>
let@z=''<C-v>
elsei"<C-v><C-r>z"=="NNN"<C-v>
k<C-v>
el<C-v>
norm l<C-v>
en<C-v>
@n<Esc>^"nClylmc'sO<C-v><Esc>p:s/T/-/e<C-v>
:s/S/+/e<C-v>
"9xms`cytN'sp^x@y^"9PC<C-v><C-r>=<C-v><C-r>"<C-v>
<C-v><Esc>ms`cfNl<Esc>^"aCllytNmc'sO<C-v><Esc>p@y<C-v><C-a>^Dms@"jy$'spms`cfNl<Esc>^"bCllytNmc'so<C-v><Esc>p@y^DJ@"dd`cfNl<Esc>^"cCmc'sy$O<C-v><C-r>"<C-v><Esc>ms`cl<Esc>^"dCmc'sddpkms`cl<Esc>^"eCmc'sddms`cl<Esc>^"fCmc'sy$ddmsC<C-v><C-r>=<C-v><C-r>0+<C-v><C-r>"<C-v>
<C-v><Esc>`cl<Esc>^"gCmc'sy$ddmsC<C-v><C-r>=-<C-v><C-r>0+<C-v><C-r>"<C-v>
<C-v><Esc>`cl<Esc>^"hCmc'sy$ddmsC<C-v><C-r>=<C-v><C-r>"*<C-v><C-r>0<C-v>
<C-v><Esc>`cl<Esc>^"iCmc'sy$ddmsC<C-v><C-r>=<C-v><C-r>"/<C-v><C-r>0<C-v>
<C-v><Esc>`cl<Esc>^"jCmc'sy$ddmsC<C-v><C-r>=<C-v><C-r>"%<C-v><C-r>0<C-v>
<C-v><Esc>`cl<Esc>^"kCmc'sy$ddDJms'h:s/ <C-v><C-r>"(\d*)//eg<C-v>
A <C-v><C-r>"(<C-v><C-r>0)<C-v><Esc>`cl<Esc>^"lCmc'sy$'h/ <C-v><C-r>"(<C-v>
%yT('sC<C-v><C-r>0<C-v><Esc>`cl<Esc>^"mCfN:let@9=col('.')+1<C-v>
vTNy/m<C-v><C-r>"<C-v>
fNlmc'xO<C-v><Esc>"9pmx`c<Esc>^"oCfNvTNy/m<C-v><C-r>"<C-v>
fNl<Esc>^"pCmc'sDJ`c:if<C-v><C-r>"==0<C-v>
norm fNvTNlly/m<C-v><C-v><C-v><C-r>"<C-v><C-v><C-v>
fNl<C-v>
el<C-v>
norm fNl<C-v>
en<C-v>
<Esc>^"qC
mc'sDJ`c:if<C-v><C-r>"<0<C-v>
norm fNvTNlly/m<C-v><C-v><C-v><C-r>"<C-v><C-v><C-v>
fNl<C-v>
el<C-v>
norm fNl<C-v>
en<C-v>
<Esc>^"rCmc'xDJmx`c@"|<Esc>^"sCmc'sDJms`oA<C-v><C-r>=nr2char(<C-v><C-r>-)<C-v>
<C-v><Esc>mo`cl<Esc>^"uCmc'sDJms`o$"-pmo`cl<Esc>^"vCmc'iDJ'sO<C-v><C-r>=char2nr('<C-v><C-r>"')<C-v>
<C-v><Esc>ms`c@l<Esc>^"wCmc'iDJ'sO<C-v><C-r>"<C-v><Esc>ms`c@l<Esc>^"xC0i0<C-v><Esc>y$@=len("<C-v><C-r>"")<C-v>
I(<C-v><Esc>^x:s/S/*2)/eg<C-v>
:s/T/*2+1)/&<C-v>
^C<C-v><C-r>=<C-v><C-r>"<C-v>
<C-v><Esc><Esc>^"yDo<Esc>mxo
Stack:
<Esc>mso
Heap:
<Esc>mho
Input:
<Esc>mio
Output:
<Esc>mo:let@z=''

Try it online!

I sure do love me a thicc Vim program.

This program replaces all of the spaces, tabs, and newlines with the letters STN. This is only necessary for the newlines, because Vim is primarily a text editor and treats them differently, but it does it to each character for consistency. It then takes each Whitespace command and puts it into its own macro. Here's a list of all of the macros, and what command they map to. After creating all of the macros, it goes through a compilation phase, followed by an execution phase.

The compilation phase goes through the program, and for every NSS command, it replaces the NSS with an m, to be able to easily jump to it later.

The execution phase collects characters into the "z register, and once the register matches a command, the macro for that command is executed, and the "z register is cleared.

The stack is simply a bunch of lines containing a number, with the top of the stack having the label s. When we want to access the top of the stack, we just jump to the label, then we can use the c label to jump back to the code.

The heap is a line that stores many values in the form key(value). To set a value, the program deletes the matching key/value pair if it exists, then adds the key/value to the end of the heap. To get the value, it just does / key(, which finds the key, then it uses %yT( to copy only the value.

Input can be passed to the program by adding `ii<input> to the beginning of the footer. Each value must be on a separate line to work: Try it online! The gg@tgg@n at the end of the footer compiles and runs the program. To view only the output of the Whitespace program, add `idH to the end of the footer.

\$\endgroup\$
12
\$\begingroup\$

JavaScript (Node.js),  608 597 586  551 bytes

Takes input as (source)(input), where input is an array containing single characters and/or integers.

s=>I=>eval([..."@CDEJKLMNQTUWYZ^_bkmqw"].reduce((s,c)=>(a=s.split(c)).join(a.pop()),"(R=X=>(H={},o=P=[],S=[],z=x=p=i=0,gUs[p]?~(j=` 	\n`.indexOf(s[p++]N?j:gK:3,hUgK<2?Mn*2+j):V=n,GUx=z--&&S.pop(y=x),FUg(x=`mbQQ$mS[z+~(b])TSCce((z-=V=b-1,V)$mkJLmyJkTTLH[x]=y$mHD)qR[M1)]=_?R[P@p),V]:_Y_&!kY_&k<0Yp$E?P.popK:pT$^QQQQQqW+yw-yw*yw/y|0w%yN:^QTTo+=Buffer(D)$o+=k$Z.codePointAtKZ`Ct`$`[n-9],x&&eval(x,n=1N<3&&F(n*3+jN(1N(0)||R(1)||owN:^$WqTT$mz=S@kGKbgK?-M0):M0N_p$M1),E^E?g:pZ$HD=I[i++]Y?R[V]:Wz>1?(LmxU=n=>T$$QqqN))Mh(Lk,k,K()J),mx)$Ep=XD[k][email protected]"))

Try it online! (all test cases without comments)

Try it online! (an example with a commented source code as input)

How?

Because the code is packed with RegPack, it is more efficient to repeat the same pieces of code over and over rather than defining too many helper functions.

The sequences of spaces, tabs and linefeeds are converted to an integer in base 3 with a leading \$1\$ until they match a known value:

 seq. | mnemonic | base 3 | decimal
------+----------+--------+---------
 SS   |    PSH   |    100 |     9
 STS  |    CPY   |   1010 |    30
 STN  |    SLI   |   1012 |    32
 SNS  |    DUP   |   1020 |    33
 SNT  |    SWP   |   1021 |    34
 SNN  |    DIS   |   1022 |    35
------+----------+--------+---------
 TTS  |    PUT   |   1110 |    39
 TTT  |    GET   |   1111 |    40
------+----------+--------+---------
 NSS  |    LBL   |   1200 |    45
 NST  |    JSR   |   1201 |    46
 NSN  |    JMP   |   1202 |    47
 NTS  |    JZE   |   1210 |    48
 NTT  |    JMI   |   1211 |    49
 NTN  |    RTN   |   1212 |    50
 NNN  |    END   |   1222 |    53
------+----------+--------+---------
 TSSS |    ADD   |  11000 |   108
 TSST |    SUB   |  11001 |   109
 TSSN |    MUL   |  11002 |   110
 TSTS |    DIV   |  11010 |   111
 TSTT |    MOD   |  11011 |   112
------+----------+--------+---------
 TNSS |    CHR   |  11200 |   126
 TNST |    INT   |  11201 |   127
 TNTS |    RCH   |  11210 |   129
 TNTT |    RNU   |  11211 |   130

The commands are simply stored in a lookup table and executed with eval().

Everything is wrapped within the function \$R\$ which is called twice:

  • First pass (\$X=0\$): the jumps and the stack errors are ignored so that the whole code is parsed linearly and all labels are stored
  • Second pass (\$X=1\$): the code is executed normally

Unpacked and formatted

s => I => (
  R = X => (
    H = {},
    o = P = [],
    S = [],
    z = x = p = i = 0,
    g = n => s[p] ? ~(j = ` \t\n`.indexOf(s[p++])) ? j : g() : 3,
    h = n => g() < 2 ? h(n * 2 + j) : V = n,
    G = n => x = z-- && S.pop(y = x),
    F = n => g(
      x = [
        /* PSH */ "z=S.push(g()?-h(0):h(0))",,,,,,,,,,,,,,,,,,,,,
        /* CPY */ "z=S.push(S[z+~(g()?-h(0):h(0))])",,
        /* SLI */ "S.splice((z-=V=g()?-h(0):h(0))-1,V)",
        /* DUP */ "z=S.push(G()),z=S.push(x)",
        /* SWP */ "G(),G(),z=S.push(y),z=S.push(x)",
        /* DIS */ "G()",,,,

        /* PUT */ "G(),G(),H[x]=y",
        /* GET */ "z=S.push(H[G()])",,,,,

        /* LBL */ "R[h(1)]=p",
        /* JSR */ "h(1),p=X?R[P.push(p),V]:p",
        /* JMP */ "h(1),p=X?R[V]:p",
        /* JZE */ "h(1),p=X&!G()?R[V]:p",
        /* JMI */ "h(1),p=X&G()<0?R[V]:p",
        /* RTN */ "p=X?P.pop():p",,,
        /* END */ "p=X?g:p",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

        /* ADD */ "z>1?(G(),G(),z=S.push(x+y)):p=X?g:p",
        /* SUB */ "z>1?(G(),G(),z=S.push(x-y)):p=X?g:p",
        /* MUL */ "z>1?(G(),G(),z=S.push(x*y)):p=X?g:p",
        /* DIV */ "z>1?(G(),G(),z=S.push(x/y|0)):p=X?g:p",
        /* MOD */ "z>1?(G(),G(),z=S.push(x%y)):p=X?g:p",,,,,,,,,,,,,,

        /* CHR */ "o+=Buffer([G()])",
        /* INT */ "o+=G()",,
        /* RCH */ "H[G()]=I[i++].codePointAt()",
        /* RNU */ "H[G()]=I[i++]"
      ][n - 9],
      x && eval(x, n = 1)
    ) < 3 && F(n * 3 + j)
  )(1)
)(0) || R(1) || o
\$\endgroup\$
1
  • \$\begingroup\$ This is awesome :D \$\endgroup\$
    – RGS
    Commented Feb 2, 2020 at 23:27
5
+50
\$\begingroup\$

Pip Classic, 441 399 377 bytes

,:mn:@>gs:lt:v**@_*FBTM_a:(Ja@wTR" 	
"012`(00|01.|20.|21[01]).*?2|1[02]..|...`)M{!a@<2?"lPU(t".a@>2.')a@<3=10?"lPUl@(t".a@>3.')a@<3=12?"l:@lALl@>++(t".a@>3.')2=@a?("7080I!370I3<z70i:POs00sPUi7"^0a@<3%13).a@>3a=20?"lPU@l"a=21?"S@ll@o"a=22?3a=110?"Y36"a=111?"lPUm@3"a@1?("O C30O30Y APO@60YPO6"^0a)98J['*x"//"'%'+'-]@a}Wi<#a{Va@iR9"Y3l?lPU3"R8"y;Vk"R7"i:a@?"R6"nm@3:y"R3C`POl`++i}

Try it online!

Takes the Whitespace code as its first command-line argument, and the arguments to the Whitespace program as the following command-line arguments. (A string of characters should be given as a single argument.) The heap has a fixed size of 1000 entries.

How?

The broad-strokes version:

  • Strip non-whitespace characters from code
  • Transliterate space, tab, and newline to 0, 1, and 2
  • Use regex to break the program into a list of valid statements
  • Translate each statement into Pip code
  • While the program counter (initially set to 0) is less than the length of the program, execute the statement at that index and increment the program counter

When the program needs to halt, either because the Halt instruction was executed or because there were not enough operands for an arithmetic operation, it does so by throwing an error using the idiom Vk (explained in this tip).

Ungolfed & commented

Here's the ungolfed version that I started from. Note: this is written in modern Pip, so it won't work on TIO. I've restructured the golfed version a bit, particularly to compress some of the cases in the translation section, but the basic approach is the same.

;;;;;;;;;;;
;; Setup ;;
;;;;;;;;;;;

; Use S/T/N temporarily for ease of development
$codepage : "STN"

; Stack starts out empty
$stack : []
; Heap size is 1000 (initial values don't matter)
$heap : u RL 1000

; Whitespace code is first command-line argument
$code : a
; Program inputs are remaining command-line args
$inputs : g @> 1
; Program counter starts at 0
$pc : 0

; To handle subroutines, we need to store a stack of return-to line numbers
$return_addrs : []

; Function to decode number literals
$to_int : {
  $magnitude : FB TM a
  $sign : (-1) E @a
  $sign * $magnitude
}

;;;;;;;;;;;;;
;; Parsing ;;
;;;;;;;;;;;;;

; Remove characters from code that aren't in codepage
$code FI: _ N $codepage
; Transliterate code from codepage to 0 (space), 1 (tab), and 2 (newline)
$code TR: $codepage 012

; Split code into separate instructions
$command_rgx : `(00|01.|20.|21[01]).*?2|1[02]..|...`
P $code : (J $code) @ $command_rgx

; Translate each instruction into Pip commands
$program : $code M {
  $instr : a
  ; Push a number
  $instr @< 2 Q 00 ? "$stack PU " . ($to_int $instr @> 2)
  ; Copy nth number on stack
  $instr @< 3 Q 010 ? "$stack PU $stack @ " . ($to_int $instr @> 3)
  ; Slide n numbers off from under top of stack
  $instr @< 3 Q 012 ? "$stack : $stack @ 0 AL $stack @> " . 1 + ($to_int $instr @> 3)
  ; Label (no-op)
  $instr @< 3 Q 200 ? $instr @> 3
  ; Gosub label
  $instr @< 3 Q 201 ? "$return_addrs PU $pc; $pc : $program @? " . $instr @> 3
  ; Goto label
  $instr @< 3 Q 202 ? "$pc : $program @? " . $instr @> 3
  ; If top of stack is zero, goto label
  $instr @< 3 Q 210 ? "I PO $stack = 0 $pc : $program @? " . $instr @> 3
  ; If top of stack is negative, goto label
  $instr @< 3 Q 211 ? "I PO $stack < 0 $pc : $program @? " . $instr @> 3
  ; Dup
  $instr Q 020 ? "$stack PU $stack @ 0"
  ; Swap
  $instr Q 021 ? "$stack @ 0 :: $stack @ 1"
  ; Drop
  $instr Q 022 ? "PO $stack"
  ; Add
  $instr Q 1000 ? "Y PO $stack; $stack ? $stack PU (PO $stack) + y; V k"
  ; Subtract
  $instr Q 1001 ? "Y PO $stack; $stack ? $stack PU (PO $stack) - y; V k"
  ; Multiply
  $instr Q 1002 ? "Y PO $stack; $stack ? $stack PU (PO $stack) * y; V k"
  ; Divide
  $instr Q 1010 ? "Y PO $stack; $stack ? $stack PU (PO $stack) // y; V k"
  ; Modulo
  $instr Q 1011 ? "Y PO $stack; $stack ? $stack PU (PO $stack) % y; V k"
  ; Store top of stack on heap at second-on-stack address
  $instr Q 110 ? "Y PO $stack; $addr : PO $stack; $heap @ $addr : y"
  ; Recall number from heap at top-of-stack address
  $instr Q 111 ? "$stack PU $heap @ PO $stack"
  ; Print as char
  $instr Q 1200 ? "O C PO $stack"
  ; Print as number
  $instr Q 1201 ? "O PO $stack"
  ; Read char and store it on heap at top-of-stack address
  $instr Q 1210 ? "Y A PO $inputs @ 0; $addr : PO $stack; $heap @ $addr : y"
  ; Read number and store it on heap at top-of-stack address
  $instr Q 1211 ? "Y PO $inputs; $addr : PO $stack; $heap @ $addr : y"
  ; Return from subroutine
  $instr Q 212 ? "$pc : PO $return_addrs"
  ; Halt
  $instr Q 222 ? "V k"
  ; Unrecognized instruction
  ""
}

P $program
P "------------"

;;;;;;;;;;;;;;;
;; Execution ;;
;;;;;;;;;;;;;;;

; Loop while the program counter hasn't gone past the end of the program
W $pc < # $program {
  ; Eval the current statement
  V $program @ $pc
  ; Increment the program counter
  U $pc
}
\$\endgroup\$
0

Not the answer you're looking for? Browse other questions tagged or ask your own question.