8
$\begingroup$

Concatenative languages compose subprograms (nonempty sequences of terms) by appending one to the other. Most of these languages use postfix notation, like in Forth:

3 4 swap add 5 mul

A few, like Om, use a prefix notation, where operators precede their arguments, though Om also has a novel semantics that I am not asking about here. We can imagine reversing the order of tokens in the program above:

mul 5 add swap 4 3

and interpreting them to produce exactly the same result, with operators now preceding the expressions producing their arguments. This prefix order is more analogous to most applicative functional languages, but the semantics of juxtaposition are different, indicating function composition instead of application.

In some ways, the postfix notation reflects the stack-mutation semantics of Forth, applied left to right, while a prefix notation better illustrates composition, but both are post-hoc rationalisations.


Are there reasons to prefer one ordering to the other? I can see no technical reason for these not to be equivalent, but there may be ergonomic reasons. For example, is one more readily understood by the (non-)programmer, or more efficiently written? Does one somehow enable better library design? Will editing or debugging affordances be more available with one order? Or potentially other factors I haven't considered. There may not be a firm answer one way or the other, but I'd like evidence for when to use each order.

I would be particularly interested in results of any study examining this, if there is such a thing, but learned experience from developing or using such languages would also support an answer.

$\endgroup$

3 Answers

5
$\begingroup$

The natural implementations are different:

  • Postfix ordering requires a stack of values only, where operators are applied immediately, taking their operands off the stack. The execution loop for an interpreter is straightforward: read the next instruction from the program and evaluate it.
  • Prefix ordering requires a stack which can contain both operators and values. An operator like add is pushed to the stack, and not evaluated until there are sufficiently many operands on the stack above it. The execution loop for an interpreter is different: read the next instruction from the program and push it to the stack; then, while the top-most operator on the stack is followed by sufficiently many values for its operands, evaluate that operator. (Both loops are sketched in the code below.)
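
To make the contrast concrete, here is a minimal Python sketch of both loops. This is my own illustration, not code from any particular implementation; the OPS table, the arity encoding and the handling of a multi-result word like swap are assumptions made for the example.

# Hypothetical operator table for the example: name -> (arity, function).
# swap returns two results, so a function may return a tuple of values.
OPS = {
    "add":  (2, lambda a, b: a + b),
    "mul":  (2, lambda a, b: a * b),
    "swap": (2, lambda a, b: (b, a)),
}

def push_result(stack, result):
    # Multi-result words return a tuple; single-result words return one value.
    stack.extend(result if isinstance(result, tuple) else (result,))

def run_postfix(tokens):
    """Stack of values only; each operator is applied as soon as it is read."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            arity, fn = OPS[tok]
            args = [stack.pop() for _ in range(arity)][::-1]
            push_result(stack, fn(*args))
        else:
            stack.append(int(tok))
    return stack

def run_prefix(tokens):
    """Stack of operators and values; an operator waits until enough values sit above it."""
    stack = []
    for tok in tokens:
        stack.append(OPS[tok] if tok in OPS else int(tok))
        while True:
            # Find the topmost operator (stored as an (arity, fn) pair).
            op_i = next((i for i in reversed(range(len(stack)))
                         if isinstance(stack[i], tuple)), None)
            if op_i is None or len(stack) - op_i - 1 < stack[op_i][0]:
                break  # no operator, or not enough operands above it yet
            arity, fn = stack[op_i]
            result = fn(*stack[op_i + 1:op_i + 1 + arity])
            # Replace the operator and its operands with the result(s).
            stack[op_i:op_i + 1 + arity] = \
                result if isinstance(result, tuple) else (result,)
    return stack

print(run_postfix("3 4 swap add 5 mul".split()))  # [35]
print(run_prefix("mul 5 add swap 4 3".split()))   # [35]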

This ignores how the implementation evaluates quotes, but that is covered in this other question.

It seems to me that the implementation is much simpler using postfix ordering, and this will be decisive for concatenative languages which aren't meant to be written by humans (e.g. PostScript).

There is also a semantic difference which favours postfix notation: in prefix notation, each operator must take a fixed number of operands, in order to determine when the operator should be evaluated. On the other hand, postfix notation allows an operator to accept a variable number of operands, e.g. by consuming the whole of the stack.
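
For instance, extending the postfix loop sketched above with a hypothetical whole-stack word takes one extra branch (this reuses OPS and push_result from that sketch), whereas the prefix loop has no way of knowing when such a word has received "all" of its operands:

def run_postfix_variadic(tokens):
    stack = []
    for tok in tokens:
        if tok == "sum-all":            # hypothetical word: consumes the whole stack
            stack[:] = [sum(stack)]
        elif tok in OPS:
            arity, fn = OPS[tok]
            args = [stack.pop() for _ in range(arity)][::-1]
            push_result(stack, fn(*args))
        else:
            stack.append(int(tok))
    return stack

print(run_postfix_variadic("1 2 3 4 sum-all".split()))  # [10]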


As discussed in the comments, either of these implementations could be used for either prefix or postfix order ─ you can just reverse the program's tokens to switch the ordering. However, this would give a language where the side-effects of instructions are evaluated in the opposite order to which those instructions are written. This seems undesirable to me, but in principle a language could be designed this way.

Alternatively, the language could be designed in such a way that instructions don't have side-effects, and instead a program's output is simply a value left on the stack when it terminates. Or for interactive programs, perhaps something like Haskell's IO monad could be used, so that the order that terms are evaluated in doesn't have to be the same as the order their side-effects are observed in.

So implementations other than the two above are possible, but I'll focus on these because they're straightforward, and have the desirable property that the program's side-effects are evaluated from left to right.


For concatenative languages meant to be written by humans, the ease of language implementation may no longer be the deciding factor.

The main ergonomic difference between these two approaches is the one you already identified: prefix syntax places functions before their arguments, which is closer to how humans think about programming. A related issue is that with prefix order a variable name can precede its initialiser, which matters particularly when the initialiser is a long expression. Here's an example using my own toy concatenative language, fffff:

(
    n 3 % 0 = >three
    n 5 % 0 = >five
    ("fizz" print) three if
    ("buzz" print) five if
    (n print) three five or not if
    "" println
) >!print_num

This snippet declares a quote (i.e. a user-defined function) and binds it to the name print_num ─ but the name doesn't appear until after the end of the quote. This is a significant readability issue. With prefix notation instead, the name would appear at the start, which would be more sensible.
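
For comparison, a prefix-ordered variant (purely hypothetical; this is not real fffff syntax) could put the binding first, so the reader knows what is being defined before reading the body:

>!print_num (
    ...
)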

Another ergonomic issue is in debugging. If the program is paused during execution (e.g. due to a runtime error, or hitting a breakpoint), the programmer will be able to observe the current program state. As discussed above, prefix order and postfix order have natural implementations in which the program state would be a stack of operators and values, vs. a stack of just values, respectively. It's difficult to say which of these is more useful for debugging purposes, but a stack containing operators might be preferable since it more closely resembles a stack trace in an imperative or functional language, and may give better context about what the program was doing when it was paused.
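
Concretely, pausing the two interpreter sketches above partway through the example program shows the difference (again my own illustration, using the hypothetical sketch rather than any real debugger):

prefix,  paused after reading mul 5 add:    [mul 5 add]    a pending multiplication and addition are visible
postfix, paused after reading 3 4 swap:     [4 3]          just two values, with no hint of what comes next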

$\endgroup$
15
  • 2
    $\begingroup$ I am intrigued by what you say about implementation differences: surely it's symmetric? The compiler can reverse the tokens to get either permutation for either input, so it ought to be able to choose implementation and surface order independently. $\endgroup$
    – Michael Homer
    Commented May 25, 2023 at 20:48
  • 1
    $\begingroup$ @MichaelHomer It can reverse the tokens and execute that way, if you don't mind your program's side effects being executed in reverse order. println "Hello" println "World" in prefix notation would be reversed into a postfix program which prints "World Hello". I expect most programmers would mind this! $\endgroup$
    – kaya3
    Commented May 25, 2023 at 20:51
  • $\begingroup$ That's the correct order the author put them in; if they wanted a different order, it's a bug in their program. The composition is println ("Hello" (println ("World"))) where one println is (theoretically) dependent on the results of the other. I guess this is envisaging a concatenation of top-level applicative calls, rather than a single program? $\endgroup$
    – Michael Homer
    Commented May 25, 2023 at 21:01
  • $\begingroup$ @MichaelHomer That makes little sense to me; is the string literal "Hello" a unary operator? What does it do with its operand? Why doesn't "World" likewise need an operand? $\endgroup$
    – kaya3
    Commented May 25, 2023 at 21:05
  • 1
    $\begingroup$ I clearly failed to communicate in the question and I'm too close to it to see why. I intended it to be explicitly about reversing the tokens, explicitly not about alternative semantic models, to explicitly note that the orders were equivalent, and to explicitly ask for human-factors answers, so I was surprised it got a response the first half of which is in essence "obviously you're not actually going to reverse the tokens, so here's why the semantic model is different...", but that is presumably because I didn't clearly express it when I said those things. I don't think I can edit it. $\endgroup$
    – Michael Homer
    Commented May 26, 2023 at 21:08
4
$\begingroup$

This is probably a very subjective opinion, but your prefix notation doesn't look very readable compared to the postfix one.

The postfix notation has a clear "imperative" mental model: each word transforms some values on the top of the stack. So the example code 3 4 swap add 5 mul can be mentally followed along the lines of

3       [... 3]
4       [... 3 4]
swap    [... 4 3]
add     [... (4+3)]
5       [... (4+3) 5]
mul     [... ((4+3)*5)]

The prefix notation, with a restriction that every word takes some fixed number of values and returns exactly one value, is good ol' Polish notation, which allows for a Lisp-like, or an expression-evaluator-like, mental model: print mul 5 add 4 3 can be read as

(print
  (mul
    5
    (add
      4
      3
    )
  )
)

I can't recall any language that uses this notation other than the esolang Knight, but (in my limited experience using it in code golf) I didn't have much trouble writing and reading Knight code syntax-wise, so I guess it is ergonomic enough. (Knight allows subexpressions to be wrapped in parens (), which IMO does encourage this mental model.)

However, when you allow more arbitrary stack transformations, this mental model breaks down. Take the reverse of the program 3 4 swap 5 mul add, which is add mul 5 swap 4 3. What are the arguments to mul? How about add? There is no easy way to track them, other than to return to the imperative model but reading the words in reverse. Apparently people prefer to read code from left to right[citation needed], so this switch of direction may impose a cognitive overhead on programmers reading code in this style. It also means people will tend to write programs from right to left, which doesn't go well with common text editors.
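
To see how the arguments interleave, here is a trace in the operator-on-the-stack style (a sketch; it assumes swap leaves its two results in exchanged order):

add     [add]
mul     [add mul]
5       [add mul 5]
swap    [add mul 5 swap]
4       [add mul 5 swap 4]
3       [add mul 5 3 4]      swap fires on 4 3
        [add 15 4]           mul takes 5 and 3
        [19]                 add takes 15 and 4

mul ends up pairing 5 with the first of swap's results, and add pairs mul's result with swap's second result, a grouping that a simple nested-expression reading cannot express.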

$\endgroup$
1
  • $\begingroup$ Husk is a prefix concatenative language. It uses a higher-order function flip :: (x -> y -> z) -> y -> x -> z instead of swap. Of course, golfing languages are less focused on ergonomics. $\endgroup$
    – alephalpha
    Commented May 26, 2023 at 1:47
4
$\begingroup$

One ergonomic difference shows up in interactive REPLs.
I'm focusing here on linear, teletype-style REPLs. I'll bring "evidence" from my REPL experience, which is mostly with non-concatenative languages yet still relevant to prefix vs. postfix.

It's very common in REPL usage to compute some values, see them, then feed them into more computation. You don't know the later code you'll write until you see the intermediate values, so the inputs "must" appear before the function that consumes them.

An entirely postfix language like Forth or Factor supports this workflow naturally: you always write code in data-flow order, so you can press Enter and stop at any point, inspect the stack non-destructively, then write more logic using the previous results.

IN: scratchpad 3 4

--- Data stack:
3
4
IN: scratchpad +

--- Data stack:
7
IN: scratchpad 5 *

--- Data stack:
35

In prefix/infix languages like Python, the main workflows are:

  • Assign intermediate results to a variable, later refer to it by name.

    > sum = 3 + 4
    => 7
    > area = 5 * sum
    => 35
    

    If you omit the name, many REPLs assign one implicitly, e.g. _ in Python's and Ruby's built-in REPLs, or the numbered Out[7] in Mathematica and Jupyter.

    > 3 + 4
    => 7
    > 5 * _
    => 35
    

    Note how the resulting code has mixed order: data flows left "in the small" and right/down "in the large". (Almost all imperative languages do this, but it is still a conceptual complication.)

  • Re-run the whole command with extra logic added.
    Anecdotally, prepending code at the start feels awkward! I'm not sure exactly why, but appending | sort | uniq at the end of a shell pipeline, or some .method_call or |> function-pipelining syntax, feels better.
    I don't have data on why. Is it merely an artifact of up-arrow history recall leaving the cursor at the end by default? Is it that, having stopped and thought about the new code significantly later, it's natural to write "in order of thinking"? Is it that the repeated portion of the command is less interesting, so you want to see the newly added part near the output?

    $ expr 3 + 4
    7
    $ expr 3 + 4 | expr $(cat) \* 5   # bad example, expr doesn't read stdin
    35
    $ expr 3 + 4 | jq '. * 5'
    35
    

Counter-points, limiting the usefulness of linear-transcript REPL with postfix data-stack model:

  • It's hard to push things onto the stack in the order that will be most convenient for later logic to consume, especially if you don't know that logic yet; you may need to reshuffle or put results into variables, and to produce idiomatic code you'd later rewrite or factor it into functions anyway.
  • Words consuming data off the stack are destructive, in the sense that if you later want to reuse consumed results you need the ability to edit and re-run code, e.g. saving or dup'ing some values in the middle (see the listener transcript below).
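
For example, in a Factor listener you can dup an intermediate result before a destructive word consumes it, so a copy stays on the stack for later steps (a sketch of what the session might look like):

IN: scratchpad 3 4 + dup

--- Data stack:
7
7

The word . would then print and consume only the copy, leaving the original 7 available for whatever comes next.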

So I suspect environments making it easy to go back and edit are superior even for postfix? (And once you have something like Jupyter or Light Table, it's equally easy to edit any part of the code & re-evaluate, and other ergonomic considerations apply.)


Prefix notation has a different interactive benefit: it can help you discover available functions and their signatures before you write their arguments, via tooltips, tab completion, or explicit help commands.

(On the Unix command line, I'll frequently and wastefully run cmd1 | cmd2 | sort --help just to read the docs before completing the command. Shell pipelines are postfix only with respect to stdin but prefix with respect to flags, which interestingly feels like a sweet spot from this angle: I did build up the input I want to sort before I start typing | sort, but I might not remember sort's flags.)

$\endgroup$
1
  • $\begingroup$ Welcome! On your points about pushing things in the right order to use later, or data being lost when you consume it, I think both are addressed by storing intermediate results in named variables. $\endgroup$
    – kaya3
    Commented Jul 2, 2023 at 13:59

