6
$\begingroup$

Say I have the following construct in an interpreted language that uses an AST tree walker:

for (int i in range(1, 1000)) {
  // do stuff
  if (i % 6 == 0) { continue }
  // do more stuff
  if (i == 872) { break }
}

How do I best tell my interpreter to jump to the next loop iteration or exit the loop altogether? Is there some sort of common pattern used or some other best practice?

Note that this could also apply to a while loop.

$\endgroup$
4
  • 2
    $\begingroup$ Do you mean a tree-walking interpreter, or a bytecode interpreter? The answer for the latter is much closer to how a compiler would have to do it, since bytecode interpreters tend to convert control-flow instructions into jumps at the compilation stage. $\endgroup$
    – kaya3
    Commented May 16, 2023 at 22:21
  • $\begingroup$ @kaya3 I mean a tree walking interpreter - I'll edit that in $\endgroup$
    – lyxal
    Commented May 16, 2023 at 22:22
  • $\begingroup$ @kaya3 yeah I figured a byte code interpreter would be able to just jump to a different memory address using something analogous to a GOTO statement $\endgroup$
    – lyxal
    Commented May 16, 2023 at 22:24
  • $\begingroup$ @lyxal Yeah the JVM implements loops exactly that way, using gotos, iirc (it's a proper instruction in bytecode, now I'm just waiting for Java itself to support gotos) $\endgroup$
    – user
    Commented May 29, 2023 at 14:52

3 Answers 3

12
$\begingroup$

Typically, you need to distinguish between what it means for a statement or expression to "complete normally" or "complete abruptly" (borrowing terminology from the Java Language Specification). An expression which evalutes to a value "completes normally", as does a statement which executes in a normal sequence. On the other hand, a statement or expression which has some effect on the surrounding control-flow "completes abruptly".

A statement or expression could complete abruptly if it:

  • Returns a value from the enclosing function.
  • Throws an exception.
  • Breaks from or continues an enclosing loop.

So generally speaking, these are all handled the same way. A typical solution is to represent the result of a statement or expression as a discriminated union of these possibilities, for example in Rust you might write:

enum EvalResult {
    Normal(Value),
    Return(Value),
    Throw(Value),
    Break,
    Continue,
}

(If you have labelled breaks and continues, or you allow constructs like let foo = loop { break 5; }, then your enum will need to account for those, too.)

Your "eval" function, which evaluates an AST node, returns a result like this. Then when evaluating a loop, you would check for these results and handle them appropriately:

match ast_node {
    // ...
    WhileLoop(cond, body) => {
        loop {
            let cond_result = eval(cond);
            match cond_result {
                Normal(v) => if !v.is_truthy() { break; },
                _ => return cond_result,
            }
            
            let body_result = eval(body);
            match body_result {
                Normal(_) | Continue => continue,
                Break => break,
                Return(_) | Throw(_) => return body_result,
            }
        }
        return Normal(VOID_UNIT);
    },
    // ...
}

Here I'm assuming is_truthy() converts a Value to a native Boolean in the host language, and VOID_UNIT is the value resulting from a statement which doesn't produce a "proper" value.

Note how we return cond_result or return body_result directly when it is necessary to propagate a control-flow effect that this loop isn't supposed to handle itself, for example if the loop body contains a return or throw statement. The same propagation will have to be done elsewhere, e.g. when you add two numbers together, you'll have to propagate control-flow effects from left_result and right_result, in order for those control-flow effects to end up being handled by the eval function for the correct AST node.

$\endgroup$
11
$\begingroup$

Throw an exception

For a simple tree-walking interpreter in a language with catchable exceptions, both break and continue statements can be implemented as throwing a specific exception, while the loop sets up a try-catch to detect and handle them.

Consider this pseudocode implementation:

interpret_while(condition, body):
    try {
        while (evaluate(condition) is True):
            try {
                evaluate(body)
            } catch (Continue) {
                // do nothing, just go to next iteration
            }
    } catch (Break) {
       // also do nothing, have already left the loop
    }

interpret_continue():
    raise Continue
interpret_break():
    raise Break

We have two exception types and two try-catches, one inside the loop for continue and one outside for break. If there are nested loops, the innermost one will automatically catch the exception and produce the right semantics. The exception-handling system already innately deals with the necessary stack unwinding, exiting each layer of recursion until you get back to the right place.

This can also allow for labelled breaks: include the label in the exception, and test and re-throw if it doesn't match until it reaches the appropriate surrounding loop.

This is not going to be the most efficient or performant approach, but it's very straightforward to get moving with and leverages the functionality of the host system. A similar approach can be used to implement non-local returns or exception handling within the interpreted language, and those will fit in comfortably with one another.

$\endgroup$
0
$\begingroup$

What do you do on a regular iteration, how do, after the end of the last instruction, get back to the first, in the block of the for loop (actually, before that, to execute the increment and check the condition or progress the iterator, but not repeat the initialization)? Typically, wouldn't you insert (or rather, internally store the reference/address of that AST node on your internal stack or for the current loop) a jump target label of some sort, and if encountering the continue, you jump to that (or, if modifying the AST, replace the continue with a jump to this hard address/target)? Similar with break, jumps to the next spot after the end of the block that belongs to for (if it's a tree, unwinding/upwards till getting back to for, and then continuing with next).

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .