9

I'm in my intro classes and trying to understand how a Java compiler works. Most posts said a compiler translates A to B (which could be machine code) to run, while an interpreter 'just' runs the code. How can the machine run code with just an interpreter, without producing or translating the code into machine code? Does it call built-in precompiled code of the functions it needs? If not, how can the machine run it? I learned it only understands bits.

4

5 Answers

11

An interpreter is nothing more than a computer program (usually compiled to machine code) that's designed to execute other computer programs.

Like a compiler, an interpreter contains logic to read source code files, parse them (to understand their structure and syntax), and perform semantic analysis.

The difference is that an interpreter does not convert the program instructions to native machine code. Instead, it executes the program itself. You might think of an interpreter as an emulator for a virtual “computer” specially designed to support the interpreted language. This virtual computer will provide:

  • The “memory” in which the interpreted program will store its variables, function call stack, etc. Perhaps this will be a hashtable mapping interpreted-language variable names to objects. Memory can be dynamically allocated and deallocated as needed.
  • A means of implementing variable assignments (updating the aforementioned “memory” structure), function calls, or other control flow.
  • A means of handling any errors/exceptions that occur within the interpreted program.
  • A runtime environment that includes the interpreted language's built-in functions and modules for doing math, I/O, etc.

The core of an interpreter is basically just a giant for loop that iterates over the interpreted program's instructions and executes them one at a time. (See @Erik Eidt's answer for a simple example.) Non-sequential execution (as with goto statements, if...else statements, or function calls) can be implemented by changing the for loop's “current instruction” index.
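To make the “current instruction” index concrete, here is a minimal sketch in C. The instruction set (OP_SET, OP_ADD, OP_JUMP_IF_LESS, OP_PRINT, OP_HALT), the struct layout, and the four-slot variable array are all made up for illustration; no real interpreter is being quoted.

```c
#include <stdio.h>

/* Hypothetical instruction set, invented for this sketch. */
enum Op { OP_SET, OP_ADD, OP_JUMP_IF_LESS, OP_PRINT, OP_HALT };

struct Instr {
    enum Op op;
    int slot;    /* which variable the instruction works on */
    int value;   /* constant operand (meaning depends on op) */
    int target;  /* instruction index to jump to (jumps only) */
};

/* Executes the program and returns the final value of variable 0. */
int run(const struct Instr *program) {
    int vars[4] = {0};  /* the interpreted program's "memory" */
    int pc = 0;         /* index of the current instruction */
    for (;;) {
        const struct Instr *in = &program[pc++];
        switch (in->op) {
        case OP_SET:   vars[in->slot] = in->value;  break;
        case OP_ADD:   vars[in->slot] += in->value; break;
        case OP_PRINT: printf("%d\n", vars[in->slot]); break;
        case OP_JUMP_IF_LESS:
            /* Non-sequential execution: just overwrite the index. */
            if (vars[in->slot] < in->value) pc = in->target;
            break;
        case OP_HALT:  return vars[0];
        }
    }
}

/* A counting loop written in the made-up instruction set. */
static const struct Instr count_to_three[] = {
    { OP_SET,          0, 0, 0 },  /* 0: v0 = 0             */
    { OP_PRINT,        0, 0, 0 },  /* 1: print v0           */
    { OP_ADD,          0, 1, 0 },  /* 2: v0 += 1            */
    { OP_JUMP_IF_LESS, 0, 3, 1 },  /* 3: if v0 < 3 goto 1   */
    { OP_HALT,         0, 0, 0 },  /* 4: stop               */
};
```

Calling run(count_to_three) prints 0, 1 and 2 on separate lines and returns 3. The loop in the interpreted program exists only because instruction 3 resets pc; the C code itself contains no loop over those numbers.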

It's common for interpreted languages to use some kind of intermediate representation that's easier for the interpreter to work with than raw source code. This can be a “tokenized” representation of the source code, an abstract syntax tree, or a bytecode format that may closely resemble actual machine code. This intermediate representation may be an internal implementation detail that exists only in memory, or it may be saved as a “compiled” file on its own (as with Java .class files or Python .pyc files). The interpreter then executes the intermediate representation.
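As a rough illustration of the two phases, the sketch below first turns source text into a token array (the intermediate representation) and then executes that array. The language is an assumption chosen for brevity: postfix arithmetic with non-negative integers, + and *. The names tokenize, execute, and struct Token are hypothetical.

```c
#include <ctype.h>

enum TokKind { TOK_NUM, TOK_ADD, TOK_MUL };
struct Token { enum TokKind kind; int value; };

/* Phase 1: source text -> token array. Returns the token count,
   or -1 on an unknown character or if `out` is too small. */
int tokenize(const char *src, struct Token *out, int max) {
    int n = 0;
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (n == max) return -1;
        if (isdigit((unsigned char)*src)) {
            int v = 0;
            while (isdigit((unsigned char)*src))
                v = v * 10 + (*src++ - '0');
            out[n++] = (struct Token){ TOK_NUM, v };
        } else if (*src == '+') {
            out[n++] = (struct Token){ TOK_ADD, 0 }; src++;
        } else if (*src == '*') {
            out[n++] = (struct Token){ TOK_MUL, 0 }; src++;
        } else {
            return -1;
        }
    }
    return n;
}

/* Phase 2: execute the tokens with an operand stack. For simplicity
   this sketch returns 0 on malformed input, which is ambiguous with
   a legitimate result of 0. */
int execute(const struct Token *toks, int n) {
    int stack[32], sp = 0;
    for (int i = 0; i < n; i++) {
        if (toks[i].kind == TOK_NUM) {
            if (sp == 32) return 0;          /* stack overflow */
            stack[sp++] = toks[i].value;
        } else {
            if (sp < 2) return 0;            /* missing operand */
            int b = stack[--sp], a = stack[--sp];
            stack[sp++] = (toks[i].kind == TOK_ADD) ? a + b : a * b;
        }
    }
    return sp == 1 ? stack[0] : 0;
}
```

For the source text "2 3 + 4 *", tokenize produces five tokens and execute returns 20. The token array could just as well be written to a file and executed later, which is roughly the role a .pyc or .class file plays.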

There is also the hybrid approach of just-in-time compilation, which compiles the program (or parts thereof) to machine code at runtime and then runs it using the actual computer hardware. This tends to provide higher performance than normal interpretation.

1
  • 4
    "JIT compiles the program to machine code" - or more accurately, parts of the program
    – Bergi
    Commented May 11 at 13:12
35

You can write a simple interpreter.  Let's make a language in which "A" means print Hello, "B" means print a space, "C" means print World, "D" means print a newline, and a null character means end of program.

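#include <stdio.h>

// Executes one instruction per input character until the terminating null.
void interpret ( const char *input ) {
   for (;;) {
       switch ( *input++ ) {
       case 'A' : printf ( "Hello" ); break;
       case 'B' : printf ( " " ); break;
       case 'C' : printf ( "World" ); break;
       case 'D' : printf ( "\n" ); break;
       case '\0' : return;   // end of program
       }
   }
}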

...

interpret ( "ABCD" );  // prints "Hello World\n"
interpret ( "CBAD" );  // prints "World Hello\n"

There!  That's an interpreter for a simple language.  A more complex language needs a more complex interpreter.  There's no need for machine code translation here, except that this C program itself has to be compiled.

The point is that this interpreter, by just being a regular program, can do what is necessary to perform the operations in the input — it has all it needs to do the work of the input language.

2
  • 1
    Maybe even take the program from stdin or argv to drive home the point that the interpreter doesn't have to be recompiled for different programs?
    – Bergi
    Commented May 11 at 11:34
  • 1
    There are interpreters for C, so this mini interpreter you've made here doesn't necessarily even need to be compiled to be run.
    – 8bittree
    Commented May 13 at 17:55
10

Every programming language has semantics. These semantics are what a program means, and they define what actions should be taken when one "runs" the program.

An interpreter reads in a program, parses it, and then does the actions that the semantics of the programming language say should be done when one "runs" the program.

A compiler reads in a program, parses it, and then, instead of doing the actions, writes out instructions in another language*. If you "ran" that second program, the actions taken would be the same as the actions that should have been taken for the first program. The important distinction is that the compiler doesn't actually do those operations. It allows one to "run" the output program to do them later.
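To see the contrast in code, here is a hedged sketch of a tiny compiler. It assumes the toy language from Erik Eidt's answer on this page (A prints Hello, B a space, C World, D a newline) and uses C source as the target language. The function names compile_to_c and append are hypothetical. Note that it never prints Hello itself; it only writes out a program that would.

```c
#include <stdio.h>
#include <string.h>

/* Appends text to a NUL-terminated buffer, truncating if it is full. */
static void append(char *out, size_t out_size, const char *text) {
    size_t used = strlen(out);
    if (used + 1 < out_size)
        strncat(out, text, out_size - used - 1);
}

/* Translates a toy-language program into C source text in `out`.
   Nothing is executed here; the actions happen only when the
   emitted program is itself compiled and run. */
void compile_to_c(const char *input, char *out, size_t out_size) {
    if (out_size == 0) return;
    out[0] = '\0';
    append(out, out_size, "#include <stdio.h>\nint main(void) {\n");
    for (; *input != '\0'; input++) {
        switch (*input) {
        case 'A': append(out, out_size, "    printf(\"Hello\");\n"); break;
        case 'B': append(out, out_size, "    printf(\" \");\n");     break;
        case 'C': append(out, out_size, "    printf(\"World\");\n"); break;
        case 'D': append(out, out_size, "    printf(\"\\n\");\n");   break;
        /* Unknown characters are silently skipped in this sketch. */
        }
    }
    append(out, out_size, "    return 0;\n}\n");
}
```

The switch has the same shape as the interpreter's, but each case emits text describing the action instead of performing it. That one difference is the whole distinction drawn above.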

There's a bit of wishy-washyness around "run," which is why I put it in quotes. For 99.9% of programming, we will say one "runs" machine code. So if a compiler outputs a program in machine code, we say all that's left to do is "run" it. However, it's turtles all the way down. At the CPU level, "running" machine code actually ends up being another interpreter step, converting it from machine code to microcode, and then "running" that microcode. And at the gate level, "running" the microcode is really just a set of massively parallel operations on the voltages at different transistors in the CPU. And even ignoring that, there are emulators, which interpret machine code into something that can be run on different machines.

I give those examples to point out that yes, the concepts of interpreting and compiling have some funny overlaps, dealing with the question of what is "running." But for the most part, your intuition about what is running is generally an effective one.

*. Or rarely: in the same language but a different form.

5

I don't understand how an interpreter can run the code without producing or translating the code into machine code

By using facilities of the language the interpreter is written in, which are themselves compiled to machine code. Erik Eidt's answer gives an example with a basic interpreter written in C, and it runs the interpreted code by calling C functions like printf.

Does it have a built-in precompiled code of the functions it needs, and then it calls them?

Yes... although maybe not in the way that you seem to think. It has the code of the language the interpreter is written in, and it translates the interpreted code on the fly into procedure calls in the interpreter language.

For example, Python has the print function. If you wanted to write an interpreter for Python in, say, Golang, you'd have to analyze the details of the print function and then write Go code to implement what Python's print function does. That is a lot more involved than just writing something to STDOUT: for instance, print calls str() on its argument, meaning you'd have to implement the semantics of that too. For a toy example, though, you might just call fmt.Println instead of actually implementing Python's print.
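The same idea can be sketched in C rather than Go: the interpreter keeps a table that maps built-in names from the interpreted language to functions written in the host language. Everything here (the table, call_builtin, the single string argument) is a simplifying assumption for illustration, and builtin_print is a toy stand-in that does far less than Python's real print.

```c
#include <stdio.h>
#include <string.h>

/* Host-language function implementing one built-in. */
typedef int (*BuiltinFn)(const char *arg);

/* Toy "print": writes the argument and a newline, returns the
   number of characters written. */
static int builtin_print(const char *arg) { return printf("%s\n", arg); }

/* Toy "len": returns the length of the string argument. */
static int builtin_len(const char *arg) { return (int)strlen(arg); }

struct Builtin { const char *name; BuiltinFn fn; };

static const struct Builtin builtins[] = {
    { "print", builtin_print },
    { "len",   builtin_len   },
};

/* Looks up an interpreted-language function name and runs the
   precompiled host function behind it. Returns -1 if unknown. */
int call_builtin(const char *name, const char *arg) {
    size_t count = sizeof builtins / sizeof builtins[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn(arg);
    }
    return -1;
}
```

When the interpreted program says len("hello"), the interpreter ends up calling call_builtin("len", "hello"), which returns 5. The machine code that does the work was produced once, when the interpreter itself was compiled.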

If that sounds like a lot of work... well, it is. Writing an interpreter for any real programming language is a large task, which is why you should usually start with something simple like a toy Scheme dialect.

2

First of all, Java programs are executed by a runtime environment that creates the equivalent of a virtual machine. It is visible to your OS as a normal process, but what the OS sees is the Java runtime, not the specific Java program.

Second, since Java was created, dozens of different implementations of the runtime environment have come out. They follow more or less similar guidelines, but the details are not strictly fixed.

With most runtime environments (the standard way), Java programs go through two steps: 1) translation into bytecode, which generates the .class files and packages, and 2) just-in-time compilation.

  • Step 1. The code is translated into bytecode. Technically this step is a precompilation, but when you run a build (with Maven, Gradle or similar) it is called the compilation step, which sometimes creates confusion. The bytecode and the resulting package are not machine code but a common instruction set that can be executed on any host OS, provided you have the runtime installed.

  • Step 2. When the Java program is launched, the runtime environment calls the just-in-time compiler, which translates the bytecode into machine code and executes it. Some non-standard runtimes use a bytecode interpreter, but if you use the common JDK or the original Sun JVM (now Oracle) there is no interpreter.

Exception. Many runtime environments now include a console for quick tests and syntax checks. They include a framework that builds everything, including the main method, around the lines of code you write. Those lines are executed on the fly, possibly by an interpreter.

4
  • 1
    The distinction between bytecode and machine code is smaller than one might think. Java bytecode has been run directly on hardware and x86 or ARM machine code are often run in software.
    – 8bittree
    Commented May 14 at 14:01
  • The statement that there is no interpreter in HotSpot (the standard Java runtime in the JDK) is not strictly true. HotSpot is not primarily an interpreter, but it does include an interpreter as the first stage of its execution pipeline. The just-in-time compiler runs as a background task, compiling code that has been interpreted multiple times. Commented May 19 at 0:52
  • The reason for this is that compiling bytecode usually takes longer than interpreting it at least a couple of times, so early pure-JIT JVMs had horrible startup times compared to interpreters, due to the amount of time spent compiling code that would only be run once at startup, before the application could start its task. They also had issues with pauses for compiling rarely used code that might only be hit once in an application's run. Asynchronous compiling of repeatedly executed code allows the application to start working immediately, whilst minimizing the overhead of the compiler. Commented May 19 at 0:59
  • In the extreme case, for a degenerate purely linear program with no loops, repeated calls, or recursion, HotSpot will behave as an interpreter. Commented May 19 at 1:00
