Intermediate code by virtual machines

Question

I was reading about instruction sets and came across the following line on Wikipedia:

Some virtual machines that support bytecode as their ISA such as Smalltalk, the Java virtual machine, and Microsoft's Common Language Runtime, implement this by translating the bytecode for commonly used code paths into native machine code.

Please correct me on my overall understanding on "instruction set" and virtual machines handling of the same:

Basically processors have instructions or instruction sets loaded into their memory which are used for processing. Now processors receive binary input which is nothing but the “binary” representation of those instructions set. Right?
Instruction set can also be called as “machine code” or “machine language”. Right?
JVM accept Java bytecode and produce the binary form of “instruction set” or “machine code” or “machine language”. So, in this way I can say that bytecode is the machine language of JVM because JVM understand the bytecode or takes bytecode as input and produces binary form of “machine code” for the Operating System and then OS will further convert those “machine code” into “machine code” for the processor. Right?
- Bolded portion in my above question is among main thing I want to understand. Because we don't have JVM's for specific processors, we have JVM's for specific OS, so my understanding is that JVM cannot produce "machine code" for the processor and will produce the "machine code" for the OS and then finally OS will produce the "machine code" to be executed by the processor.

Please confirm my understanding and if it is wrong then please provide details and/or reference on why it is incorrect and what is the corrected concept.

You say there is no JVM for specific processors, but I see 32bits, 64bits and SPARC JVMs on the download page : java.com/en/download/manual.jsp — user2313067, Commented Feb 5, 2017 at 8:32
@user2313067 I am kind of surprised with your comment because bitness and processor are different things, I know there are different JVMs for 32 and 64 bit. If you can prove me that there are different JVMs for Intel, AMD, Motorola, Qualcomm etc. processors then I will humbly accept that statements I made in my question were wrong. — pjj, Commented Feb 6, 2017 at 17:34
But AMD and Intel both make mostly x86 (32bits) and x86_64 (64bits) processors. Those use the same instruction set so they process the same machine code. I think qualcomm makes mostly arm processors, so you'd need an arm JVM, but I'm not certain. — user2313067, Commented Feb 6, 2017 at 17:47
With all due respect sir, it looks like that you don't have knowledge on Java, you said - "you'd need an arm JVM", this is not at all right because for this to be true, the most important Java feature has to be broken i.e. Java being architecture neutral and portable. — pjj, Commented Feb 6, 2017 at 23:14
Well the Java bytecode remains architecture neutral and portable. You only need a JVM for the current architecture. x86 and x86_64 are different architectures and SPARC is completely different. Yet you have a JVM for SPARC Solaris and one for x86_64 Solaris. The point of Java is that the bytecode generated is independent of the system on which it will run. You still need a JVM for the OS and architecture it runs on. — user2313067, Commented Feb 7, 2017 at 7:43

John Dallman · Accepted Answer · 2022-08-09 18:53:34Z

You have been confused by two separate but related concepts. The quotation you're working off mixes them up, which does not help. Here's an attempt at a clearer explanation:

First, consider a microprocessor, such as an x86 or ARM chip. It can execute instructions in the ISA (Instruction Set Architecture) that is built into it. It assumes that the code you try to run on it is in the right ISA; if you feed it code for a different ISA, nothing useful will happen. There are various caveats and qualifications to this, described below, but this is how stuff actually runs, at the bottom level.

Modern ISAs usually allow for "virtualisation", in which software and hardware co-operate to create additional "virtual computers" of the same type as the real computer they're running on. That's the kind of "Virtual Machine" that is created by software like VMWare, Parallels, or Hyper-V. Wikipedia calls this a "system virtual machine," because it creates virtual versions of an entire computer.

The other kind of "virtual machine" refers to an abstract model of a computer. This has an ISA of its own, but that ISA is designed for some specific purpose. Java bytecode and .NET bytecode are the most widely used examples. There is no hardware that can run this ISA directly. Instead, software such as a JVM or the .NET Runtime is used to run it. That software generally translates the special-purpose ISA into the actual ISA of the computer it is running on. Wikipedia calls this a "process virtual machine," because it normally runs as a single process under a general-purpose operating system.
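To make the idea of a software-defined ISA concrete, here is a minimal sketch of a toy process virtual machine. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the class name `ToyVm` are invented for illustration and have nothing to do with real Java bytecode. This toy only interprets its instructions one at a time; a real JVM can also translate frequently used code paths into native machine code, as the quotation in the question says.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyVm {
    // Hypothetical opcodes for an invented, special-purpose instruction set.
    public static final int HALT = 0; // stop and return the top of the stack
    public static final int PUSH = 1; // push the next code word onto the stack
    public static final int ADD = 2;  // pop two values, push their sum
    public static final int MUL = 3;  // pop two values, push their product

    // Runs a program written in the toy instruction set and returns its result.
    public static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next code word to fetch
        while (true) {
            int op = code[pc++]; // fetch
            switch (op) {        // decode and execute
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Toy "bytecode" for (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

The array `program` plays the role of bytecode: no processor understands it, but `run`, which is itself ordinary machine code for the host processor once compiled, gives it meaning.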

Now we have enough background to look at your statements:

Basically processors have instructions or instruction sets loaded into their memory which are used for processing. Now processors receive binary input which is nothing but the “binary” representation of those instructions set. Right?

Not quite. The instruction sets are built into the processors. The programs a processor is to run are loaded into memory, and the processor fetches the instructions that make up the program from memory into the processor. This happens in small groups of instructions, rather than the whole program being sucked in (unless it's very small).

Instruction set can also be called as “machine code” or “machine language”. Right?

The instruction set is the total of all the kinds of instructions the processor can run, together with the ways they are represented, their limitations, and so on. "Machine code" consists of instructions that follow the rules of the ISA, set up ready for the processor to run.

JVM accept Java bytecode and produce the binary form of “instruction set” or “machine code” or “machine language”. So, in this way I can say that bytecode is the machine language of JVM because JVM understand the bytecode or takes bytecode as input

Correct up to here.

and produces binary form of “machine code” for the Operating System and then OS will further convert those “machine code” into “machine code” for the processor. Right?

No. Conventional operating systems require their programs to be built for the right combination of OS and ISA, and don't do any translation before running a program. (I know of one experimental OS that does do translation, but it has never been a product, or used outside its development project.)

A process virtual machine, such as a JVM, is a conventional program, built for a particular combination of operating system and ISA. It translates Java bytecode into the host machine's ISA, but it has to conform to the host operating system's conventions for machine code while it is doing that. It also has to translate input and output requests made by the Java bytecode into calls to the host operating system.

An example would make this clearer:

A SPARC Solaris JVM will translate Java bytecode into SPARC instructions, and translate Java i/o into Solaris i/o.

An x86-64 Solaris JVM will translate Java bytecode into x86-64 instructions, and translate Java i/o into Solaris i/o. Compared to the SPARC Solaris JVM, the machine code translation is very different, but the i/o translation is very similar.

An x86-64 Windows JVM will translate Java bytecode into x86-64 instructions, and translate Java i/o into Windows i/o. Compared to the x86-64 Solaris JVM, the machine code translation is quite similar (although some conventions are different) but the i/o translation is very different.
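One way to see that a JVM belongs to a particular combination of operating system and ISA is to ask it. The sketch below reads the standard system properties `os.name` and `os.arch`; the class name `HostInfo` and the method `describeHost` are just illustrative names. The same compiled class file should run unchanged on each of the JVMs above, but the values reported will differ, and the exact strings depend on the JVM build.

```java
public class HostInfo {
    // Returns a short description of the platform this JVM is running on.
    public static String describeHost() {
        String os = System.getProperty("os.name");   // operating system name
        String arch = System.getProperty("os.arch"); // processor architecture
        return os + " / " + arch;
    }

    public static void main(String[] args) {
        // The bytecode for this class is identical everywhere;
        // only the JVM underneath it is platform-specific.
        System.out.println(describeHost());
    }
}
```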

Because we don't have JVM's for specific processors, we have JVM's for specific OS, so my understanding is that JVM cannot produce "machine code" for the processor and will produce the "machine code" for the OS and then finally OS will produce the "machine code" to be executed by the processor.

Completely wrong, I'm afraid. See above.

Caveats and qualifications

Many processors are capable of executing more than one instruction set, by being switched into different "modes." Modern x86 processors have at least three modes: 16-bit (rarely used nowadays), 32-bit and 64-bit. ARM processors have a variable number of modes, chosen from ARM 32, Thumb 32 and ARM64.

Programming tools for most operating systems put information at the start of programs to label them with the ISA they require. That enables the operating system to check if a program is suitable for running before trying to do so.
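As an illustration of such a label, the ELF executable format used on Linux and Solaris stores a 16-bit `e_machine` field at byte offset 18 of the file header, identifying the required ISA. The following is a rough sketch that decodes that field; the class name `ElfIsa` and method `isaOf` are made up for this example, only a handful of machine codes are listed, and other formats (such as the PE format on Windows) carry the equivalent information in a different layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ElfIsa {
    // Given at least the first 20 bytes of a file, names the ISA an ELF header declares.
    public static String isaOf(byte[] header) {
        boolean isElf = header.length >= 20
                && header[0] == 0x7f && header[1] == 'E'
                && header[2] == 'L' && header[3] == 'F';
        if (!isElf) {
            return "not an ELF file";
        }
        // e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian
        ByteOrder order = header[5] == 2 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        // e_machine: unsigned 16-bit field at offset 18
        int machine = ByteBuffer.wrap(header).order(order).getShort(18) & 0xFFFF;
        switch (machine) {
            case 2:   return "SPARC";
            case 3:   return "x86 (32-bit)";
            case 40:  return "ARM (32-bit)";
            case 43:  return "SPARC V9 (64-bit)";
            case 62:  return "x86-64";
            case 183: return "ARM64 (AArch64)";
            default:  return "other (e_machine=" + machine + ")";
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Reads the whole file for simplicity; only the first 20 bytes are used.
            System.out.println(isaOf(Files.readAllBytes(Paths.get(args[0]))));
        } else {
            // Synthetic little-endian header claiming x86-64, for demonstration.
            byte[] header = new byte[20];
            header[0] = 0x7f; header[1] = 'E'; header[2] = 'L'; header[3] = 'F';
            header[5] = 1;
            header[18] = 62;
            System.out.println(isaOf(header)); // prints x86-64
        }
    }
}
```

Passing the path of an executable as the first argument reports what that file declares; an operating system can perform the same kind of check before it attempts to run a program.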

Some processors have "microcode", which is software that tells the processor how to run specific instructions. This is not in main memory, but in special storage inside the processor.
