20

This has always confused me. Why can you not directly obtain the IP, and instead have to go through some odd assembly hoops such as calling a function whose only purpose is to push its own return address onto the stack?

I'm asking about the historical reason, since this decision was probably made back in the time of the 8086.

10
  • 13
    Worth noting is that x86-64 does have such an instruction: lea eax,[eip+0]
    – jpa
    Commented May 21, 2022 at 9:12
  • 3
    BTW, there are plenty of other CPUs where you cannot do that directly, either.
    – dirkt
    Commented May 21, 2022 at 11:33
  • 4
    The problem with asking why is that only the engineers who made the decisions can really answer. The rest of us can make guesses: they could not at the time think of any use for it, it would take too many transistors or slow down something, they might be planning for the future with more address bits or modifiers, they knew of a good workaround, the marketing people did not require it, programmers managed anyway, it might create other problems somewhere, ...
    – ghellquist
    Commented May 21, 2022 at 16:37
  • 10
    @ghellquist Many such engineers are still alive. Engineers can be and have been interviewed. Some engineers are interested in the retro scene. Engineers must not be seen as 100% opaque and inaccessible godlike figures we can never know anything about. Commented May 22, 2022 at 1:22
  • 2
    Hardware engineering POV: Why have an instruction (and expend all the resources which that implies: in design, expending an instruction slot, testing in development, QA, production test, etc., all of which adds some marginal cost to every chip) for something that A) can be done with existing instructions without too much difficulty and B) is very rarely desired? [Yeah, I know it's CISC, but you've got to draw the line somewhere.]
    – Makyen
    Commented May 23, 2022 at 17:53

8 Answers

27

As Thorbjørn Ravn Andersen already put it nicely:

What would you need it for?

There is almost no practical (*1) need to obtain the PC address at runtime (*2) - it's a value to be obtained at assembly time, provided by the Assembler and/or Linker. A simple

HERE:   LEA  AX,HERE   ;(*3)

will make sure the Assembler and/or Linker puts the actual instruction's address into a register (AX in this case).
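
As footnote *3 notes, a plain MOV of the same link-time constant would do in this simple case as well - a minimal sketch, assuming MASM-style syntax (the second label is only for illustration):

HERE2:  MOV  AX,OFFSET HERE2   ; the assembler/linker supplies the same constant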

Now, if you really want to do the trick, then it's best to do it the way it would work on any CPU: jump one instruction ahead by using a subroutine call and then pop the return address.

       CALL  NEAR PTR NEXT ; Make sure it's not a far call *4
NEXT:  POP   AX

Except, there's a major caveat:

Above is trouble-free only in clean 16-bit code. Different addressing modes may require use of 32-bit registers and more.
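
For comparison, a minimal sketch of the same trick in 32-bit flat code (my illustration, not part of the original answer), where the popped value needs a 32-bit register:

        CALL  NEXT32        ; near call, pushes a 32-bit return address
NEXT32: POP   EAX           ; EAX = offset of NEXT32, i.e. the address of this POP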

Going for tricks like this is a sure way to introduce incompatibilities. It's the old story of programming what you want to do, not how to do it. Letting your tools - compiler/assembler/linker - do the 'dirty' work ensures it gets done in the best possible way.


Further Reading:

As said, there is no real use for IP being one of the accessible registers; many quite successful architectures (/360, 68k, 8080, 6500, etc.) do not have a directly accessible PC. They have at most PC-relative addressing (68k).

The PC is never a 'normal' register but is tied to the basic mechanics of a processor. In fact, having it completely separated and not readable, as on the 8086 and others, brings a lot of advantages (*4). It offers the simplest way to separate operational housekeeping (like (pre-)fetching) from logical operation. The only use case for storing it is a subroutine branch, and that can be handled by keeping a shadow copy of the next address to be executed.

Architectures that allow the use of the PC in addressing may need to hold a second shadow copy with the instruction's address, complicating the design. Architectures that include the PC in the register set, or use one of their GP registers as PC, need to observe several constraints (*5). One visible sign is that some RISC implementations need to use PC-relative addressing with a constant offset from the actual location. But there is more.

Long story short: better not to care about the PC at all - besides jumping, that is.


*1 - The "almost" case is dynamically created code - but even then it would be more appropriate to improve the generator.

*2 - And no, the standard use of BALR Rx,0 by /370 modules to get a local reference for jumps and constants is an oddity due to there being neither absolute nor PC-relative addressing, nor the ability to load immediate word (address) values.

*3 - And yes, a MOV could be used as well in nearly all (simple) cases. I still prefer the LEA as it allows even more weird address generations :))

*4 - It still may run into trouble depending on addressing modes and memory model.

*5 - A complex issue in CPUs with a certain level of asynchronous operation, like having prefetch or speculative operation.

5
  • Comments are not for extended discussion; this conversation has been moved to chat.
    – Chenmunka
    Commented May 24, 2022 at 6:22
  • 1
    Practically every modern operating system requires such an instruction to implement shared libraries and position-independent code, to the extent of CPU manufacturers heavily optimizing the call/pop instruction sequence, which tended to be very slow as it confuses the return stack cache. Also, one of the major improvements of amd64 over x86 was the addition of rip-relative addressing. Commented Jul 18, 2022 at 21:10
  • 1
    @RememberMonica Position Independent code does not need to access IP, it only needs IP-relative (or entry-relative) addressing. Likewise, shared libraries only need it (if at all) during initialization, so again there is no need for a dedicated instruction. Adding one would be ISA bloat.
    – Raffzahn
    Commented Jul 18, 2022 at 23:54
  • *5 doesn't seem to be used, neither now, nor through the edit history.
    – tevemadar
    Commented Feb 13 at 11:18
  • @tevemadar Think I found where it belongs. Thanks
    – Raffzahn
    Commented Feb 13 at 12:24
19

The OP specifically asks for the historical reason. Only Intel could give their exact reasoning, but the following points are worth noting.

Intel's 8086 and 8088 were outgrowths of their earlier 8008 and 4004 microprocessors - these architectures all had an address space that required more bits than their 16, 8 or 4-bit data width.

On the other hand, minicomputers of the day, including their microcomputer outgrowths, tended to have an orthogonal approach to registers in which the Program Counter and Stack Pointer were numbered in with the general purpose registers and could be accessed or used in a variety of addressing modes (cf. PDP-11 vs LSI-11 and TI990 vs TMS9900 - both originally true 16-bit architectures addressing 2¹⁶ bytes).

The ability to access the Program Counter was very useful: it was a common idiom for accessing parameters and local variables stored with the code, for implementing Position Independent Code, and in contexts relating to linking, relocating, or overlaying code blocks (and there were also various useful use cases involving self-modifying code and dynamically generated/updated code).

One of the big issues with the PDP-11 and TI990 16-bit architectures, as well as earlier 8-bit and 16-bit microprocessors generally, was the inability to index more than 2¹⁶ units of memory. The PDP-11 family introduced models with separate Instruction and Data address spaces, while a more general approach introduced segments, allowing for separate code and stack segments and multiple data segments - with its segment registers the 8086 was able to address 2²⁰ bytes of physical memory. Segmentation also introduced the ability to give different segments different Read/Write/Execute permissions.

This, and the increasing use of recursion (which requires a stack), made it inappropriate to store code and data/parameters/locals together at adjacent addresses, and meant that addresses could no longer be interpreted without their segment address. That put an end to much of the need for, and utility of, direct access to the Instruction Pointer.

11
  • 1
    This is a great answer to the question, with loads of supporting detail. Welcome to RCSE! Commented May 21, 2022 at 10:11
  • 1
    Somewhat misleading argument. The 68k managed to use up to 2^32 bytes without segmentation and also had PC-relative addressing, including the ability to load an address from PC into a GPR.
    – lvd
    Commented May 22, 2022 at 8:17
    @lvd The 68000 is a couple of generations (a decade and a half) later and one of the first 32-bit microchips (although with a 16-bit data bus, much as the 8088 is a 16-bit machine like the 8086 but with an 8-bit data bus). So it didn't need segmentation to address 2^32 bytes. Of course, paging is another approach to mapping logical address space into a potentially larger physical address and is easier for compilers to manage (so segmentation is phased out in x64). Commented May 22, 2022 at 11:30
  • 2
    The 68000 was quite contemporary with the 8086.
    – lvd
    Commented May 22, 2022 at 12:55
  • 1
    @supercat Segmentation means that the same 'address' points to different locations and thus pointers are more complex to manage; conversely, segments also introduce the possibility of two different addresses referring to the same location (aliasing). Although in fact, the main reason for x64 dropping segmentation seems to have been that OS's didn't use it and so the cost wasn't worth it. As a language developer I'd rather we still had segments irrespective of the matching to a larger address space (and being able to manage paging helps manage expandable arrays more efficiently). Commented May 25, 2022 at 12:49
16

So long as a stack exists, the IP address may easily be obtained via the byte sequence "E8 00 00 5B" [CALL $+3 : POP BX] because near calls use PC-relative addressing. On the other hand, the normal state of affairs for position-independent code on the 8086 is to be located at a fixed address within an arbitrary segment, which allows zero-effort relocation on any 16-byte boundary.

Because such methods are available (in particular, the PC-relative addressing that enables this within a code block), there is less need to provide a specific instruction to access the instruction pointer.
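
Spelled out in assembler syntax, that byte sequence corresponds to something like the following (a sketch in MASM-flavoured notation; the label name is mine):

       CALL  $+3       ; E8 00 00 - near call with zero displacement
GETIP: POP   BX        ; 5B       - BX = the pushed return address = offset of GETIP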

15
  • 1
    Potential catch-22 in some scenarios: with PIC, you may wish to have your stack set to a fixed offset from your code. Which requires knowing the IP to calculate the load address to set SS appropriately...
    – TLW
    Commented May 21, 2022 at 20:36
  • 2
    @lvd Never say Never. The procedureless way is quite proper, and the most efficient way of doing it in certain use cases - e.g. code in ROM where every byte counted (like on a computer with 2K RAM, 2K ROM and 2K EPROM and no disk, as I had in the late 70s; mid 70s, my machines had <1K). Commented May 22, 2022 at 12:00
  • 1
    @supercat: what's the efficiency you're talking about? Not a single dynamic linked list could be made on early 68k macs without double dereferencing addresses through "master pointers" -- provided you've got enough master pointers, which is not always the case! Every time one works with already dereferenced pointers and does function calls, there's a hazard that function call might result in heap 'defragmentation'. All that stuff made both 68k macos programming more cumbersome and the resulting programs slower -- is that an improved efficiency indeed?
    – lvd
    Commented May 23, 2022 at 7:06
  • 2
    This answer makes no attempt to address the question "why is there no direct way to access the instruction pointer?". Whilst it might be interesting to show an indirect way to do it, that is an answer to an entirely different question. Commented May 24, 2022 at 9:49
  • 1
    @TobySpeight: On the Z80, the only call instructions use a fixed target address (either following the opcode or embedded within it), which means one needs to know of a fixed address where one can find or place code to inspect the stack, but on the 8086 one can place the stack-inspection code in the same blob as the call instruction without having to know where it is. The ability to do that means that direct instruction to access the PC has far less value than it would on e.g. the Z80.
    – supercat
    Commented May 24, 2022 at 14:55
14

On the 8086 the instruction pointer is not a general-purpose register you can freely read. This was also the case on the earlier 808x models, even though there the program counter was used directly to fetch instructions without a prefetch queue, and it was settable via the PCHL instruction. Because the CPUs natively supported a stack, jumps, and subroutines, the programming model for ordinary programs simply did not need to read IP, so opcodes and their parameters could be used for other, more useful things. And where the current execution address is really needed, it can still be read indirectly. At a quick glance, many other CPUs from approximately the same era (Z80, 6500, 6800) also have no opcode to read PC/IP, likely for the same reason.

The CPU does not directly use the instruction pointer for execution, as the executed instructions are fetched from the prefetch queue, and the queue is filled from memory.

The instruction pointer (IP) does not reside on the Execution Unit (EU) side, but on the Bus Interface Unit (BIU) side, with the segment registers.

So just like there is no instruction to directly set/store IP - because a jump or call must clear the prefetch queue to make sure instructions are fetched for execution from the correct address - there is also no instruction to get/load IP, because it likely won't point to the currently executing instruction.

So, whenever the actual value of IP is needed, for example when a CALL has to push the correct return address onto the stack, the value is adjusted as needed and then stored to memory. So there is some logic to keep track of how IP must be corrected whenever its value is needed.

But internally, that's how the CPU works according to the user manual, with IP pointing to the address of the memory to be fetched next into the queue.

17
  • 1
    "[the IP] likely won't point to the currently executed instruction" - The IP associated with the currently executed instruction must exist somewhere, else how could a relative branch work? Commented May 20, 2022 at 17:00
  • 4
    In order for the CPU to be able to correctly process a CALL or INT instruction, it must know the address of the following instruction, even if instructions after that have been fetched and will need to be discarded. By my understanding, the 8086 maintains a Program Counter within the bus interface, and an Instruction Pointer within the execution unit. CALL and INT instructions save the Instruction Pointer; control-flow instructions load both the PC and IP with the new address.
    – supercat
    Commented May 20, 2022 at 17:01
    Of course there needs to be some logic for how it works. However, the 8086 user manual says the instruction pointer is in the BIU, and it also says the IP points to the next instruction to be fetched by the BIU (emphasis from manual). IP is adjusted to point to the next instruction to be executed if IP needs to be saved on the stack.
    – Justme
    Commented May 20, 2022 at 17:27
  • Forgot to mention IP will be adjusted by jumps too. And the prefetch queue is the reason why self-modifying code must ensure that the modified code is not already fetched into queue before modifying the memory. And why "POP CS" did exist but was made undocumented.
    – Justme
    Commented May 20, 2022 at 17:43
  • 1
    @supercat: The 8086's Instruction Pointer points to the next instruction to be fetched. Nothing in the 8086 holds the "real" PC value. When the 8086 needs to know the address of the next instruction to execute (for subroutine return or relative branch), it uses the CORR micro-instruction. This micro-instruction corrects the IP value by subtracting the length of the prefetch queue. This is done using the address adder in the BIU, a separate adder from the ALU, along with the constant ROM. Source: I reverse-engineered the complete die. Commented Apr 11 at 17:37
11

There's a highly-upvoted comment, also requoted in this question's accepted answer, saying "What would you need it for?" For what it's worth, here is a real, practical, albeit reasonably obscure example.

Once upon a time I wrote a C interpreter. One of its goals was to allow interoperation between interpreted and previously compiled code. It contained its own dynamic linker, so that it could read in object and library files, and call functions in them. (This is the same sort of thing that dlopen does today.)

Besides interpreted code making calls to compiled functions, it was also possible for compiled code to call back to interpreted code. Without going into all the gory details, this meant that a pointer to the interpreter's data structure describing an interpreted function had to also be usable as an actual function pointer. This data structure therefore began (that is, at offset 0) with a little data block containing trampoline code which was contrived to fire up the interpreter on the just-called function. And the very first thing the trampoline code had to do was, naturally enough, fetch its own PC, because that value was actually the pointer to the data structure describing the function to be interpreted. (Needless to say this was long before execute protection on data pages, and stuff like that.)
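
For flavour only, here is a rough 8086-style sketch of that trampoline idea - not the author's original PDP-11/VAX code; the label names and the RUN_INTERP entry point are purely hypothetical:

DESC:                          ; offset 0 of the descriptor doubles as the function pointer
       CALL  GETPC             ; push the offset of GETPC
GETPC: POP   BX                ; BX = offset of GETPC
       SUB   BX,GETPC-DESC     ; BX = offset of the descriptor itself
       PUSH  BX                ; hand the descriptor to the interpreter
       JMP   RUN_INTERP        ; hypothetical interpreter entry point
       ; ... the rest of the descriptor's data follows ...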

The first processor I wrote this for was the PDP-11, where I found myself able to do something straightforward like mov pc,r0. Not long after, I ported it to the VAX, which trapped on me when I tried to access my own PC in such a crude and obvious way, but after some digging around I was able to achieve the desired effect via a mild circumlocution such as lea 0@pc or something like that. I was certainly aware that what I was doing was dicey and difficult: the PC is not a general-purpose register, for all the sorts of reasons described in the other answers here. So I wasn't surprised I needed a circumlocution, and I was pleased to find one that worked.

(Later on I managed to port the interpreter, with some new trampoline code, to an 80286 or 80386 under MS-DOS, but I don't remember which instructions I used or how clever I had to be. Much later I somehow got it working on the same x86-based Mac where I'm typing this today. The gory, handcrafted trampoline code is gone, replaced by a magic libffi closure.)

So, anyway, this is an example of why someone might legitimately need — or once have needed — to directly access the program counter.

12
  • 1
    Well, nice to know why it would be useful to have such an instruction, but it still does not answer the question why there is no x86 opcode for directly getting the current value of IP, and even without such an instruction, it is easy to indirectly fetch it. Regardless of the CPU architecture, you have to work with the registers and instructions you happen to have.
    – Justme
    Commented May 21, 2022 at 7:56
  • 2
    [cont] If (as is often the case: no reflection on you or the original OP), it's an "X-Y" problem (an OP wants to do "something", and thinks knowing the PC will help), then people may be able to point the OP in better ways of solving the core problem. In your particular case, depending when you asked, you might have been told of LEA 0@PC, have been directed to libffi, or – indeed – your genuine need might have sparked the development of something like libffi in the first place!
    – TripeHound
    Commented May 21, 2022 at 8:06
  • 2
    Second, I understand why you are proud of your clever hack even though I do not fully understand which original problem you solved (avoiding having to run the compiler?). I noted, however, that you found the required handcrafted (probably meaning raw assembly) trampoline code difficult to maintain and replaced it. Commented May 21, 2022 at 10:51
  • 2
    Third, "fell into disuse" sounds like it wasn't used by many others than you. How come? Commented May 21, 2022 at 10:54
  • 3
    Fourth, I agree with Raffzahn that this does not answer the original question at all (which may cause it to be closed and deleted) and therefore is probably better placed somewhere else. Perhaps on your personal blog with a lot more details? Commented May 21, 2022 at 11:16
2

The 8086 uses a segmented memory model in which the address instructions are fetched from is calculated as CS * 16 + IP. For position-independent code, one would fix IP at link time and choose CS at run time.
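
As a small illustration (my own sketch, assuming NASM syntax and a DOS .COM environment, neither of which comes from the answer): every offset is fixed at assembly time, and the program is position independent simply because DOS picks CS at load time.

        org   100h             ; DOS .COM image: offsets fixed at assembly time
start:  mov   dx, msg          ; link-time offset, valid wherever CS ends up
        mov   ah, 09h          ; DOS print-string service
        int   21h              ; physical address of any byte here = CS * 16 + offset
        ret
msg:    db    'same offsets, any load segment$'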

It was not until the 386 and operating systems that used a 32-bit flat memory model that reading IP was useful for position independent code.

0

A readable PC may require extra hardware resources, perhaps a dedicated set of unidirectional wires from the PC to some register-write destination data path. Fetching instructions and pushing the PC only require wires from the PC to the memory access unit.

So why add wires to the chip area budget unless there's a profitable payback (performance, etc.)?

1
  • The "wires" to put the PC register on the ALU bus already exist even on the 8086, see e.g. the patent.
    – dirkt
    Commented Feb 18 at 7:34
-1

It does. call 1f; 1: is "push ip".
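
Written out (GNU assembler local-label syntax; the choice of a 32-bit register is mine):

        call  1f        # near call to the very next instruction: pushes its address
1:      pop   %eax      # EAX now holds that address - the "pushed ip"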

3
  • 4
    This was already mentioned in supercat’s answer. Commented May 22, 2022 at 16:58
  • 1
    @wizzwizz4: Calling to a label immediately after the call instruction does not change the flow of execution but pushes the address immediately after the call instruction (the new value of ip) onto the stack (as the "return address"). Commented May 22, 2022 at 17:19
  • @wizzwizz4: Nothing to do with linker. It's just how you write a call with zero displacement in asm mnemonics. Commented May 22, 2022 at 17:21
