$\begingroup$

Say I'm designing a virtual machine for a bytecode compiler/interpreter, using C as the implementation language. Some kind of “tagged” representation of values is simplest for this language, where every object carries information about what kind of object it is. Most objects are heap-allocated, so their usual representation at runtime is as a pointer.

A common way of doing tagged pointers is to allocate with a certain alignment so that some of the least significant bits are always zero; the tag goes in those bits. However, as I understand it, this is sketchy in C as far as portability is concerned, because the integer representation of a pointer is implementation-defined (at least, portability is why the Lua developers decided to use a normal union).
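
To make the low-bit scheme concrete, here is a minimal sketch of what I mean. The names (`value_t`, `tag_ptr`, `tag_of`, `ptr_of`) and the 3-bit tag width are made up for illustration. It assumes allocations are 8-byte aligned and that `uintptr_t` exists and keeps a pointer's alignment bits in the low bits of the integer, which the C standard does not promise:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 3 bits of an 8-byte-aligned pointer hold the tag. */
#define TAG_BITS 3u
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)

typedef uintptr_t value_t;

/* Pack a tag into the low bits of an aligned pointer. */
static value_t tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;      /* implementation-defined conversion */
    assert((bits & TAG_MASK) == 0);     /* assumes the allocator aligned p */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Extract the tag. */
static unsigned tag_of(value_t v) {
    return (unsigned)(v & TAG_MASK);
}

/* Clear the tag bits and convert back to a pointer. */
static void *ptr_of(value_t v) {
    return (void *)(v & ~TAG_MASK);
}
```

The portability worry sits entirely in the two casts: everything else is ordinary integer arithmetic.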

Is there a reason not to use offsets from the start of my language's heap instead of “raw” pointers? Tagging offsets is easy, and I have better control over how the tagging works (e.g. it can be in the high bits and it can be more than a few bits wide).

I feel that heap[p] isn't too much worse than *p; in fact, the intent is clearer in my opinion. Also fewer casts would be necessary.
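
Roughly, the offset version I have in mind looks like the sketch below. Everything in it is hypothetical: the 8-bit tag in the high bits, the 56-bit word offset, the names `ref_t`, `make_ref`, `ref_tag`, `ref_off`, `deref`, and the toy fixed-size `heap` array, which stands in for whatever the real VM would allocate and grow:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top 8 bits are the tag, low 56 bits are a word offset. */
#define OFF_BITS 56
#define OFF_MASK ((UINT64_C(1) << OFF_BITS) - 1)

typedef uint64_t ref_t;   /* tagged offset */
typedef uint64_t word_t;  /* one heap cell */

/* Toy heap; a real VM would size this dynamically. */
static word_t heap[1024];

/* Combine a tag and a heap offset into one 64-bit word. */
static ref_t make_ref(unsigned tag, uint64_t off) {
    assert(tag < 256 && off <= OFF_MASK);
    return ((uint64_t)tag << OFF_BITS) | off;
}

static unsigned ref_tag(ref_t r) { return (unsigned)(r >> OFF_BITS); }
static uint64_t ref_off(ref_t r) { return r & OFF_MASK; }

/* The equivalent of *p: mask off the tag, then index the heap. */
static word_t *deref(ref_t r) {
    uint64_t off = ref_off(r);
    assert(off < sizeof heap / sizeof heap[0]);
    return &heap[off];
}
```

Nothing here converts between pointers and integers, so the tagging itself relies only on fixed-width integer arithmetic and array indexing.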

One potential objection is that I'd have to do my own memory management. But my language already requires garbage collection, so that isn't too bad.


A recent question is related to this one, but it does not really help me, since the question and the existing answer take the use of an offset instead of a pointer for granted. And that answer addresses the specific question about a modification of the offset representation.

I'd like to clarify here that I am not interested in pointer compression or saving space. I really don't care about “wasted” bits in a 64-bit word; I don't plan on significantly limiting the heap size available to programs in my language, so the larger the address space the better. If it turns out to be a problem I can consider compression mechanisms. For now I find it simpler to reserve some bits for tagging and let the rest of the word be an address or offset.

$\endgroup$
7
  • 4
    $\begingroup$ This is discussed at some length on the V8 dev blog: Pointer Compression in V8. $\endgroup$
    – kaya3
    Commented Jun 13 at 17:42
  • 1
    $\begingroup$ Does this answer your question? Compressed pointers, why not "relative" rather than "base" encoding? $\endgroup$
    Commented Jun 13 at 20:08
  • 2
    $\begingroup$ @GregHewgill It's closely related, but I'm not asking about compressed pointers and it focuses on an alteration to the basic “offset instead of pointer” scheme. I'm not really interested in trying to save space. An answer to that question might be an answer to this one, but the existing answer addresses primarily the specific concerns with relative offsets. However, the viability of offsets as a strategy is implicitly confirmed by the cited real-world uses, which is good to know. $\endgroup$
    – texdr.aft
    Commented Jun 13 at 20:21
  • 1
    $\begingroup$ Possibly a slight hit to performance. Every memory reference will need an extra add to compute heap+p (in addition to masking off the tag bits of p, but that would apply to tagged raw pointers too). But some machines have addressing modes where the add could come for free. Also, you'll need to load the heap pointer frequently, and probably spend a register in most functions to keep it around. The overall impact may or may not be significant; only profiling can tell you for sure. $\endgroup$
    Commented Jun 14 at 5:52
  • 1
    $\begingroup$ I think the "common" implementation is a legacy of implementations from decades ago, where the performance hit was more significant. The heap+offset mechanism is probably more acceptable today. $\endgroup$
    – Barmar
    Commented Jun 15 at 21:54
