Say I'm designing a virtual machine for a bytecode compiler/interpreter, using C as the implementation language. Some kind of “tagged” representation of values is simplest for this language, where every object carries information about what kind of object it is. Most objects are heap-allocated, so their usual representation at runtime is as a pointer.
A common way of doing tagged pointers is to allocate with a certain alignment so that some of the least significant bits don't matter; the tag goes in those bits. However, as I understand it, this is sketchy in C as far as portability is concerned (at least, that's why the Lua developers decided to use a normal union).
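For concreteness, here is a minimal sketch of the low-bit scheme described above. It assumes 8-byte-aligned allocations and a platform that provides `uintptr_t`; the names `Value`, `tag_ptr`, `value_tag`, `value_ptr`, and the tag constants are made up for illustration and do not come from any particular VM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: 3 tag bits, so pointers must be 8-byte aligned. */
enum { TAG_BITS = 3, TAG_MASK = (1 << TAG_BITS) - 1 };
enum { TAG_PAIR = 1, TAG_STRING = 2 };   /* illustrative tag values */

typedef uintptr_t Value;

/* Pack a tag into the low bits of an aligned pointer. */
static Value tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);      /* assumption: low bits are zero */
    assert(tag <= (unsigned)TAG_MASK);
    return bits | tag;
}

/* Extract the tag. */
static unsigned value_tag(Value v) {
    return (unsigned)(v & TAG_MASK);
}

/* Clear the tag bits and convert back to a pointer. */
static void *value_ptr(Value v) {
    return (void *)(v & ~(uintptr_t)TAG_MASK);
}

/* Usage: returns 1 if the tag and the payload both round-trip. */
static int demo_low_bit_tags(void) {
    long *obj = malloc(sizeof *obj);
    if (obj == NULL) return 0;
    *obj = 42;
    Value v = tag_ptr(obj, TAG_STRING);
    int ok = value_tag(v) == TAG_STRING && *(long *)value_ptr(v) == 42;
    free(obj);
    return ok;
}
```

The portability worry shows up in `tag_ptr` and `value_ptr`: `uintptr_t` is an optional type, and the C standard only promises that a `void *` converted to `uintptr_t` and back compares equal to the original. How the integer's bits relate to alignment, and what happens when a modified integer is converted back, is implementation-defined, even though it works as expected on mainstream platforms.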
Is there a reason not to use offsets from the start of my language's heap instead of “raw” pointers? Tagging offsets is easy, and I have better control over how the tagging works (e.g. it can be in the high bits and it can be more than a few bits wide).
I feel that `heap[p]` isn't too much worse than `*p`; in fact, the intent is clearer in my opinion. Also, fewer casts would be necessary.
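To make the idea concrete, here is a rough sketch of tagged offsets with the tag in the high bits. Everything in it is an assumption for illustration: an 8-bit tag in the top of a 64-bit word, a fixed-size `heap` array standing in for a real growable heap, a trivial bump allocator in place of the garbage collector, and the names `make_ref`, `ref_tag`, `ref_offset`, and `make_pair`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Word;

/* Assumed layout: top 8 bits are the tag, low 56 bits are a word index. */
#define TAG_SHIFT   56
#define OFFSET_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

enum { TAG_PAIR = 2 };                 /* hypothetical tag value */

#define HEAP_WORDS 1024
static Word heap[HEAP_WORDS];          /* stand-in for the language heap */
static uint64_t heap_top = 1;          /* index 0 is reserved as "null" */

static Word make_ref(unsigned tag, uint64_t offset) {
    assert(tag < 256 && offset <= OFFSET_MASK);
    return ((Word)tag << TAG_SHIFT) | offset;
}

static unsigned ref_tag(Word r)    { return (unsigned)(r >> TAG_SHIFT); }
static uint64_t ref_offset(Word r) { return r & OFFSET_MASK; }

/* Bump allocation; a real VM would hand this job to its collector. */
static uint64_t alloc_words(uint64_t n) {
    assert(heap_top + n <= HEAP_WORDS);
    uint64_t at = heap_top;
    heap_top += n;
    return at;
}

/* A pair occupies two consecutive heap words. */
static Word make_pair(Word car, Word cdr) {
    uint64_t at = alloc_words(2);
    heap[at] = car;
    heap[at + 1] = cdr;
    return make_ref(TAG_PAIR, at);
}

static Word pair_car(Word r) { assert(ref_tag(r) == TAG_PAIR); return heap[ref_offset(r)]; }
static Word pair_cdr(Word r) { assert(ref_tag(r) == TAG_PAIR); return heap[ref_offset(r) + 1]; }

/* Usage: returns 1 if the pair reads back as written. */
static int demo_tagged_offsets(void) {
    Word p = make_pair(7, 9);          /* payloads are plain words here */
    return ref_tag(p) == TAG_PAIR && pair_car(p) == 7 && pair_cdr(p) == 9;
}
```

Note that nothing here converts between pointers and integers: the tagging is ordinary arithmetic on `uint64_t`, and the only pointer operation is array indexing.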
One potential objection is that I'd have to do my own memory management. But my language already requires garbage collection, so that isn't too bad.
A recent question is related to this one, but it does not really help me, since the question and the existing answer take the use of an offset instead of a pointer for granted. And that answer addresses the specific question about a modification of the offset representation.
I'd like to clarify here that I am not interested in pointer compression or saving space. I really don't care about “wasted” bits in a 64-bit word; I don't plan on significantly limiting the heap size available to programs in my language, so the larger the address space the better. If it turns out to be a problem I can consider compression mechanisms. For now I find it simpler to reserve some bits for tagging and let the rest of the word be an address or offset.
`heap+p` (in addition to masking off the tag bits of `p`, but that would apply to tagged raw pointers too). But some machines have addressing modes where the add could come for free. Also, you'll need to load the `heap` pointer frequently, and probably spend a register in most functions to keep it around. The overall impact may or may not be significant; only profiling can tell you for sure.
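As an illustration of where those costs appear, here is a small sketch that walks a linked list of pairs stored as tagged offsets. The `VM` struct, the 8-bit high tag, the use of a zero word as the end-of-list marker, and all function names are assumptions made for this example only.

```c
#include <stdint.h>

typedef uint64_t Word;

/* Assumed layout: top 8 bits are the tag, low 56 bits are a word index. */
#define TAG_SHIFT   56
#define OFFSET_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)
#define TAG_PAIR    UINT64_C(2)        /* hypothetical tag value */

typedef struct {
    Word *heap;                        /* base of the language heap */
} VM;

static Word make_ref(Word tag, Word offset) {
    return (tag << TAG_SHIFT) | offset;
}

/* Sum the first word of every cell in a list; a zero word ends the list. */
static Word sum_list(const VM *vm, Word list) {
    const Word *heap = vm->heap;       /* base loaded once, ideally kept in a register */
    Word total = 0;
    while (list != 0) {
        const Word *cell = heap + (list & OFFSET_MASK);  /* mask, then add */
        total += cell[0];
        list = cell[1];
    }
    return total;
}

/* Usage: builds the list (10, 32) by hand and sums it. */
static Word demo_sum_list(void) {
    Word cells[5] = {0};               /* index 0 unused */
    VM vm = { cells };
    cells[1] = 10;
    cells[2] = make_ref(TAG_PAIR, 3);  /* link to the cell at index 3 */
    cells[3] = 32;
    cells[4] = 0;                      /* end of list */
    return sum_list(&vm, make_ref(TAG_PAIR, 1));
}
```

Each access is a mask plus an add relative to `heap`. Copying `vm->heap` into a local gives the compiler a chance to keep the base in a register for the whole loop, though whether it does so, and whether the add folds into an addressing mode, depends on the compiler and target.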