21
$\begingroup$

I want to remove explicit addressing from my type system.

In low level languages like C or Rust, you always have a choice whether to store some data by value or by reference. There are some advantages to each:

| By value | By reference |
| --- | --- |
| Space can be reserved at compile time, no memory allocations needed | Fast to pass regardless of size |
| Better cache consistency | Allows recursive, potentially infinitely sized, data structures |
| Always needs to be at least as large as the largest potential size | Allows dynamic size |

...and many more subtle differences.
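
For concreteness, here is a minimal Rust sketch (the `NodeInline`/`NodeBoxed` types are made up purely for illustration) of the same payload stored inline versus behind a pointer:

```rust
// Hypothetical types, just to compare the two layouts.

// By value: the payload lives inline. Its size is known at compile time and
// no allocation is needed, but the struct is as large as the payload itself.
struct NodeInline {
    payload: [u8; 64],
}

// By reference: only pointers are stored inline, so the struct stays small
// and can even refer to itself recursively, at the cost of heap allocations
// and an extra indirection on every access.
struct NodeBoxed {
    payload: Box<[u8; 64]>,
    next: Option<Box<NodeBoxed>>, // recursion is only possible behind a pointer
}

fn main() {
    println!("{}", std::mem::size_of::<NodeInline>()); // 64
    println!("{}", std::mem::size_of::<NodeBoxed>());  // 16 on a 64-bit target
}
```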

There is an issue with this: determining which option is more efficient depends on a dozen factors and is hard for a programmer to judge. Even if one option is better at the moment, that can change later, yet by then it is hard to change how the type is stored.

There is also a matter of philosophy. I view the type system as describing the relationships between data and the set of valid states. How an object is stored should not be part of that; it's just noise.

Many object-oriented languages solve this by only ever storing pointers. This works, but it is not always the most efficient: it leads to many very small memory allocations, which is slow and hurts cache consistency. I also don't need inheritance-based polymorphism. One exception is C#, which lets you declare at the type level (struct vs. class), rather than at each declaration, whether instances are stored by value or by reference; that is still not ideal.

So here is my question: given a type declaration without any explicit indicators of how memory should be managed, how do I determine whether it is more efficient to store a specific attribute by value or by reference? Ideally the answer would apply to any place data is stored or transferred: a local variable, a function argument, a field of another type, and so on.

If no exact answers exist, at least some heuristics would be nice.

$\endgroup$
12
  • 10
    $\begingroup$ If there were a simple answer to this question, I suppose you would see it implemented in a number of existing programming languages. The fact that this is apparently not the case is a strong hint that this problem is hard and potentially unsolvable. As with any code-optimization problem, you would need an optimization goal: do you want faster or more compact code or data memory layout, is the code actually part of the hot path, etc.? $\endgroup$ Commented Jan 3 at 9:30
  • 3
    $\begingroup$ @Moonchild I wish that site would actually explain what that meant $\endgroup$
    – mousetail
    Commented Jan 3 at 10:20
  • 2
    $\begingroup$ I suppose it means that sometimes it is more efficient to box, and sometimes not. And it does not just depend on the program you're compiling; it also depends on the program's input. So, even for a fixed program, there is no definite perfect answer to your question. (I still think it's an interesting question to look at how existing compilers do it) $\endgroup$ Commented Jan 3 at 10:39
  • 9
    $\begingroup$ The title of the question is about passing by value or by reference (a matter of semantics), but the question seems to really be about type layouts and memory allocation (a matter of implementation). It's worth mentioning that storing copies of objects inline, instead of as pointers, is only allowed as an optimisation when the copy cannot be mutated (or, will not be), otherwise mutations will not be correctly reflected in the original; so that is the main constraint you need to consider. $\endgroup$
    – kaya3
    Commented Jan 3 at 15:25
  • 2
    $\begingroup$ @Sebastian High-level garbage-collected languages usually have only references. Even Python has only references. And yes, you can change the value of 1 to 2, but it requires ctypes abuse $\endgroup$
    – mousetail
    Commented Jan 11 at 9:39

2 Answers

11
$\begingroup$

I am not aware of any such compile-time algorithm, and I doubt an optimal one exists.

Calling convention

One consideration that is missing from your table is calling convention. When compiling to native code, an important optimization is to pass via registers, rather than passing via the stack.

Passing via registers avoids writing to and then reading from the stack whenever possible. It is so important that GCC and LLVM will even split small structs across multiple registers when they can.

If a function only accepts arguments by reference, and is not inlined, then an argument sitting in a register must be written to the stack so its address can be taken, and then will be read from the stack within the function. And that's how one bleeds performance.
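
A minimal Rust sketch of that effect, assuming a typical 64-bit calling convention and no inlining (the `Point`/`length_*` names are hypothetical):

```rust
// Hypothetical illustration; the exact behaviour depends on the target ABI
// and on whether the callee is inlined.
#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

// Small enough that typical 64-bit ABIs pass `p` in two registers:
// no stack traffic at all.
fn length_by_value(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}

// Taking the argument by reference means the caller must have the value in
// memory so its address can be taken; a value currently living in registers
// gets spilled to the stack first (unless the call is inlined).
fn length_by_ref(p: &Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}

fn main() {
    let p = Point { x: 3.0, y: 4.0 };
    println!("{} {}", length_by_value(p), length_by_ref(&p));
}
```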

Coalesced Boxing

As you mentioned in your table, whether or not to box affects quite a few parameters. Worse, though, your table is incomplete!

When boxing is performed for recursion reasons, it's also possible to use unrolling techniques (i.e., box N items at once) or to build a single stack of items, rather than boxing them all individually. There are, therefore, many, many choices.
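
A rough Rust sketch of those three layout choices for one and the same sequence (all type names here are made up):

```rust
// One box per element: maximally flexible, maximally allocation-heavy.
enum List<T> {
    Nil,
    Cons(T, Box<List<T>>),
}

// "Unrolled": box N elements at a time, so one allocation covers N items
// and traversal touches far fewer, better-clustered cache lines.
struct UnrolledNode<T, const N: usize> {
    items: [Option<T>; N],
    next: Option<Box<UnrolledNode<T, N>>>,
}

// Or coalesce completely: a single growable buffer, no per-item boxing.
type Coalesced<T> = Vec<T>;

fn main() {
    let _one_box_each: List<i32> =
        List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    let _four_at_a_time: UnrolledNode<i32, 4> =
        UnrolledNode { items: [None; 4], next: None };
    let _single_buffer: Coalesced<i32> = vec![1, 2, 3];
}
```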

Optimization

It may be useful to reformulate the problem, instead, as: given that all objects are boxed by default -- a safe transformation -- which can we coalesce or fully unbox?

It is fairly well known amongst optimizer developers that there is no perfect ordering of optimizations. The current optimization pipelines have been achieved by careful tuning, driven by heuristics and benchmarks, and there are super-optimizers whose only added value is to juggle optimization pipelines and compare the performance of the resulting binaries.

As an illustration, it will likely be better to box a large array and pass it down through several functions (as a single pointer, or a pointer/size pair) rather than copying the array at each call. However, should the functions be inlined, the copies can be trivially eliminated, and boxing is now a pessimization!

Hence, whether to unbox the array or not depends on whether the functions it is passed to are inlined or not...
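
A hedged Rust illustration of that example (the `checksum_*` functions are hypothetical; the actual outcome depends entirely on what the optimizer inlines):

```rust
// Passing the array by value copies 8 KiB at every non-inlined call;
// passing a slice hands down a single pointer/length pair instead.
fn checksum_by_value(data: [u64; 1024]) -> u64 {
    data.iter().fold(0, |acc, x| acc.wrapping_add(*x))
}

fn checksum_by_slice(data: &[u64]) -> u64 {
    data.iter().fold(0, |acc, x| acc.wrapping_add(*x))
}

fn main() {
    let data = [7u64; 1024];
    // If these calls are inlined, the optimizer can elide the copy and the
    // two versions become equivalent -- which is exactly why the boxing
    // decision is entangled with inlining decisions.
    println!("{} {}", checksum_by_value(data), checksum_by_slice(&data));
}
```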

The fact that JS engines and the JVM leave so many smallish types boxed, however, leads me to think that the problem is highly non-trivial.

Case by case

Attempting to decide on a per-type basis seems intractable. As mentioned, even a somewhat largish array may be better off unboxed, if it's never copied and fits on the stack.

The special case is values that are smaller than a pointer, such as booleans and most integer and floating-point types. Unless reference semantics are necessary, those are better off unboxed at all times.
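
A quick way to see the overhead, assuming a 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    // The pointer alone is as large as, or larger than, the value it hides,
    // before even counting the heap block sitting behind it.
    println!("bool: {} bytes, Box<bool>: {} bytes", size_of::<bool>(), size_of::<Box<bool>>()); // 1 vs 8
    println!("f64:  {} bytes, Box<f64>:  {} bytes", size_of::<f64>(), size_of::<Box<f64>>());   // 8 vs 8
}
```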

Polymorphism to the rescue?

Instead of, or in complement to, attempting to decide for each value whether to box it or not, polymorphism may be an interesting approach.

Instead of generating a single function which takes either a boxed or an unboxed argument, you could generate a function which accepts both, so that the caller may pass whichever it has at hand.
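
In a language with generics this idea can be sketched directly; here is a hypothetical Rust version using `std::borrow::Borrow` so that one function accepts the value itself, a reference, or a box:

```rust
use std::borrow::Borrow;

// Hypothetical type and function, purely for illustration.
struct Widget {
    id: u32,
}

// One generic function that accepts the value, a reference to it, or a box
// around it; each caller passes whatever representation it already has.
fn describe<W: Borrow<Widget>>(w: W) -> u32 {
    w.borrow().id
}

fn main() {
    let by_ref = Widget { id: 1 };
    let boxed = Box::new(Widget { id: 2 });
    println!(
        "{} {} {}",
        describe(Widget { id: 0 }), // unboxed, by value
        describe(&by_ref),          // by reference
        describe(boxed),            // boxed
    );
}
```

A monomorphizing compiler turns this into one specialization per representation; a uniform-representation language would instead have to dispatch on the value's actual shape at run time.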

Similarly, it may be possible for a field of a type to accept either a boxed or an unboxed value. This is more complicated -- though JS engines typically do so through the concept of a "shape" -- but it would once again avoid having to pick one representation at compile time.
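
One crude, static way to sketch such a field in Rust is a per-instance choice between an inline and a boxed payload (the `MaybeBoxed`/`Record` names are made up; real engines track this dynamically via shapes):

```rust
// Each instance decides whether its payload lives inline or behind a pointer.
enum MaybeBoxed<T> {
    Inline(T),
    Boxed(Box<T>),
}

impl<T> MaybeBoxed<T> {
    fn get(&self) -> &T {
        match self {
            MaybeBoxed::Inline(v) => v,
            MaybeBoxed::Boxed(b) => &**b, // &Box<T> -> &T
        }
    }
}

struct Record {
    // Small payloads can go inline, large ones behind a pointer.
    payload: MaybeBoxed<[u8; 32]>,
}

fn main() {
    let inline = Record { payload: MaybeBoxed::Inline([0; 32]) };
    let boxed = Record { payload: MaybeBoxed::Boxed(Box::new([1; 32])) };
    println!("{} {}", inline.payload.get()[0], boxed.payload.get()[0]);
}
```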

Whether this would lead to relatively optimal code, however, is quite unclear to me.

$\endgroup$
1
  • $\begingroup$ Passing by pointer by default, then gradually adding specific conditions where passing by value is safe and beneficial, is a good idea $\endgroup$
    – mousetail
    Commented Jan 3 at 17:50
3
$\begingroup$

Register size

Very small, zero-initializable types that fit within a machine register are "passed" and "returned" via registers, which act as effectively reserved space for arguments and return values on specific target triples.

These basic types have value semantics because copying them into and out of these reserved registers has a crucial performance impact.
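
As a hedged Rust illustration: on common 64-bit ABIs a register-sized aggregate like the hypothetical `MinMax` below is typically returned in registers, while something like `Histogram` is returned through a hidden out-pointer supplied by the caller:

```rust
#[derive(Clone, Copy)]
struct MinMax {
    min: i64, // two 64-bit fields: fits in a register pair on common ABIs
    max: i64,
}

struct Histogram {
    buckets: [u64; 256], // far too big for registers: returned via out-pointer
}

fn min_max(xs: &[i64]) -> MinMax {
    let mut m = MinMax { min: i64::MAX, max: i64::MIN };
    for &x in xs {
        m.min = m.min.min(x);
        m.max = m.max.max(x);
    }
    m
}

fn histogram(xs: &[u8]) -> Histogram {
    let mut h = Histogram { buckets: [0; 256] };
    for &x in xs {
        h.buckets[x as usize] += 1;
    }
    h
}

fn main() {
    let mm = min_max(&[3, -1, 7]);
    let h = histogram(b"hello");
    println!("{} {} {}", mm.min, mm.max, h.buckets[b'l' as usize]);
}
```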

Immutable types

If your language defines const or immutable semantics, passing stack references to immutable instances improves performance substantially: you don't need to copy the object to simulate value semantics, and read-only data obtained from these immutable instances can be cached.

const/constexpr/comptime semantics also enable further optimizations that are only possible with restricted arguments.
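
A small Rust sketch of the idea (the `Config` type is hypothetical): the object is only ever handed out behind a shared, immutable reference, so callees get value-like guarantees without any copying:

```rust
struct Config {
    threshold: u32,
    label: String,
}

fn within_limit(cfg: &Config, value: u32) -> bool {
    // `cfg.threshold` can stay in a register across this function; a shared
    // (immutable) reference guarantees it will not change behind our back,
    // so no copy is needed to get value-like behaviour.
    value <= cfg.threshold
}

fn main() {
    let cfg = Config { threshold: 10, label: "demo".to_string() };
    println!("{} {} {}", within_limit(&cfg, 3), within_limit(&cfg, 30), cfg.label);
}
```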

Escape analysis

> I want to remove explicit addressing from my type system.

If your compiler can determine that a leaf function creates, returns, or otherwise uses an object that never "escapes" -- that is, the only references to the object exist as local, thread-stack references and are never stored in any heap-allocated object -- your compiler can automatically mark that argument/return as a noescape or ref variation of argument/return.

This, in turn, enables further optimizations in callers of these functions, "bubbling up" the escape analysis results.

This lets the calling code decide whether a ref.in argument is allocated on the heap or on the stack, and further allows garbage collection to be temporarily disabled for ref.out heap-allocated objects.

ref.out instances reduce garbage-collection pressure, as they exist and are managed only through stack references; they would need to be "converted" into normal, heap-allocated objects if they "escape" this restriction.
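
To make the analysis itself concrete, here is a toy Rust sketch over a made-up three-instruction IR; it deliberately ignores the transitive effects a real escape analysis would have to propagate:

```rust
use std::collections::HashSet;

// A made-up miniature IR: objects are identified by ValueId.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ValueId(usize);

enum Inst {
    Alloc { result: ValueId },                      // create a new object
    StoreField { object: ValueId, value: ValueId }, // object.field = value
    Return { value: ValueId },                      // return to the caller
}

// An allocation "escapes" if it is returned or stored into another object;
// everything else may stay on the stack (or become a `ref` argument/return).
fn stack_allocatable(body: &[Inst]) -> Vec<ValueId> {
    let mut allocated = Vec::new();
    let mut escaped = HashSet::new();
    for inst in body {
        match inst {
            Inst::Alloc { result } => allocated.push(*result),
            Inst::StoreField { value, .. } => { escaped.insert(*value); }
            Inst::Return { value } => { escaped.insert(*value); }
        }
    }
    allocated.into_iter().filter(|v| !escaped.contains(v)).collect()
}

fn main() {
    let (a, b, c) = (ValueId(0), ValueId(1), ValueId(2));
    let body = [
        Inst::Alloc { result: a },                // used only locally
        Inst::Alloc { result: b },
        Inst::Alloc { result: c },
        Inst::StoreField { object: b, value: c }, // c escapes into b
        Inst::Return { value: b },                // b escapes via the return
    ];
    println!("stack-allocatable: {:?}", stack_allocatable(&body)); // [ValueId(0)]
}
```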

$\endgroup$
3
  • 1
    $\begingroup$ Are you talking about a JIT? Calling conventions can't normally be optimization-dependent in ahead-of-time compiled code. $\endgroup$ Commented Jan 4 at 12:02
    $\begingroup$ @PeterCordes If a function is limited in scope to one module or file, the compiler can do whatever it needs within that scope; there is no need for a function with the normal ABI to exist. Even if it escapes, a compiler could create multiple functions, some only for use in that module. GNAT emits .ali files alongside .o files, storing data not provided by the object file; a compiler that did that could mutate calling conventions at will, as long as it was noted in the data file used to compile and link any users. $\endgroup$
    – prosfilaes
    Commented Jan 4 at 22:06
  • 1
    $\begingroup$ @prosfilaes: True. GCC will do that to some degree, making foo.constprop.0 clones of functions for callers within the compilation unit (or with LTO), such as godbolt.org/z/co4nq7TT8 . And Clang/LLVM will rewrite a function for its only caller if you make it static. GCC and LLVM aren't as aggressive about optimizing calling conventions as they could be, probably since it's a hard problem. But they will also do stuff like taking advantage of the fact that a function doesn't clobber a certain call-clobbered register, even if they don't inline it. $\endgroup$ Commented Jan 4 at 22:21
