
In a scope that defines two variables a and b, the code first operates only on a. After that point a is never used again, and the code operates only on b.

A possible optimization is to deallocate a before using b, and allocate b in the same memory location as a (if they are allocated on the stack).

In C you could do this manually by wrapping each of the two parts in its own braced block. But are there languages or compilers that do this automatically as an optimization? (I believed some compilers did, but found they don't.) I could accept that constructors and destructors are not called in a reliable time sequence in such a language, or at least in the situations that enable the optimization. But would the optimization still be possible when there are pointers that may or may not point to the variable?
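
For concreteness, the manual grouping would look like this (each block ends its array's lifetime, so the compiler is free to overlap the two stack slots):

{
    int a[2000000];
    /* ... use only a ... */
}
{
    int b[2000000];   /* may be placed in the memory a occupied */
    /* ... use only b ... */
}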


This was the code I was testing in gcc and g++:

#include <stdio.h>

int main(void) {
    int a[2000000];              /* ~8 MB on the stack */
    scanf("%d",&a[900000]);
    printf("%d\n",a[900000]);    /* last use of a */

    int b[2000000];              /* could reuse a's memory, but (as tested) doesn't */
    scanf("%d",&b[900000]);
    printf("%d\n",b[900000]);
    return 0;
}

It segfaults, but works with the b part removed entirely. Adjust the numbers if your system has a different default stack size.

I did believe gcc had this optimization, as the comments suggest, but it didn't pass this test, even with -O3. Before learning that, what I had intended to ask was whether there have been any advances in this kind of optimization (assuming it existed in the first place), for example using newer concepts such as ownership.

  • Isn't this exactly the result of standard register-allocation algorithms?
    – Michael Homer
    Commented Feb 25 at 1:01
  • C compilers already do this, even without braces. On godbolt.org/z/v9bd13xaY you can see that the same register is reused for a and b.
    Commented Feb 25 at 1:12
  • @MichaelHomer Well, apparently my first test was correct and I made mistakes while retesting before posting the previous comment. Saying this is similar to register allocation is reasonable. But the test shows it isn't applied to general variables. I wonder if that is because of the difficulty of dealing with pointers.
    – user23013
    Commented Feb 25 at 1:44
  • This is an extraordinarily specific conception of "general variables", but I think this is essentially an exact duplicate of this question in that case. This example requires uniqueness to be enforced, because there is otherwise an unrestricted alias into a handed to other (linked) code and so it obviously can't be reused. That isn't really an optimisation issue, it's just C semantics. I don't think this is necessarily a useful example for the rest of the question that is barely about that material element, but the emphasis could be shifted.
    – Michael Homer
    Commented Feb 25 at 2:01
  • @MichaelHomer While I don't think the questions are exactly the same, static single assignment and register allocation do answer what class of problem this belongs to. Nonuniform sizes and pointers add significantly more complication, but I probably shouldn't expect better answers than general memory management and garbage collection.
    – user23013
    Commented Feb 25 at 2:34

6 Answers


Yes. The Koka programming language does this. What you are looking for is referred to by the authors as "garbage-free" memory management, and "reuse" (in various forms).

  • "garbage-free": objects are destructed immediately on their last use, via Koka's ownership analysis / reference counting. This means that at any given point in the program's execution, there are no dead objects (or they are immediately about to be destructed). Whether this is a tangible benefit over destructing just at the end of scope / sometime later in most cases is not clear, but it allows for...
  • "reuse": when objects are destructed, rather than actually calling dealloc, the compiler can instead keep that memory and do what it wants with it. This is best explored in the FP^2 paper, for updating ex. lists in-place without allocations or growing the stack.

The publications describing how Koka's fancier features work are quite readable and I recommend sitting down and flipping through them. Some choice quotes/summaries:

Garbage Free Reference Counting with Reuse

Starting from a functional core language with explicit control-flow, Perceus emits precise reference counting instructions such that programs are garbage free, where only live references are retained.

Reference Counting with Frame Limited Reuse

An important extension is reuse analysis. This optimization pairs objects of known size with fresh allocations of the same size and tries to reuse the object in-place at runtime if it happens to be unique.

Fully in-Place Functional Programming

In this paper we describe a novel fully in-place (FIP) calculus that guarantees that such a function can be compiled in a way to never (de)allocate memory or use unbounded stack space—it can be executed fully in-place.

  • This sounds cool. So, if you have a typical FP "pipeline" structure, where you step-wise transform a value, throwing all intermediate values away, you would basically just toggle between two memory locations, both of which are hot in L1 cache. Nice.
    Commented Feb 25 at 11:23

This is related to object lifetime optimization.

LLVM has what it calls a StackColoring pass that does exactly this. From the documentation comment in that pass:

Stack Coloring reduces stack usage by merging stack slots when they can't be used together. For example, consider the following C program:

     void bar(char *, int);
     void foo(bool var) {
         A: {
             char z[4096];
             bar(z, 0);
         }

         char *p;
         char x[4096];
         char y[4096];
         if (var) {
             p = x;
         } else {
             bar(y, 1);
             p = y + 1024;
         }
     B:
         bar(p, 2);
     } 

Naively-compiled, this program would use 12k of stack space. However, the stack slot corresponding to z is always destroyed before either of the stack slots for x or y are used, and then x is only used if var is true, while y is only used if var is false. So in no time are 2 of the stack slots used together, and therefore we can merge them, compiling the function using only a single 4k alloca:

     void foo(bool var) { // equivalent
         char x[4096];
         char *p;
         bar(x, 0);
         if (var) {
             p = x;
         } else {
             bar(x, 1);
             p = x + 1024;
         }
         bar(p, 2);
     } 

This is an important optimization if we want stack space to be under control in large functions, both open-coded ones and ones created by inlining.


This is just a comment, but consider this line in your example code.

scanf("%d",&a[900000]);

A C compiler would need some extra information (e.g. escape analysis) to know that scanf doesn't retain a copy of that pointer, for example in a static variable. Since scanf is a standard library function, a compiler could perhaps just "know" what it does, but that's another kind of extra information.

In the absence of that information, it might be an incorrect "optimisation" to deallocate the variable earlier than the end of the scope.
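
For instance, as Chris Dodd's comment below demonstrates, reading through a small local buffer keeps the arrays' addresses from escaping, and the compiler can then merge the stack slots. A sketch of that variant of the question's program:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char tmp[32];    /* only tmp's address is ever passed to another function */

    int a[2000000];
    if (fgets(tmp, sizeof tmp, stdin))
        a[900000] = atoi(tmp);
    printf("%d\n", a[900000]);

    int b[2000000];  /* no pointer into a escaped, so b may reuse its slot */
    if (fgets(tmp, sizeof tmp, stdin))
        b[900000] = atoi(tmp);
    printf("%d\n", b[900000]);
    return 0;
}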

  • Indeed, if you replace the scanf call with something that reads a value (like atoi(fgets(tmp, sizeof(tmp), stdin))) and assign that to the array element, things get optimized.
    – Chris Dodd
    Commented Mar 20 at 0:40

But are there languages or compilers that could automatically do this as an optimization?

Absolutely yes. A very common optimization in a register allocator is to notice that two local variables have non-overlapping lifetimes, exactly as you describe, and to use the same register for both; a variable might never be reified as a memory location at all. (Obviously, in languages that allow taking the address of a variable, this optimization does not apply if the variable's address is taken!) A small sketch follows.
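
For example (a hypothetical function; with an optimizing compiler you can inspect the assembly, e.g. cc -O2 -S, to see the sharing):

int f(int x) {
    int a = x * 3 + 1;   /* a is live only up to the next line */
    int y = a * a;       /* last use of a */
    int b = y - x;       /* b can be assigned the register a used */
    return b * 2;
}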

I could accept constructors and destructors are not called in a reliable time sequence in such a language, or at least in the situations that support the optimization.

Oh, there are some nasty situations you can get into in garbage-collected languages. C# does not guarantee that a local variable holding a reference keeps the referred-to object alive until the end of the scope, and finalizers typically run on another thread. For example, suppose you have an unmanaged-code library that creates temporary disk storage, and deletes it when the storage object is finalized:

var x = CreateTempStorage();
var y = CreateFileInTempStorage(x);  // x is never read again after this line
y.Write(whatever);

If x is never used again in this method, the garbage collector is totally within its rights to run the finalizer immediately and destroy the temporary storage. Since the finalizer runs on its own thread, now we've got a race between all operations on that storage, and its destruction, and who knows what will happen? (This is not a theoretical concern; in the early days of .NET I wrote an interop layer on top of the COM IStorage API and had to use various techniques to ensure that the runtime would always destroy the "parent" objects after their children.)

It gets even worse: in C#, if the garbage collector determines that an object is never used, it is possible for the constructor and the finalizer to run at the same time on different threads. Writing a correct finalizer is quite tricky.


Swift's optimizer does this.

The language guarantees objects live until their last use, not necessarily until the end of scope.

E.g.

class C {
    init() { print("Initialized  a new object at \(Unmanaged.passUnretained(self).toOpaque())") }
    deinit { print("Deinitializing the object at \(Unmanaged.passUnretained(self).toOpaque())") }
}

func main() {
    let x = C()
    let y = C()
}

main()

Prints:

Initialized  a new object at 0x0000600003a64030
Deinitializing the object at 0x0000600003a64030
Initialized  a new object at 0x0000600003a64030
Deinitializing the object at 0x0000600003a64030

(It doesn't apply to top-level global variables, so I wrapped this demonstration up in a little main() function.)

To keep an object alive for longer, you can use withExtendedLifetime(_:_:).


The C++ standard basically mandates that an initialized object stays valid in its scope after initialization (and is destructed/deallocated only at the end of the scope), so this optimization is not really possible.

For languages that are ostensibly interpreted on virtual stack machines, this optimization may happen naturally: popping the stack is implicit and immediately "forgets" any value that is not used anymore. A toy illustration follows.
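
A toy illustration in C (a hypothetical mini-VM, not any particular language's implementation):

#include <stdio.h>

int stack[256];
int *sp = stack;

void push(int v) { *sp++ = v; }
int  pop(void)   { return *--sp; }

int main(void) {
    push(41);
    printf("%d\n", pop() + 1);  /* stack[0] holds the first value */
    push(10);
    printf("%d\n", pop() * 2);  /* the same slot is reused for the next */
    return 0;
}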

And then you may need to start using things like Java's Reference.reachabilityFence(obj) to avoid objects being collected while their methods are still running.

TL;DR: some language specifications forbid this, and in languages where it is permitted, problems may arise.

  • But at the same time the C++ spec also has the as-if rule, which allows any transformation of the code as long as the external effects happen as if it were the code as written. That allows eliminating a prematurely when it is not used while b is live.
    Commented Feb 26 at 11:01
