15

In C# and C++, there appears to be a trend toward reducing or avoiding inheritance:

C#: "Sealing types can improve performance."

https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1852

C++ "Prefer concrete types..."

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#SS-concrete

I can understand this guideline with regard to complexity/brittleness; however, with regard to performance, I question why the compilers or "jitters" can't handle this.

The inheritance-performance discussion often refers to vtables and/or the levels of indirection involved in resolving the address of the function/method. Understood. But:

Don't we need to take a step back here?

Under the hood, it generally all boils down to functions simply being called with the this pointer as the first argument.

Whoever said that vtables are the only way to implement inheritance? Why can't the compiler/jitter optimize out the call resolutions at compile time?
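Roughly, the mental model I have in mind (hypothetical names in C#, purely for illustration): a non-virtual instance method is essentially a plain function that takes the receiver as its first argument.

class Point
{
    public int X;

    // An instance method...
    public int Shifted(int dx) => X + dx;

    // ...is conceptually just a static function taking the receiver first.
    public static int ShiftedLowered(Point self, int dx) => self.X + dx;
}

class Demo
{
    static void Main()
    {
        var p = new Point { X = 3 };
        System.Console.WriteLine(p.Shifted(2));                // 5
        System.Console.WriteLine(Point.ShiftedLowered(p, 2));  // 5
    }
}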

6
  • 3
    Rust only supports concrete objects except for pure interface implementations, but it's not a .NET language.
    – wizzwizz4
    Commented Oct 2, 2023 at 20:53
  • Interesting. Much of my work is in natively compiled languages like C/C++ and I've heard of rust (not sure the name is ideal for marketing the language :-) ). Thank you! If you happen to know, does it name mangle like C++ or does it have a clean C interface e.g., for C# interop?
    – user417068
    Commented Oct 2, 2023 at 22:17
  • 1
    Rust mangles names (in an implementation-specific manner), but you can turn this off with #[no_mangle] on a case-by-case basis. The extern keyword gives reasonably good FFI (it's much neater than C++'s, imo). (However, you should really ask a separate question if you want to know more; these comments will probably get cleaned up.)
    – wizzwizz4
    Commented Oct 2, 2023 at 22:24
  • 1
    I have wondered the same thing too. By default, the compiler does not do this because classes can be visible to other assemblies, and therefore have to be able to be extended if sealed is not explicitly specified. There are packages that can do this automatically.
    – Hugo
    Commented Oct 3, 2023 at 15:01
  • 1
    "Whoever said that vtables are the only way to implement inheritance? Why can't the compiler/jitter optimize out the call resolutions at compile time?" It can only do so when the type is known at compile time, i.e. it won't work for polymorphic dispatch. Virtual method dispatch only applies when accessing a subclass through a variable typed as the superclass.
    – Beefster
    Commented Oct 3, 2023 at 19:11

6 Answers

37

Don't we need to take a step back here?

Under the hood, it generally all boils down to functions simply being called with the this pointer as the first arg.

It's good to question things from first principles! In this case, though, it's a bit of an over-simplification. Yes, methods are just functions that take a receiver (it goes by different names in different languages, but is usually this or self). But the challenge is knowing which method implementation to invoke.

Whoever said that vtables are the only way to implement inheritance?

Nobody! And they're not. Another approach is message-passing, where methods are selected by name (as in the Smalltalk lineage of languages, including Ruby and Objective-C). In this model, classes have method tables, which are roughly just hash tables that map strings (or some other notion of a "selector", "message id", etc.) to function pointers.

By comparison, vtables are much faster, because they're just simple arrays of function pointers, identified by index. They can be looked up more directly, and don't need hashing.
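A rough sketch of the two lookup strategies in C# (the names and tables are invented for illustration; real runtimes work on compiled metadata and machine code, not dictionaries of delegates):

using System;
using System.Collections.Generic;

class DispatchSketch
{
    // Message-passing style: the method is found by name (a "selector").
    static readonly Dictionary<string, Action> MethodTable = new()
    {
        ["speak"] = () => Console.WriteLine("woof"),
    };

    // Vtable style: the method is found by a fixed slot index.
    static readonly Action[] VTable =
    {
        () => Console.WriteLine("woof"),   // slot 0 = speak
    };

    static void Main()
    {
        MethodTable["speak"]();   // hash the name, look it up, then call
        VTable[0]();              // index the array, then call
    }
}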

Why can't the compiler/jitter optimize out the call resolutions at compile time?

They can, and often do.

In the general case, polymorphic calls need some kind of dynamic dispatch mechanism, whose job is to find the correct implementation of a method at runtime, given the concrete type of a receiver (self/this). There are special cases that don't need it, and the goal of an optimizer is to discover those special cases and devirtualize expensive dynamically dispatched method calls into fast statically dispatched ones, whenever possible.

Some keywords can help, by giving the optimizer hard facts rather than leaving it to infer things for itself (a short C# sketch follows the list):

  • static methods can always be statically dispatched. E.g. Math.abs has only one implementation, so there's nothing left to resolve at runtime. The compiler can just emit a static dispatch to the function that implements that method.
  • final (sealed in C#) methods can't be overridden, so they might only have one implementation (assuming they're not overrides of a parent method themselves). While the receiver object might vary, its type is always the same, so the same one implementation is always invoked. These can also use static dispatch.
    • So "non-final" means "maybe polymoprphic", and needs further investigation by the optimizer. In the general case it needs dynamic dispatch.
    • "non-override final" means definitely not polymorphic, and can always just use a simple static dispatch.
  • final class has similar consequences to marking each of its methods final.
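
A minimal C# sketch of the kind of guarantee sealed hands the optimizer (class names invented for the example; whether the JIT actually devirtualizes a given call depends on the runtime):

using System;

class Animal
{
    public virtual string Sound() => "...";
}

sealed class Dog : Animal
{
    // Dog is sealed, so no further override of Sound can exist below this one.
    public override string Sound() => "woof";
}

class Program
{
    static void Main()
    {
        Dog d = new Dog();
        // Receiver is statically known to be exactly Dog (a sealed type), so
        // this call can be devirtualized into a direct, possibly inlined call.
        Console.WriteLine(d.Sound());

        Animal a = d;
        // Static type is the non-sealed Animal: in general this call needs
        // dynamic dispatch, unless the optimizer can prove a's concrete type.
        Console.WriteLine(a.Sound());
    }
}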

As you see, the non-final case leaves it up to the optimizer to figure things out. There are some tricks it can try (the second of which is sketched below):

  1. At build time: An ahead-of-time compiler could analyze all the non-final methods in the program (and any linked libraries), and see which ones actually have overrides. The ones that don't have any could be implemented with static dispatch. It's as if they were marked final, in this version of the program.
  2. At runtime: A just-in-time compiler could inspect the type of the receiver of a method call. If it discovers that it's only ever one type, then it can replace the dynamic dispatch with a type-check (to confirm the assumption is correct) and a fast path (static dispatch to the assumed type's implementation). This is called a monomorphic inline cache, and there are other variations on it (polymorphic and megamorphic), with different trade-offs.
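
Conceptually, that guarded fast path looks something like this hand-written C# (the real thing is generated machine code; the type names here are made up):

using System;

abstract class Shape
{
    public abstract double Area();
}

sealed class Circle : Shape
{
    public double R;
    public override double Area() => Math.PI * R * R;
}

class Program
{
    // Roughly what a monomorphic inline cache does for s.Area():
    static double AreaWithGuard(Shape s)
    {
        // Fast path: check against the single receiver type observed so far,
        // then call (or inline) that implementation directly.
        if (s is Circle c)
            return Math.PI * c.R * c.R;

        // Slow path: the assumption failed, fall back to dynamic dispatch.
        return s.Area();
    }

    static void Main() => Console.WriteLine(AreaWithGuard(new Circle { R = 1.0 }));
}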

As you see, these heuristics are just that, heuristics, and are fundamentally limited:

  1. Dynamically-linked libraries are opaque to AOT compilers. They can't see what's going on in there, and have to emit code for the general case. Marking your classes or their methods final removes that uncertainty: no override can be hiding in a library the compiler can't see.
  2. JIT compilers usually don't optimize a piece of code until it's been hit enough times to warrant the expense of JITing. Marking things final lets the fast path be emitted from the start.

So by marking something final yourself, you're imposing restrictions on yourself, which gives certain guarantees that the optimizer can take advantage of.

Further reading:

  1. https://quuxplusone.github.io/blog/2021/02/15/devirtualization/
  2. https://www.swift.org/blog/whole-module-optimizations/
2
  • 1
    There’s also a second problem with a compiler assuming final. (At least in C#) First, you can only do it with private types, because an outside library might extend the type. Second, even if you enumerate all types in your static code, that doesn’t prevent someone creating a dynamic proxy or similar.
    – Patrick M
    Commented Oct 4, 2023 at 14:57
  • One easy optimization in C#: If you call a method on something you know is a derived type, and that derived type is final or the method has a final implementation, you can avoid a vtable lookup.
    – Patrick M
    Commented Oct 4, 2023 at 15:00
12

Imagine this extremely simple example

A value = Random.Shared.Next(0, 2) == 0 ? new B() : new C();
value.M();

abstract class A
{
    public abstract void M();
}

class B : A
{
    public override void M()
    {
        Console.WriteLine("Called B");
    }
}

class C : A
{
    public override void M()
    {
        Console.WriteLine("Called C");
    }
}

How do you suggest the compiler optimize this call to M() when it cannot be known at compile time whether B or C is in value? The compiler cannot know whether it should call B.M() or C.M(), so it needs a runtime way to decide which to call, based on the type of the object stored in value. That's what virtual tables do.
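
To spell out what that runtime decision looks like if you write it by hand (for illustration only, reusing the A/B/C classes above), it is still a per-object check at run time, not something the compiler can collapse into a single direct call:

static class Dispatch
{
    // Hand-rolled stand-in for the virtual dispatch on value.M():
    public static void CallM(A value)
    {
        switch (value)
        {
            case B b: b.M(); break;   // decided per object, at runtime
            case C c: c.M(); break;
            default: throw new System.InvalidOperationException("unknown subtype of A");
        }
    }
}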

28
  • 2
    While there is room for interpretation in OP's underlying assumptions; I suspect OP's argument is based on the compiler having knowledge of the whole inheritance graph (i.e. the collection of A, B, C and how they relate) and therefore being able to individually optimize the execution for both the explicit type being used (A) and all of its descendants (recursively, i.e. B and C and any of their possible descendants). The example you provide here still shows a codebase where the compiler knows about the whole graph, so it doesn't disprove OP's underlying assumption.
    – Flater
    Commented Oct 1, 2023 at 22:20
  • 1
    @Coder “What does OO really give us?” Code organization, by letting you say “make it so”, while abstracting the details of how. This is also what FP does, in a different way. “Overly complex” it can be. On the contrary, imagine what the code would look like if every polymorphic method call was instead a hand-rolled switch over a value's type, calling one of several concrete functions. It's worse.
    – Alexander
    Commented Oct 1, 2023 at 23:09
  • 3
    @Flater The compiler has complete knowledge of the entire inheritance hierarchy, true. But how does that help making a "simple" function call instead of one that uses some kind of indirection to cope with the unknown actual type at run time? Commented Oct 2, 2023 at 1:01
  • 5
    @Flater You say the example "doesn't disprove the OP's underlying assumption". Yes it does. The assumption that a compiler can somehow replace indirect function calls with "simple" function calls just because all involved classes are known is wrong. Commented Oct 2, 2023 at 3:27
  • 4
    @Coder: The point being made here is that the choice of particular DB can be made after compilation (e.g. using a config switch), at which point the compilation could not have known what DB was going to be used.
    – Flater
    Commented Oct 2, 2023 at 4:13
7

Investigating your underlying claim

For the purpose of example, let's say you have a codebase containing a MyType class, along with some kind of logic that works with that base type, say:

public class MyService
{
    public void DoSomething(MyType b)
    {
        Console.WriteLine(b.CreateMessage());
    }
}

So, let's first investigate your claim.

Why can't the compiler/jitter optimize out the call resolutions at compile time?

What you're saying here is that at compile time, your compiler knows that MyType is never inherited, and therefore it can optimize its compilation based on a MyType-oriented execution.

In other words, we can ask ourselves the following question:

Can we have a different type (which derives from MyType) that the compiler could not have possibly known about at compile time?

If the answer to that question is "no", then your claim is correct. That would mean that the compiler has perfect knowledge of how MyType is inherited and can therefore optimize the execution based on that.

If the answer to that question is "yes", then your claim is incorrect. It would mean that the compiler cannot be certain that only MyType will be used, and therefore it cannot make additional optimizations that are tailored to MyType.

Countering your claim

You can probably see that I'm leading you down a path where the answer to the above question is "yes". That is indeed the case. Let's explore an example use case where the compiler could not possibly have known all of the types in advance.
I'm intentionally not giving examples that have already been given in other answers. There's more than one way in which your claim is not fully correct.

Suppose you compile your library into a DLL. This DLL includes MyType and MyService (which contains the DoSomething logic). This is now all baked into the DLL and set in stone.

You send me this DLL, and I add it to my own codebase. I end up creating a MyDerivedType class which derives from MyType. Due to the rules of inheritance, I am now perfectly allowed to do something like this:

public void MyVeryOwnMethod()
{
    var myService = new MyService();
    var myObj = new MyDerivedType();

    myService.DoSomething(myObj);
}

And there you have it. DoSomething is being passed an object of type MyDerivedType, which is not something that your library's compiler would have been able to account for.

If I had referenced your project (i.e. the source code), then my compiler would be able to account for all types, i.e. MyType and MyDerivedType. But that is not how it works when you compile your library as a DLL and then share that DLL (not your source code). This use case directly goes against your claim that the compiler (of your library code) could have known about all possible types that DoSomething might be called with.

Because of this, the compiler knows that it doesn't know all possible types. So when you compile your library code, the compiler doesn't engage in any optimizations that would rely on perfect knowledge of the whole inheritance graph of MyType.

Sealing a class

Your question was about whether and how sealed can add value with regard to compiler optimization.

Let's revise our original example, and let's say that we sealed the MyType class. Now, the situation has changed. When compiling your library, the compiler is now capable of knowing that it's impossible for any future consumers of this DLL to further derive from it, and therefore the compiler can be confident that it knows the whole MyType inheritance graph and make optimizations based on that knowledge.

This is how the sealed keyword can positively impact the compiler's ability to optimize the execution.
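
For concreteness, a before/after sketch (the class bodies and the exact shape of CreateMessage are assumptions; the code above only calls it through DoSomething):

// Unsealed ("before"): another assembly may derive from this type and
// override CreateMessage, so a call through a base-typed reference needs
// dynamic dispatch in the general case.
public class MyTypeOpen
{
    public virtual string CreateMessage() => "Hello";
}

// Sealed ("after"): no derived type can exist anywhere, so the same call can
// be bound directly and potentially inlined. (A sealed class cannot even
// declare new virtual members.)
public sealed class MyTypeSealed
{
    public string CreateMessage() => "Hello";
}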

2
  • 1
    Though a compiler could not identify this scenario, a JIT could. I believe that in Java it does just that: the JIT might inline or otherwise optimize (virtual) methods if at runtime they are not actually overridden, and if a subclass gets loaded later that invalidates the assumption, it will deoptimize that code. Commented Oct 2, 2023 at 10:03
  • @MarkRotteveel: This touches on the source code vs DLL distinction I mentioned. They are different but both have their reasons for existing.
    – Flater
    Commented Oct 2, 2023 at 12:19
6

Why can't the compiler/jitter optimize out the call resolutions at compile time?

It can in some cases, but there are reasonable scenarios that preclude compile-time identification of the proper method to invoke.

Imagine, for example, we have a linked list involving polymorphism.  While all the elements on the list are objects that inherit from a common base class, each element can actually be of a different concrete type.

And now we want to traverse the list, and for each element of the list invoke a virtual method that is introduced by the base class and overridden by each of the concrete classes.

There's no way to resolve these calls at compile time, since each element can be of a different concrete type, and thus have a different override, i.e. a different method to invoke.
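
A minimal C# version of that scenario (names invented for the sketch):

using System;

abstract class Node
{
    public Node Next;
    public abstract void Process();   // overridden by each concrete node type
}

sealed class TextNode  : Node { public override void Process() => Console.WriteLine("text");  }
sealed class ImageNode : Node { public override void Process() => Console.WriteLine("image"); }

class Program
{
    static void Main()
    {
        // A heterogeneous linked list: the concrete type varies per element.
        Node head = new TextNode { Next = new ImageNode() };

        for (Node n = head; n != null; n = n.Next)
            n.Process();   // which override runs is only knowable per element, at runtime
    }
}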

That's not to say that virtual tables and method dispatch are the only possible approach here.  Let's say that the number of concrete classes is small: the JIT can emit an if-then-else sequence instead of a virtual method call.  In some cases, the method to call can even be inlined within the matching branch.


Further complicating things in some languages is dynamic class loading, in which at runtime additional concrete classes whose instances may appear on the same linked list are loaded — definitely unknown to the compiler working on the other code.

3
  • (Linked list may cause unnecessary confusion.) The issue is calling a method using the base method name, through an arbitrary derived class. Why can't the compiler (generally) create concrete pseudo-classes under the hood? Example: SomeDerived overrides SomeMethod of SomeBase. I propose that the compiler should generate a function that is called SomeMethodBaseDerived and the SomeDerived this pointer is its first arg.
    – user417068
    Commented Sep 30, 2023 at 21:38
  • 3
    It still has to dispatch dynamically, just changing where that happens.
    – Erik Eidt
    Commented Sep 30, 2023 at 23:54
  • 1
    Thank you Erik, I upvoted. (PS: After decades of C++ and C#, I'm now focused on C# and interop with C [not C++] for performance critical code.).
    – user417068
    Commented Oct 1, 2023 at 22:51
3

The whole point of virtual function calls is that at compile time it is unknown which code will be executed.

Swift has a partial solution: you can specify whether a class may be subclassed from a different module, and that is not the default. Without it, the compiler knows all subclasses, and it knows which methods could be overridden but actually aren't. It also knows that, if you are executing code in some override, self must lie at a certain point in the inheritance hierarchy, from which it can sometimes tell exactly which code a virtual call will execute.

Btw, many processors do branch target prediction for virtual function calls as well, so in situations where the same method is always, or almost always, called, the cost will be very low.

1

Why can't the compiler/jitter optimize out the call resolutions at compile time?

It can, and you help it by marking some classes as sealed. The linter also knows that if a class is internal and has no derived classes in the same project, then it could potentially be marked sealed without side effects (ignoring InternalsVisibleTo and reflection, I guess).
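
A sketch of the situation CA1852 flags (hypothetical type, ignoring InternalsVisibleTo and reflection as noted):

// Widget is internal and nothing in this assembly derives from it, so the
// analyzer suggests sealing it. Once sealed, the JIT knows the exact type of
// any Widget-typed reference, which helps devirtualize virtual calls and can
// make type checks like "obj is Widget" cheaper.
internal sealed class Widget
{
    public string Describe() => "widget";
}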

Could the compiler do the same check and automatically add sealed? Maybe, but there is the extra InternalsVisibleTo check you would have to do, there is already a way of signalling that the optimisation is possible (sealed itself), and there are the complexities of referencing DLLs which have already been compiled, and so on, so you can imagine the arguments going back and forth.

It's easy to add a warning to a linter, as the user can always override it in edge cases or ignore it if they don't think the performance is worth the change. But the compiler has to be perfect.