53

Why is this OK and mostly expected:

abstract type Shape
{
   abstract number Area();
}

concrete type Triangle : Shape
{
   concrete number Area()
   {
      //...
   }
}

...while this is not OK and nobody complains:

concrete type Name : string
{
}

concrete type Index : int
{
}

concrete type Quantity : int
{
}

My motivation is maximising the use of type system for compile-time correctness verification.

PS: yes, I have read this and wrapping is a hacky work-around.

3
  • 1
    Comments are not for extended discussion; this conversation has been moved to chat.
    – maple_shaft
    Commented Aug 11, 2016 at 16:51
  • I had a similar motivation in this question, you might find it interesting. Commented Aug 12, 2016 at 15:06
  • I was going to add an answer confirming the "you don't want inheritance" idea, and that wrapping is very powerful, including giving you whichever of implicit or explicit casting (or failures) you want, especially with JIT optimisations suggesting you'll get almost the same performance anyway, but you've linked to that answer :-) I would only add, it would be nice if languages added features to reduce boilerplate code needed for forwarding properties/methods, especially if there's only a single value.
    – Mark Hurd
    Commented Aug 17, 2016 at 3:56

10 Answers

83

I assume you are thinking of languages like Java and C#?

In those languages, primitives (like int) are basically a compromise for performance. They don't support all the features of objects, but they are faster and carry less overhead.

In order for objects to support inheritance, each instance needs to "know" at runtime which class it is an instance of. Otherwise overridden methods cannot be resolved at runtime. For objects this means instance data is stored in memory along with a pointer to the class object. If such info also had to be stored along with primitive values, the memory requirements would balloon. A 16-bit integer value would require its 16 bits for the value plus an additional 32 or 64 bits of memory for a pointer to its class.

Apart from the memory overhead, you would also expect to be able to override common operations on primitives like arithmetic operators. Without subtyping, operators like + can be compiled down to a simple machine code instruction. If it could be overridden, you would need to resolve methods at runtime, a much more costly operation. (You may know that C# supports operator overloading - but this is not the same. Operator overloading is resolved at compile time, so there is no default runtime penalty.)
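To illustrate the compile-time resolution point, here is a small Java sketch (Java lacks user-defined operator overloading, but ordinary method overloads show the same static-resolution behaviour; the class and method names are invented for illustration):

```java
// Overload resolution happens at compile time, based on the *static* type
// of the argument -- unlike virtual dispatch, which uses the runtime type.
public class Overloads {
    public static String f(Object o) { return "Object overload"; }
    public static String f(String s) { return "String overload"; }

    public static void main(String[] args) {
        Object o = "hello";           // runtime type is String, static type is Object
        System.out.println(f(o));     // picks f(Object) -- decided at compile time
        System.out.println(f("hi"));  // picks f(String)
    }
}
```

Because the choice is fixed at compile time, the compiler can emit a direct call (or a single machine instruction for a built-in operator) with no runtime lookup.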

Strings are not primitives but they are still "special" in how they are represented in memory. For example they are "interned", which means two string literals which are equal can be optimized to the same reference. This would not be possible (or at least a lot less effective) if string instances also had to keep track of their class.
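The interning behaviour can be observed directly in Java (a minimal sketch; the class name is made up):

```java
// Sketch of string interning in Java: equal string literals are collapsed
// by the JVM into a single shared object, so == compares them as identical.
public class InternDemo {
    public static boolean literalsShareIdentity() {
        String a = "hello";
        String b = "hello";
        return a == b; // true: both refer to the same interned literal
    }

    public static boolean runtimeStringMatchesLiteral() {
        String a = "hello";
        String c = new String("hello"); // explicitly allocated, not interned
        return a == c; // false: distinct object, even though equals() is true
    }

    public static void main(String[] args) {
        System.out.println(literalsShareIdentity());
        System.out.println(runtimeStringMatchesLiteral());
    }
}
```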

What you describe would certainly be useful, but supporting it would require a performance overhead for every use of primitives and strings, even when they don't take advantage of inheritance.

The language Smalltalk does (I believe) allow subclassing of integers. But when Java was designed, Smalltalk was considered too slow, and the overhead of having everything be an object was considered one of the main reasons. Java sacrificed some elegance and conceptual purity to get better performance.

26
  • 13
@Den: string is sealed because it is designed to behave immutable. If one could inherit from string, it would be possible to create mutable strings, which would make it really error prone. Tons of code, including the .NET framework itself, relies on strings having no side-effects. See also here, tells you the same: quora.com/Why-String-class-in-C-is-a-sealed-class
    – Doc Brown
    Commented Aug 10, 2016 at 13:42
  • 5
    @DocBrown This is also the reason String is marked final in Java as well.
    – Dev
    Commented Aug 10, 2016 at 13:50
  • 47
    "when Java was designed, Smalltalk was considered too slow […]. Java sacrificed some elegance and conceptual purity to get better performance." – Ironically, of course, Java didn't actually gain that performance until Sun bought a Smalltalk company to get access to Smalltalk VM technology because Sun's own JVM was dog-slow, and released the HotSpot JVM, a slightly modified Smalltalk VM. Commented Aug 10, 2016 at 14:57
  • 3
    @underscore_d: The answer you linked to very explicitly states that C♯ does not have primitive types. Sure, some platform for which there exists an implementation of C♯ may or may not have primitive types, but that does not mean that C♯ has primitive types. E.g., there is an implementation of Ruby for the CLI, and the CLI has primitive types, but that does not mean that Ruby has primitive types. The implementation may or may not choose to implement value types by mapping them to the platform's primitive types but that is a private internal implementation detail and not part of the spec. Commented Aug 10, 2016 at 22:07
  • 10
It's all about abstraction. We have to keep our head clear, otherwise we end up with nonsense. For example: C♯ is implemented on .NET. .NET is implemented on Windows NT. Windows NT is implemented on x86. x86 is implemented on silicon dioxide. SiO₂ is just sand. So, a string in C♯ is just sand? No, of course not, a string in C♯ is what the C♯ spec says it is. How it is implemented is irrelevant. A native implementation of C♯ would implement strings as byte arrays, an ECMAScript implementation would map them to ECMAScript Strings, etc. Commented Aug 10, 2016 at 22:13
20

What some languages propose is not subclassing, but subtyping. For example, Ada lets you create derived types or subtypes. The Ada Programming/Type System section is worth reading to understand all the details. You can restrict the range of values, which is what you want most of the time:

 type Angle is range -10 .. 10;
 type Hours is range 0 .. 23; 

You can use both types as Integers if you convert them explicitly. Note also that you can't use one in place of another, even when the ranges are structurally equivalent (types are checked by names).

 type Reference is new Integer;
 type Count is new Integer;

Above types are incompatible, even though they represent the same range of values.

(But you can use Unchecked_Conversion; don't tell people I told you that)

9
  • 2
    Actually, I think it is more about semantics. Using a quantity where an index is expected would then hopefully cause a compile time error Commented Aug 10, 2016 at 10:36
  • @MarjanVenema It does, and this is done on purpose to catch logic errors.
    – coredump
    Commented Aug 10, 2016 at 10:41
  • My point was that not all cases where you want the semantics, you'd need the ranges. You would then have type Index is -MAXINT..MAXINT;which somehow doesn't do anything for me as all integers would be valid? So what kind of error would I get passing an Angle to an Index if all that is checked are the ranges? Commented Aug 10, 2016 at 10:44
  • 1
@MarjanVenema In the second example both types are subtypes of Integer. However, if you declare a function which accepts a Count, you cannot pass a Reference because type checking is based on name equivalence, which is the contrary of "all that is checked are the ranges". This is not limited to integers, you could use enumerated types or records. (archive.adaic.com/standards/83rat/html/ratl-04-03.html)
    – coredump
    Commented Aug 10, 2016 at 10:54
  • 1
@Marjan One nice example of why tagging types can be quite powerful can be found in Eric Lippert's series on implementing Zork in OCaml. Doing this allows the compiler to catch lots of bugs - on the other hand if you allow to implicitly convert types this seems to make the feature useless.. it doesn't make semantic sense being able to assign a PersonAge type to a PersonId type just because they both happen to have the same underlying type.
    – Voo
    Commented Aug 12, 2016 at 19:59
17

I think this might very well be an X/Y question. Salient points, from the question...

My motivation is maximising the use of type system for compile-time correctness verification.

...and from your comment elaborating:

I don't want to be able to substitute one for another implicitly.

Excuse me if I'm missing something, but... If these are your aims, then why on Earth are you talking about inheritance? Implicit substitutability is... like... its entire thing. Y'know, the Liskov Substitution Principle?

What you seem to want, in reality, is the concept of a 'strong typedef' - whereby something 'is' e.g. an int in terms of range and representation but cannot be substituted into contexts that expect an int and vice-versa. I'd suggest searching for info on this term and whatever your chosen language(s) might call it. Again, it's pretty much literally the opposite of inheritance.
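In Java a "strong typedef" can be approximated with thin wrapper classes (a hypothetical sketch -- `Index`, `Quantity`, and `elementAt` are invented names): each type has an int representation, but the compiler rejects substituting one for the other.

```java
// Emulating strong typedefs: Index and Quantity both wrap an int,
// but the type checker refuses to mix them up.
public final class Ids {
    public static final class Index {
        public final int value;
        public Index(int value) { this.value = value; }
    }

    public static final class Quantity {
        public final int value;
        public Quantity(int value) { this.value = value; }
    }

    // Accepts only an Index; elementAt(new Quantity(1), ...) would not compile.
    public static String elementAt(Index i, String[] items) {
        return items[i.value];
    }

    public static void main(String[] args) {
        String[] names = { "a", "b", "c" };
        System.out.println(elementAt(new Index(1), names)); // prints "b"
    }
}
```

Note this is the opposite of substitutability: the wrapper deliberately does *not* stand in where an int is expected.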

And for those who might not like an X/Y answer, I think the title might still be answerable with reference to the LSP. Primitive types are primitive because they do something very simple, and that's all they do. Allowing them to be inherited and thus making infinite their possible effects would lead to great surprise at best and fatal LSP violation at worst. If I may optimistically assume Thales Pereira won't mind me quoting this phenomenal comment:

There is the added problem that If someone was able to inherit from Int, you would have innocent code like "int x = y + 2" (where Y is the derived class) that now writes a log to the Database, opens a URL and somehow resurrect Elvis. Primitive types are supposed to be safe and with more or less guaranteed, well-defined behavior.

If someone sees a primitive type, in a sane language, they rightly presume it will always just do its one little thing, very well, without surprises. Primitive types have no class declarations available that signal whether they may or may not be inherited and have their methods overridden. If they were, it would be very surprising indeed (and totally break backwards compatibility, but I'm aware that's a backwards answer to 'why was X not designed with Y').

...although, as Mooing Duck pointed out in response, languages that allow operator overloading enable the user to confuse themselves to a similar or equal extent if they really want, so it's dubious whether this last argument holds. And I'll stop summarising other people's comments now, heh.

0
4

In order to allow inheritance with virtual dispatch (which is often considered quite desirable in application design), one needs runtime type information. For every object, some data regarding the type of the object has to be stored. A primitive, by definition, lacks this information.

There are two (managed, run on a VM) mainstream OOP languages that feature primitives: C# and Java. Many other languages do not have primitives in the first place, or use similar reasoning for allowing them / using them.

Primitives are a compromise for performance. For each object, you need space for its object header (in Java, typically 2*8 bytes on 64-bit VMs), plus its fields, plus any padding (in HotSpot, every object occupies a number of bytes that is a multiple of 8). So an int as an object would need at least 24 bytes of memory to be kept around, instead of only 4 bytes (in Java).

Thus, primitive types were added to improve performance. They make a whole lot of things easier. What does a + b mean if both are subtypes of int? Some kind of dispatching has to be added to choose the correct addition. This means virtual dispatch. Having the ability to use a very simple opcode for the addition is much, much faster, and allows for compile-time optimizations.

String is another case. In both Java and C#, String is an object. But in C# it's sealed, and in Java it's final. That's because both the Java and C# standard libraries require strings to be immutable, and subclassing them would break this immutability.

In case of Java, the VM can (and does) intern Strings and "pool" them, allowing for better performance. This only works when Strings are truly immutable.

Plus, one rarely needs to subclass primitive types. As long as primitives cannot be subclassed, there are a whole lot of neat things that maths tells us about them. For example, we can be sure that addition is commutative and associative; that's something the mathematical definition of integers tells us. Furthermore, we can easily prove invariants over loops via induction in many cases. If we allow subclassing of int, we lose those tools that maths gives us, because we can no longer be guaranteed that certain properties hold. Thus, I'd say not being able to subclass primitive types is actually a good thing: fewer things someone can break, plus a compiler can often prove that it is allowed to do certain optimizations.
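Those algebraic guarantees are easy to state in code (a trivial Java sketch; note that Java's int addition stays commutative and associative even under two's-complement wraparound, which is exactly what lets the compiler reorder such expressions):

```java
// Algebraic facts about primitive int that the compiler can rely on,
// precisely because no subclass can redefine what + means.
public class IntAlgebra {
    public static boolean commutes(int a, int b) {
        return a + b == b + a;
    }

    public static boolean associates(int a, int b, int c) {
        return (a + b) + c == a + (b + c);
    }

    public static void main(String[] args) {
        // Holds even when the addition overflows and wraps around.
        System.out.println(commutes(Integer.MAX_VALUE, 1));
        System.out.println(associates(7, -3, 100));
    }
}
```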

15
  • 1
    This answer is abys... narrow. to allow inheritance, one needs runtime type information. False. For every object, some data regarding the type of the object has to be stored. False. There are two mainstream OOP languages that feature primitives: C# and Java. What, is C++ not mainstream now? I'll use it as my rebuttal as runtime type information is a C++ term. It's absolutely not required unless using dynamic_cast or typeid. And even if RTTI's on, inheritance only consumes space if a class has virtual methods to which a per-class table of methods must be pointed per instance Commented Aug 10, 2016 at 21:09
  • 1
Inheritance in C++ works a whole lot differently than in languages run on a VM. Virtual dispatch requires RTTI, something that wasn't originally part of C++. Inheritance without virtual dispatch is very limited and I'm not even sure if you should compare it to inheritance with virtual dispatch. Furthermore, the notion of an "object" is very different in C++ than it is in C# or Java. You are right, there are some things I could word better, but tbh getting into all the quite involved points quickly leads to having to write a book on language design.
    – Polygnome
    Commented Aug 10, 2016 at 21:17
  • 3
    Also, it is not the case that "virtual dispatch requires RTTI" in C++. Again, only dynamic_cast and typeinfo require that. Virtual dispatch is practically implemented using a pointer to the vtable for the concrete class of the object, thus letting the right functions be called, but it does not require the detail of type and relation inherent in RTTI. All the compiler needs to know is whether an object's class is polymorphic and, if so, what the instance's vptr is. One can trivially compile virtually dispatched classes with -fno-rtti. Commented Aug 10, 2016 at 21:32
  • 2
It's in fact the other way around: RTTI requires virtual dispatch. Literally - C++ doesn't allow dynamic_cast on classes without virtual dispatch. The implementation reason is that RTTI is generally implemented as a hidden member of a vtable.
    – MSalters
    Commented Aug 11, 2016 at 8:09
  • 1
@MilesRout C++ has everything a language needs for OOP, at least in the somewhat newer standards. One might argue that the older C++ standards lack some things that are needed for an OOP language, but even that is a stretch. C++ is not a high-level OOP language, as it allows more direct, low-level control over some things, but it allows OOP nonetheless. (High level / low level here in terms of abstraction; other languages, like managed ones, abstract more of the system away than C++ does, hence their abstraction is higher.)
    – Polygnome
    Commented Aug 19, 2016 at 9:52
4

In mainstream strong static OOP languages, sub-typing is seen primarily as a way to extend a type and to override the type's current methods.

To do so, 'objects' contain a pointer to their type. This is an overhead: the code in a method that uses a Shape instance first has to access the type information of that instance, before it knows the correct Area() method to call.

A primitive tends to only allow operations on it that can translate into single machine-language instructions, and it does not carry any type information with it. Making an integer slower so that someone could subclass it was unappealing enough to stop any language that did so from becoming mainstream.

So the answer to:

Why do mainstream strong static OOP languages prevent inheriting primitives?

Is:

  • There was little demand
  • And it would have made the language too slow
  • Subtyping was primarily seen as a way to extend a type, rather than a way to get better (user-defined) static type checking.

However, we are starting to get languages that allow static checking based on properties of variables other than 'type'; for example F# has "dimension" and "unit" so that you can't, for example, add a length to an area.

There are also languages that allow 'user-defined types' that don't change (or exchange) what a type does, but just help with static type checking; see coredump's answer.

2
  • F# units of measure is a nice feature, although unfortunately misnamed. Also it's compile-time only, so not super-useful e.g. when consuming a compiled NuGet package. Right direction, though.
    – Den
    Commented Aug 10, 2016 at 12:33
  • It's perhaps interesting to note that "dimension" is not "a property other than 'type'", it's just a more rich kind of type than you may be used to.
    – porglezomp
    Commented Aug 11, 2016 at 19:18
3

I'm not sure if I'm overlooking something here, but the answer is rather simple:

  1. The definition of primitives is: primitive values are not objects, primitive types are not object types, primitives are not part of the object system.
  2. Inheritance is a feature of the object system.
  3. Ergo, primitives cannot take part in inheritance.

Note that there are really only two strong static OOP languages which even have primitives, AFAIK: Java and C++. (Actually, I'm not even sure about the latter, I don't know much about C++, and what I found when searching was confusing.)

In C++, primitives are basically a legacy inherited (pun intended) from C. So, they don't take part in the object system (and thus inheritance) because C has neither an object system nor inheritance.

In Java, primitives are the result of a misguided attempt at improving performance. Primitives are also the only value types in the system: it is, in fact, impossible to write value types in Java, and it is impossible for objects to be value types. So, apart from the fact that primitives don't take part in the object system (and thus the idea of "inheritance" doesn't even make sense), even if you could inherit from them, you wouldn't be able to maintain the "value-ness". This is different from e.g. C♯, which does have value types (structs) that are nonetheless objects.

Another thing is that not being able to inherit is actually not unique to primitives, either. In C♯, structs implicitly inherit from System.Object and can implement interfaces, but they can neither inherit from nor be inherited by classes or structs. Also, sealed classes cannot be inherited from. In Java, final classes cannot be inherited from.

tl;dr:

Why do mainstream strong static OOP languages prevent inheriting primitives?

  1. primitives are not part of the object system (by definition, if they were, they wouldn't be primitive), the idea of inheritance is tied to the object system, ergo primitive inheritance is a contradiction in terms
  2. primitives are not unique, lots of other types cannot be inherited as well (final or sealed in Java or C♯, structs in C♯, case classes in Scala)
18
  • 3
    Ehm... I know it's pronounced "C Sharp", but, ehm
    – Mr Lister
    Commented Aug 11, 2016 at 6:45
  • I think you're pretty mistaken on the C++ side. It's not a pure OO language at all. Class methods by default are not virtual, which means they don't obey LSP. E.g. std::string isn't a primitive, but it very much behaves as just another value. Such value semantics are quite common, the whole STL part of C++ assumes it.
    – MSalters
    Commented Aug 11, 2016 at 8:05
  • 2
'In Java, primitives are the result of a misguided attempt at improving performance.' I think you have no idea about the magnitude of the performance hit of implementing primitives as user-expandable object types. That decision in Java is both deliberate and well founded. Just imagine having to allocate memory for every int you use. Each allocation takes on the order of 100ns, plus the overhead of garbage collection. Compare that with the single CPU cycle consumed by adding two primitive ints. Your Java code would crawl along if the designers of the language had decided otherwise. Commented Aug 11, 2016 at 8:34
  • 1
    @cmaster: Scala doesn't have primitives, and its numeric performance is exactly the same as Java's. Because, well, it compiles integers into JVM primitive ints, so they perform exactly the same. (Scala-native compiles them into primitive machine registers, Scala.js compiles them into primitive ECMAScript Numbers.) Ruby doesn't have primitives, but YARV and Rubinius compile integers into primitive machine integers, JRuby compiles them into JVM primitive longs. Pretty much every Lisp, Smalltalk, or Ruby implementation uses primitives in the VM. That's where performance optimizations … Commented Aug 11, 2016 at 9:07
  • 1
    … belong: in the compiler, not the language. Commented Aug 11, 2016 at 9:08
2

Joshua Bloch in "Effective Java" recommends designing explicitly for inheritance or prohibiting it. Primitive classes are not designed for inheritance because they are designed to be immutable; allowing inheritance could change that in subclasses, thus breaking the Liskov substitution principle, and it would be a source of many bugs.

Anyway, why is this a hacky workaround? You should really prefer composition over inheritance. If the reason is performance, then you have a point, and the answer to your question is that it is not possible to put all features in Java, because it takes time to analyze all the different aspects of adding a feature. For example, Java didn't have generics before 1.5.
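As a sketch of the composition approach (a hypothetical `Name` class, mirroring the wrapper idea from the question): the wrapper forwards only the operations it wants to expose, at the cost of some boilerplate.

```java
// Composition instead of inheritance: Name wraps a String and forwards
// a chosen subset of its operations. The compiler will not let a Name be
// passed where a String is expected, which is the point.
public final class Name {
    private final String value;

    public Name(String value) { this.value = value; }

    // Hand-written forwarding -- the boilerplate that inheritance would avoid.
    public int length() { return value.length(); }
    public Name toUpperCase() { return new Name(value.toUpperCase()); }

    @Override public String toString() { return value; }

    @Override public boolean equals(Object o) {
        return o instanceof Name && value.equals(((Name) o).value);
    }

    @Override public int hashCode() { return value.hashCode(); }
}
```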

If you have a lot of patience then you are lucky, because there is a plan to add value classes to Java, which will let you create your own value classes, helping you increase performance while giving you more flexibility.

0
2

At the abstract level, you can include anything you want in a language you're designing.

At the implementation level, it's inevitable that some of those things will be simpler to implement, some will be complicated, some can be made fast, some are bound to be slower, and so on. To account for this, designers often have to make hard decisions and compromises.

At the implementation level, one of the fastest ways we have come up for accessing a variable is finding out its address and loading the contents of that address. There are specific instructions in most CPUs for loading data from addresses and those instructions usually need to know how many bytes they need to load (one, two, four, eight, etc) and where to put the data they load (single register, register pair, extended register, other memory, etc). By knowing the size of a variable, the compiler can know exactly which instruction to emit for usages of that variable. By not knowing the size of a variable, the compiler would need to resort to something more complicated and probably slower.

At the abstract level, the point of subtyping is to be able to use instances of one type where an equal or more general type is expected. In other words, code can be written that expects an object of a particular type or anything more derived, without knowing ahead of time what exactly this would be. And clearly, as more derived types can add more data members, a derived type does not necessarily have the same memory requirements as its base types.

At the implementation level, there's no simple way for a variable of a predetermined size to hold an instance of unknown size and be accessed in a way you'd normally call efficient. But there is a way to move things around a little and use a variable not to store the object, but to identify the object and let that object be stored somewhere else. That way is a reference (e.g. a memory address) -- an extra level of indirection that ensures that a variable only needs to hold some kind of fixed-size information, as long as we can find the object through that information. To achieve that, we just need to load the address (fixed-size) and then we can work as usual using those offsets of the object that we know are valid, even if that object has more data at offsets we don't know. We can do that because we don't concern ourselves with its storage requirements when accessing it anymore.

At the abstract level, this method allows you to store a (reference to a) string into an object variable without losing the information that makes it a string. It's fine for all types to work like this and you might also say it's elegant in many respects.

Still, at the implementation level, the extra level of indirection involves more instructions and on most architectures it makes each access to the object somewhat slower. You can allow the compiler to squeeze more performance out of a program if you include in your language some commonly used types that don't have that extra level of indirection (the reference). But by removing that level of indirection, the compiler cannot allow you to subtype in a memory safe way anymore. That's because if you add more data members to your type and you assign to a more general type, any extra data members that don't fit in the space allocated for the target variable will be sliced away.

1

In general

If a class is abstract (metaphor: a box with holes), it's OK (even required, to have something usable!) to "fill the holes"; that's why we subclass abstract classes.

If a class is concrete (metaphor: a full box), it's not OK to alter what's already there, because if it's full, it's full. We have no room to add anything more inside the box; that's why we shouldn't subclass concrete classes.

With primitives

Primitives are concrete classes by design. They represent something that is well-known and fully defined (I've never seen a primitive type with something abstract; otherwise it wouldn't be a primitive anymore) and widely used throughout the system. Allowing a primitive type to be subclassed, with your own implementation handed to others that rely on the designed behaviour of primitives, could cause a lot of side effects and huge damage!

2
1

Usually inheritance is not the semantics you want, because you can't substitute your special type anywhere a primitive is expected. To borrow from your example, a Quantity + Index makes no sense semantically, so an inheritance relationship is the wrong relationship.

However, several languages have the concept of a value type that does express the kind of relationship you are describing. Scala is one example. A value type uses a primitive as the underlying representation, but has a different class identity and operations on the outside. That has the effect of extending a primitive type, but it's more of a composition instead of an inheritance relationship.
