27
$\begingroup$

Unit types are types found in more modern languages, replacing void in the C-family languages. Unlike void, a unit type has an actual value, exactly one. This allows functions that return nothing to be used in expressions and the like. Initially, my language had a Void type, which simply served the purpose of void in the C-likes (I thought a unit type would be too hard to implement for the time being). However, as my compiler grew, Void slowly gained more and more "realness" until I got to the point where it was just below a "real" type. All that was needed was a few more lines of code and it'd be a full-blown unit type. What are the advantages of me turning it into a unit type versus just having an equivalent of C's void?

$\endgroup$
7
  • 5
    $\begingroup$ I don't think the two are equivalent, in fact the equivalent of void should be Unit | Never $\endgroup$
    – Aster
    Commented Jul 10, 2023 at 3:54
  • 5
    $\begingroup$ The void type in C was a design error, retained for historical reasons. Don't emulate it. (It seems like all the actual answers support this.) $\endgroup$ Commented Jul 10, 2023 at 7:18
  • 7
    $\begingroup$ @Aster well, then you could say the same about all types. A C function whose result is an int is also not guaranteed to terminate; yet it's generally understood that this has nothing to do with the int type having an extra value Never, but simply that it's always possible for a function to not yield a value at all in finite time. By contrast, Haskell does consider nontermination as a special (un-matchable) value which every type has in addition to its "proper" values. In that sense Haskell's unit type () has indeed two values, () and ⊥. $\endgroup$ Commented Jul 10, 2023 at 16:14
  • 1
    $\begingroup$ And yet in Racket (void) evaluates to the singleton value #<void> $\endgroup$ Commented Jul 11, 2023 at 20:03
  • 1
    $\begingroup$ " This allows functions that return nothing to be used in expressions and the like" C++ allows this and its void is certainly "C-like" $\endgroup$
    – Ben Voigt
    Commented Jul 11, 2023 at 21:25

8 Answers

31
$\begingroup$

In C, void is used in four and a half different ways. A unit type is equally good for one, worse for a second, better for two more, and totally unsuitable for the last.

Some languages have a Top or Bottom type, both distinct from the unit type, which is a better fit for some of the ways C uses void.

Side-Effect Functions

Marking the return type of a side-effect function with void is completely equivalent to using a unit type in other languages. There are no advantages one way or the other; it’s just spelled differently. An unusual variation is Haskell: pure functions have no side effects, so a pure function returning a unit type would be completely useless. Many Haskell functions instead return IO (), which represents an operation that can be run and that evaluates to ().

Throwing Away Return Values

The second distinct use for void in C is that the cast (void) explicitly declares that you are ignoring a return value. This can be necessary to make all the branches of a ? expression have the same type.

In other contexts, C spells this cast ;. This is partly because K&R-style functions needed to return something, so a function that would return void in ANSI C returned int in K&R C, and the programmer would just ignore the meaningless return value. You will sometimes, however, see C programmers write it out as (void)printf("hello, world!");. In fact, lint used to require it. If your function is returning something so useless that it’s better to ignore it than to check it, that’s a code smell, but Dennis Ritchie would have copped to that.

The reason for this is that many C functions report errors in their return values, and the language makes it easy to shoot yourself in the foot by ignoring errors. If programmers are forced to explicitly write out that they are ignoring the return value, they at least need to think about it, and are less likely to neglect to check for an error just because the language makes it easier to ignore them.

A modern strongly-typed language distinguishes success from error in the type of the result, like Just x/Nothing in Haskell, or Ok ()/Err _ in Rust. In practice, an option of the unit type is equivalent to a boolean, but might appear as a stage of a railway-oriented computation terminating in (). A unit type is useful to compose a result type that reports either the fact of success or details about the error. The void type in C or C++ cannot be used this way: a std::variant or union cannot contain void.

The language might or might not make it syntactically possible to ignore errors at all. One solution would be to implicitly unwrap values, so that if (in Rust syntax) foo returns a Result<(),Error>, foo(bar); would implicitly check the return value. If it’s Ok (), the program proceeds, and if it’s an error, the program panics. A language that did this would probably have a lot of unexpected runtime panics, but at least there would be a message giving the line of code where the error appeared and a useful stack trace. Rust instead takes a middle ground (based on Swift) where unwrapping must be explicit, but there’s the syntax sugar ? for unwrapping on success and short-circuiting on error.
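To make the Result<(),Error> case concrete, here is a minimal Rust sketch. The names ConfigError, check_port and check_all are made up for illustration. The success side carries only the unit value, and ? forwards the first error to the caller:

```rust
// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq)]
struct ConfigError(String);

// Success carries no payload: the Ok side holds only the unit value.
fn check_port(port: u32) -> Result<(), ConfigError> {
    if port == 0 || port > 65_535 {
        return Err(ConfigError(format!("port {} out of range", port)));
    }
    Ok(())
}

// `?` lets Ok(()) fall through and returns early on the first Err.
fn check_all(ports: &[u32]) -> Result<(), ConfigError> {
    for &port in ports {
        check_port(port)?;
    }
    Ok(())
}
```

Calling check_all(&[80, 443]) yields Ok(()), while a list containing 0 yields the Err from the first bad port. A C-style void could not sit inside the result type like this.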

Some languages use the bottom type (a subtype of every other type, with no values of its own) as the type of an expression that never returns at all, such as throw. This is very different from a function that always returns successfully! A special case is a panic function that needs to be able to appear in any context. In many languages, these functions that terminate the program are the one exception to the type system, since there should be no type-checking of their return values (or they can be thought of as having a fully-generic return type). C has syntax to make all branches of a ? expression return (void), but not to ignore the type only of branches that do not return, and unify the type of all the ones that could.
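As a rough illustration of a non-returning branch unifying with the others, here is a Rust sketch. The helpers fatal and parse_or_die are hypothetical names, and ! is Rust's spelling of the never type in return position:

```rust
// Hypothetical helper: `!` marks a function that never returns normally.
fn fatal(msg: &str) -> ! {
    panic!("fatal: {}", msg)
}

// Both match arms must have type u32. The arm that calls `fatal`
// has type `!`, which coerces to u32, so the match type-checks.
fn parse_or_die(text: &str) -> u32 {
    match text.trim().parse::<u32>() {
        Ok(n) => n,
        Err(_) => fatal("expected an unsigned integer"),
    }
}
```

Only the branch that can actually return contributes to the type of the match, which is the piece C's (void) cast cannot express.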

A Generic Reference

C uses void* as the type of a generic object pointer. A unit type would be a poor fit for this. A top type from computer science (like Object or Any) would be more appropriate, if a language has such a thing at all.

C needed a type like void* because of some quirks required for compatibility. A new language would not have the same requirements. One is that K&R-style functions don’t type-check their arguments, so it was common to write code like memcpy(&dest, &src, sizeof(dest));. When C++ and ANSI wanted to add type-checking, they therefore needed a type that any other object pointer implicitly converts to. Another quirk is that C historically had to support mainframes that used different pointer formats for character and word pointers, or even had pointer types of different sizes. K&R historically reads these out as char*, and mainframes with different word and character pointers historically made character pointers long enough to hold word pointers but not vice versa. Standard C and C++ therefore specified that it must be safe to convert any object pointer type to either void* or char*, and furthermore that a function call with K&R-style promotions must use the same representation for a conversion to void* or char*. Finally, C keeps backward-compatibility with code like char* foo; foo = malloc(SIZE);, and in fact now lets you write T* const foo = calloc(elems, sizeof(T));, but this was a bridge too far for C++, which wanted programmers to stop using C-style memory management anyway. (C++ then introduced a new unit type, nullptr_t, to restore this implicit conversion to, and not just from, any pointer type. There had previously been a widely-used implementation by Scott Meyers, in Effective C++.)

One disadvantage of a reference to a unit type is that you lose a check. The compiler will catch the logic error of dereferencing a void*, but if you tried to represent something similar using a reference to a unit type, in a language with automatic type inference, it would dereference to a valid object.

Another is that a generic object pointer must be at least as wide as any object pointer, plus whatever bits are needed to distinguish between different pointer types. It would be more efficient for all unit-type references to have zero bits of storage and compare equal to each other.

If you allow objects or references to the unit type to exist, there would also be no way to distinguish between their type and the fully-generic pointer type.

For most purposes that you would use a void* that isn’t NULL, it would be better to use the concept of a top type, one that every other type is a subtype of. That would represent a reference that could be to any type, in a more rigorous way. A weakly-typed language (one that only distinguishes between returning nothing or returning something, or one that has everything return some possibly-empty string) implicitly has a top type that it uses to represent some arbitrary untyped thing. It might or might not have a name in the language itself.

Empty Argument Lists

The only reason this is a homonym for the unit type in C was that the more-intuitive syntax f(), as in most other C-family languages, already meant something different. C does not use void to define an empty struct, but some C++ templates do.

Some other answers discuss whether, in a language with template or generic type parameters, void should be a valid type. This essentially means that everything you specify as “of one generic parameter” really means “of zero or one parameter,” and everything you specify as N generic parameters really means “of up to N parameters.” A language with generics, but not overloading, does need some construct that means, “any number of items,” “zero or one item,” and “exactly one item.” These might be type qualifiers.

There are several advantages to representing a generic set of function parameters as a tuple, with its syntactic sugar, rather than with some other syntax like a C++ T... parameter pack. In terms of semantics, and not just spelling, a unit type has the advantage that the logic of many operations that would fail for void work for (). For example, you can call a function with the members of () as its arguments, or iterate over the members of (), but not pass in the value of a void.

It is possible to write around the quirks of void in C++, for instance by specializing the template, using if constexpr (std::same_as<T,void>) to handle the special case, or by defining a concept notvoid that lets you write template <notvoid T>. In a language like Rust or Haskell, where generic code needs to specify at least one typeclass to be able to do anything with a generic parameter, void would be excluded as a possible type naturally.

The Type of a Null Pointer (Possibly)

C has never said that the type of NULL must be void*—and some compilers resolve it to a magic keyword instead. But ((void*)0) is the most common definition in C (not C++).

First, modern languages usually don’t have null pointers at all. C.A.R. Hoare, who invented the concept, called it his “billion-dollar mistake.” It’s a lot more than that today. Modern languages typically give success and error values different types, so that attempting to use an error code as a reference becomes a type error.

A possibly-invalid reference should be an option type. A constant representing an invalid machine address should be a pointer-sized integer. A valid reference to no valid object might be a reference to the bottom type. You shouldn’t attempt to shoehorn a reference to a unit type into this role just because C does something similar, any more than you should assume that in and on are the same in English because some other languages use the same word for both, or that some other language has one word for you.

$\endgroup$
10
  • 1
    $\begingroup$ C++11 (I think) replaced the NULL macro with a nullptr keyword, with the type nullptr_t, which is convertible to, but distinct from, void*. $\endgroup$
    – dan04
    Commented Jul 10, 2023 at 16:13
  • 1
    $\begingroup$ @dan04 Even before that, GCC 4.9.2 had introduced __null as a magic keyword in G++, and Scott Meyers had written a widely-cited nullptr_t template in his book Effective C++. However, I see that GCC on many platforms does still define NULL as ((void*)0). $\endgroup$
    – Davislor
    Commented Jul 10, 2023 at 16:34
  • 1
    $\begingroup$ Nullability, optionality, whatever you name it. As long as it is not the default and universally applicable (including stackable), it's fine. $\endgroup$ Commented Jul 11, 2023 at 8:39
  • 2
    $\begingroup$ Tiny nitpick, while Hoare did invent nulls in the '60s, he didn't say that line until 2009 $\endgroup$
    – AakashM
    Commented Jul 11, 2023 at 9:30
  • 2
    $\begingroup$ @AakashM While I liked the joke, I don’t want to spread any misinformation. So, removed. $\endgroup$
    – Davislor
    Commented Jul 11, 2023 at 14:21
22
$\begingroup$

An advantage of having a unit type instead of a void type is: unit is a regular type that functions normally, whereas void is weird and creates special cases.

Consider C#: they have void instead of unit, and when you want to write a wrapper for a function that returns a value, you have to write it twice:

    // normal function
    T Wrap<T>(Func<T> f)
    {
        Console.WriteLine("Calling the wrapped function...");
        return f();
    }
    // special case to handle void
    void Wrap(Action f)
    {
        Console.WriteLine("Calling the wrapped function...");
        f();
    }

If C# had chosen unit instead of void, you would simply call Wrap<Unit>() with a Func<Unit>. void and also Action would become unneeded.
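For comparison, here is a minimal sketch of the same wrapper in Rust, where the unit type () is an ordinary type argument. The name wrap just mirrors the C# example above:

```rust
// One generic wrapper covers callbacks that return a value and
// callbacks that return unit, because `()` is a normal type argument.
fn wrap<T>(f: impl FnOnce() -> T) -> T {
    println!("Calling the wrapped function...");
    f()
}
```

Both wrap(|| 2 + 2) and wrap(|| println!("side effect only")) go through the same definition. In the second call T is simply inferred as ().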

$\endgroup$
5
  • 3
    $\begingroup$ Is this a duplicate of my answer? $\endgroup$
    – Bbrk24
    Commented Jul 9, 2023 at 21:51
  • 1
    $\begingroup$ They both kinda make the same point, yes. I guess you win because you posted 3 minutes before me. But I think my example is clearer. $\endgroup$ Commented Jul 9, 2023 at 21:55
  • 3
    $\begingroup$ As Java successfully used Void in its generics I'm surprised that C# made it worse. $\endgroup$
    – OrangeDog
    Commented Jul 10, 2023 at 7:53
  • 3
    $\begingroup$ I believe in C++ it's legal to write return f(); if f() is template-dependent and is of type void when the templates are filled in. $\endgroup$ Commented Jul 11, 2023 at 3:47
  • 3
    $\begingroup$ It doesn't need to involve templates. return f(); where f() returns void is perfectly legal C++. The same holds for C. I am not sure why C# decided to make it more cumbersome in this aspect. See the answer by @abel1502 for details. $\endgroup$ Commented Jul 11, 2023 at 4:53
9
$\begingroup$

Type Theory

It's never a bad idea to start with some theory. In programming language theory, a type is akin to the set of values that instances of the type can take.

Some types are a finite set: a boolean has 2 values, a 32-bit integer a little over 4 billion. Others are an infinite set: strings, lists, etc.

Empty and Unit are very different there:

  • Empty: no value, the empty set.
  • Unit: one value, a singleton set.

Unit types and Empty types in the wild

First of all, unit types just exist. Even in C!

A struct with no field, or similarly a tuple or any other product type with no field, is a unit type. An array of 0 elements is a unit type. You can create a value, and all values are equivalent.

Similarly, empty types just exist. Even in C!

A union with no field, or any other sum type with no alternative, is an empty type. You cannot create a value.

Unless you go out of your way to forbid them in a language, they emerge spontaneously.

No value?

This is correct, an Empty type cannot be instantiated.

This has several implications:

  • A function taking an Empty type as argument can never be called.
  • A function returning an Empty type... actually will never return.
  • A generic sum type with an Empty type in some alternative will never have an instance of that alternative.

Empty types are very useful markers, allowing you to cull away execution paths that cannot exist. This is useful for communicating intent to other humans and tools alike.
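A small Rust sketch of the third bullet, using the standard library's empty enum std::convert::Infallible as the error alternative. The functions double and unwrap_infallible are hypothetical names for illustration:

```rust
use std::convert::Infallible;

// The Err alternative can never be constructed, so this cannot fail.
fn double(n: u32) -> Result<u32, Infallible> {
    Ok(n * 2)
}

fn unwrap_infallible<T>(r: Result<T, Infallible>) -> T {
    match r {
        Ok(value) => value,
        // An empty match is exhaustive here: Infallible has no variants,
        // so the compiler accepts this arm without any fallback code.
        Err(never) => match never {},
    }
}
```

The error path is culled at the type level: there is no panic and no default value, because no value of the empty type can ever reach that arm.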

For example, let's have a look at the definition of abort in C++ and Rust:

    [[noreturn]] void abort();  // C++
    fn abort() -> !;            // Rust

C++ requires a dedicated attribute, which is only usable in return position, and doesn't mesh well with meta-programming.

Rust, on the other hand, treats ! as just another type¹. It can be used everywhere a type is expected, it's accessible via meta-programming, etc.

¹ Ahem, well, it's not quite just another type yet.

What of C's void?

C's void is a strange beast, behaving at times like a unit type and at times like an empty type.

When denoting the parameters or return type of a function, it doesn't denote a function that cannot be called, or will never return, so is somewhat like a unit type -- except there is not really any value.

Still, it's a bit weird:

    void foo(int);    //  One argument.
    void bar(void);   //  No argument, of course!

Yet, no field of type void may exist in a struct or union, and no variable of type void may be declared. The latter, notably, means that direct chaining of return values is possible, but attempting to do so in two steps is not:

    void foo();

    void ok() {
        return foo();
    }

    void error() {
        void v = foo();   // rejected: no variable may have type void
        return v;
    }

Code generation, whether with external tools or macros, and in C++ templates, particularly suffer from this "special case".
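For contrast, here is a sketch of the same two-step pattern in Rust, where () is a real value. All names are illustrative, and Tagged exists only to show a unit-typed field:

```rust
fn foo() {}

// The two-step version is accepted: the unit value can be stored in a
// local variable and returned afterwards.
fn two_steps() {
    let v: () = foo();
    return v;
}

// A field of unit type is allowed as well.
struct Tagged {
    marker: (),
    count: u32,
}

fn make_tagged() -> Tagged {
    Tagged { marker: foo(), count: 3 }
}

// The unit value carries no information, so it occupies no storage.
fn unit_size() -> usize {
    std::mem::size_of::<()>()
}
```

Generated or generic code can therefore stash a result in a temporary without asking whether the type happens to be unit.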

Uniformity, Regularity

It's easier to shun C's void, and properly use unit and empty types instead.

It's easier both for the compiler writer and for the user, because it eliminates special cases!

From the user end, more specifically:

  • The fewer "special cases", the easier it is to learn a language.
  • The fewer "special cases", the more "brain cells" available for actual problem solving.
  • The fewer "special cases", the fewer issues in generic code.

The only con, really, is unfamiliarity for people who have not been exposed to the concept previously... and they'll be converted quickly once they realize how much it simplifies their life.

Regrets

I remember reading that Anders Hejlsberg, the creator of C#, regretted the irregularity of void in C# due to the difficulties this introduced in generic code.

I cannot find a citation, but I can point at TypeScript -- his next brainchild -- where he kept void, but made it more regular instead. You can have variables of type void in TypeScript, though they can only be of value undefined.

Still named void, but not quite like C's void.

$\endgroup$
11
  • 1
    $\begingroup$ "A function returning a Void type... actually will never return." Why couldn't it just be unusable in expressions, the way C's actual void actually works? Why does the lack of a return value prevent returning, if the code is not permitted to use the missing return value? $\endgroup$ Commented Jul 11, 2023 at 2:11
  • 1
    $\begingroup$ @KarlKnechtel: To return a value, you must create it. A value of a Void type cannot be created, hence you cannot call return with it. $\endgroup$ Commented Jul 11, 2023 at 6:46
  • 1
    $\begingroup$ @Seggan-OnStrike: Your question asks about unit vs c-void. This answer explains what unit types and empty types are, which should make it evident that c-void is a weird mix of both, sometimes behaving like a unit type (you can return it) sometimes like an empty type (you can't create a value of it, nor a variable with its type). If it's not as clear as I hoped, let me know what you're missing and I'll try to clarify it. $\endgroup$ Commented Jul 11, 2023 at 6:52
  • 1
    $\begingroup$ @MatthieuM. That isn't the question. Why does returning from a function entail returning a value? $\endgroup$ Commented Jul 11, 2023 at 14:07
  • 2
    $\begingroup$ @KarlKnechtel: It doesn't entail it, but it creates a corner case -- a difference between functions that return values vs functions that don't -- which makes the life of users more complicated in "generic" code, be it code generators, macros, templates, generics, ... $\endgroup$ Commented Jul 11, 2023 at 14:11
8
$\begingroup$

Generic callback arguments

Consider a function with a generic type argument, such as the resolve callback to a TS Promise:

    let x = new Promise<T>((resolve, reject) => {
        // resolve has type (T) => void
    });

If void is not usable in generic types, as in Java and C#, these have to be special-cased. And indeed they are: C# has different types Task and Task<TResult>. Having a first-class unit type makes library code like this simpler and less redundant.

On the other hand, if (as in C) an argument list of (void) means "no arguments", this becomes easier on the caller. To call a function of type (Void) -> Void in Swift requires an extra pair of parens, foo(()), which just serves as visual noise.

$\endgroup$
6
$\begingroup$

I'd say there is no con to having a unit type. Even C++, while inheriting void from C, has a standardized unit type in the form of std::monostate. Moreover, even with void, C and C++ allow constructs like:

    void f() {}

    void g() { return f(); }

So really, I'd say void type is merely a slightly restricted version of unit type. If your language has some custom weird behaviours related to void (like forbidding its use in structures, perhaps), you may, in the worst case, extend them to the unit type too. Regardless, the users will probably appreciate the increased generality.

$\endgroup$
1
  • 7
    $\begingroup$ Though C++'s void gets hairy when you try to assign the result of a void function to a local variable temporarily. It's really easy to make a template function accidentally stop working when T = void by changing something trivial like that; trust me, I've been there. $\endgroup$ Commented Jul 9, 2023 at 23:14
5
$\begingroup$

It could make sets a special case of maps. That is, if void were a unit type, std::map<T, void> would work very much like std::set<T>. In a theoretical new language we could make the way we use both of them identical.
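A rough sketch of that idea in Rust, which does have a unit type. UnitSet is a hypothetical wrapper written for this answer, not a standard type:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical set type built on a map whose values are all `()`.
struct UnitSet<T> {
    inner: HashMap<T, ()>,
}

impl<T: Eq + Hash> UnitSet<T> {
    fn new() -> Self {
        UnitSet { inner: HashMap::new() }
    }

    // Returns true if the value was not already present.
    fn insert(&mut self, value: T) -> bool {
        self.inner.insert(value, ()).is_none()
    }

    fn contains(&self, value: &T) -> bool {
        self.inner.contains_key(value)
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}
```

All the set operations fall out of the map's, and the () values add no information of their own.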

I don't know a case that makes it obviously useful except for writing less code implementing both of them, though.

$\endgroup$
3
  • 1
    $\begingroup$ That is an example of a unit type though. That it works in C++ is because C++ special-cases this particular use. $\endgroup$
    – Longinus
    Commented Jul 10, 2023 at 16:09
  • 3
    $\begingroup$ @Longinus I don't think it works in C++. At least it doesn't compile in g++. The point is if void was a unit type, this could easily be supported, as something in the pros list. (Also, the downvoter has reversed their vote after my edit.) $\endgroup$
    – user23013
    Commented Jul 10, 2023 at 16:22
  • 1
    $\begingroup$ Ah I see, I misunderstood what you wrote. (Also why did I believe C++ allowed that? I don't know) $\endgroup$
    – Longinus
    Commented Jul 10, 2023 at 16:51
3
$\begingroup$

In some contexts, void means that the function might have a return value, but the caller isn't supposed to use it. Consider this in TypeScript:

    function doSomethingAndThen(callback: () => void) {
        doSomething();
        callback();
    }

In TypeScript, it's legal to provide a callback which does return something, e.g. () => someArray.pop(), because () => T is a subtype of () => void for all T. This is convenient for the programmer, who doesn't have to write the more verbose () => { someArray.pop(); }. It also allows things like doSomethingAndThen(somethingElse), where somethingElse is a named function which already exists but has a return type.

This would be unsound if void were supposed to be a unit type like undefined (that is, the unit type whose only member is the value undefined), because the expression callback() would have type undefined despite not necessarily having the value undefined.

So, when void appears in return position like () => void, it is more analogous to a top type like unknown than a unit type. On the other hand, it's more useful to annotate functions as returning void rather than unknown, since the former indicates that even if some value is returned, it is meaningless and shouldn't be used.

$\endgroup$
-2
$\begingroup$

Disadvantage: the language lets you do more silly things. Everywhere there isn't a value, you can treat it as a value and you get some nonsense like unit or (). For example, Haskell is happy to let you store () in a variable and print it.

All else being equal, a language that prevents more mistakes is a better language.

$\endgroup$
8
  • 4
    $\begingroup$ I don't think this is a disadvantage. Being able to generalise functions that expect to store something, so that they don't actually store anything useful, is a good feature $\endgroup$
    – pxeger
    Commented Jul 11, 2023 at 14:08
  • 2
    $\begingroup$ Why is it "silly" or "a mistake" to do so? $\endgroup$ Commented Jul 11, 2023 at 14:08
  • 2
    $\begingroup$ @KarlKnechtel What do you expect print(print(42)) to do? I think if you interview 100 programmers who don't already know your language or similar ones, they probably didn't say 42unit or 42() $\endgroup$ Commented Jul 11, 2023 at 19:56
  • 1
    $\begingroup$ That still doesn't explain why it's "silly" or a "mistake" $\endgroup$
    – Seggan
    Commented Jul 24, 2023 at 14:27
  • 1
    $\begingroup$ @Seggan-OnStrike I shouldn't have to explain why print(print(42)) is "silly" or a "mistake" - it should be readily apparent! $\endgroup$ Commented Jul 24, 2023 at 19:10
