6
$\begingroup$

In a structurally-typed language, one type is a subtype of another iff it has all of the fields/methods of the other with compatible types, meaning that field or return types are subtypes, parameter types are supertypes, arities match, and so on recursively for all the types mentioned. Every object or record defines an exact type for itself, and type declarations set out the attributes that any member of that type must provide.

The formal description of the system is pretty straightforward, but how do you actually implement that check in practice when types can have deep recursive references to themselves and other types? There is no predeclared definitive answer like in a nominal system, so it needs to be figured out each time, but figuring out whether type A { id -> A } subtypes type B { id -> B } seems to require knowing whether A is a subtype of B already!

How does the compiler resolve these questions? That is, given an assignment of a value of a arbitrary structural type X to a variable of type Y, how does a practical typechecker decide whether it is allowed or not?

$\endgroup$
6
  • 1
    $\begingroup$ Sum types also can be subtypes of each other (interestingly, in an inverse to product types, one type is a subtype of the other iff the other type has all the variants of the first). This doesn't seem to occur much in practice. $\endgroup$
    – apropos
    Commented Aug 30, 2023 at 23:50
  • 2
    $\begingroup$ Yeah, I'm trying to confine this question to the smallest range I can for the moment so that answers can be well-scoped. $\endgroup$
    – Michael Homer
    Commented Aug 30, 2023 at 23:56
  • 1
    $\begingroup$ @apropos I don't know about you, but as a regular Typescript user, I find structural subtyping relations between sum types useful quite often. A typical example comes from compiler development itself; you might have a sum type representing various kinds of expressions, and a subtype for the kinds which are valid assignment targets (lvalues). But more generally, control-flow type narrowing of sum types (e.g. if you check for one variant in one branch, then you can eliminate that variant from the type in the other branch) depends on the narrowed version of the type being a subtype. $\endgroup$
    – kaya3
    Commented Aug 31, 2023 at 0:31
  • 1
    $\begingroup$ And it's not as straightforward as you might think for nominal subtyping either. Unrestricted nominal subtyping in a language with contravariant generic conversions (C#, Java...) is undecidable! Turns out you can encode a Turing machine into your interface definitions and make the compiler simulate the machine to compute whether one interface is convertible to another. $\endgroup$ Commented Sep 6, 2023 at 21:21
  • 1
    $\begingroup$ See cis.upenn.edu/~bcpierce/papers/variance.pdf which was written when it was still an open question as to whether it was undecidable. $\endgroup$ Commented Sep 6, 2023 at 21:22

1 Answer 1

5
$\begingroup$

There is a straightforward but suboptimal approach that is suitable when there are relatively few types, and is fairly easy to implement. It involves an $N \times N$ matrix of presumed subtyping relationships that are iteratively falsified. At the end, there's a lookup table of every possible pair of types and whether they're compatible or not.


First, collect all the types that appear in the program: this means type declarations, anonymous types, object or record construction sites, and anywhere else that creates a new type. Some of them may be duplicates, but that's ok. Give each of them a number $1..N$.

Create a two-dimensional array of booleans with $N$ rows and $N$ columns. Each cell will represent whether the type of that row is a subtype of the type of that column.

To start with, fill the entire array with trues. We're going to begin by presuming that every type is compatible with every other type, and then repeatedly check each pair of types to find reasons to falsify the assumption until we run out of mismatches.

For each cell that is currently true, call the column type X and the row type Y. Loop over every member of type X, and

  • Check whether a member of the same name is in Y. If not, we know that this is not a subtype, so put a false in this cell and move on to the next cell.
  • Check whether the number of parameters matches in both members (if applicable). If not, put a false in this cell, and move on to the next one.
  • Call the return type (or field type) of X's member P and the return type of Y's member C. Look up the cell in the row of C and the column of P: if it's false, we know that this can't be a subtype either, so put a false in this cell and move on. If it's true, we still think that might be a subtype, so continue.
  • Do the same thing for the parameters of the method, if applicable, swapping the row and column you look up.

If nothing has proved that Y is not a subtype of X, the cell will still be true. Do the same for every cell of the array, and then start again from the beginning: there will be more false cells this time and so fewer to look at, and some of the return/parameter types that were presumed compatible last time won't be any more, so there will be more subtype relationships falsified each time. As soon as you find out that a type can't be a subtype of the other, you stop looking at it further and never need to come back to it.

Repeat until you make it all the way through the entire grid without making any changes. At this point, every pair that has been proved not to be a subtype contains false, and the remaining true cells represent genuine subtyping relationships. This array can be used as a lookup table for subtyping wherever it's required in the compiler or interpreter, like in variable assignments.


This algorithm is guaranteed to complete and gives a sound subtyping, but it's doing a lot of work by examining every pair of types, and potentially every method of them, over and over again. For very large numbers of types and very large types this can be quite expensive, but it is cacheable and adding a new type requires only adding one row and column. There are some obvious optimisation steps (e.g. run only the cheap member/arity checks in the first pass), but beyond that more complex approaches are needed for meaningful savings. For a moderate number of types, this approach is fine for many practical use cases.

Any pair of types with no reason to believe that there isn't a subtype relationship will be left marked as true, correctly. In the A { id -> A } & B { id -> B } case from the question, nothing will have falsified this relationship so A and B will correctly be considered subtypes of each other (synonyms). If A had additional members, it would still be considered a subtype of B, but not the reverse because a member wouldn't have been found.

$\endgroup$
4
  • $\begingroup$ Curious, could you elaborate on the history (or lack thereof, if "folklore") of this algorithm? $\endgroup$
    – apropos
    Commented Aug 31, 2023 at 4:52
  • $\begingroup$ This is just the most naïve direct approach & it's somewhat worse than Cardelli's, let alone the more optimised approaches. I'm hoping someone with the experience will answer about the better ways. $\endgroup$
    – Michael Homer
    Commented Aug 31, 2023 at 6:43
  • $\begingroup$ I haven't implemented it yet, but I intend to look into exposing the structural plane explicitly in a future version of Tyr. As Tyr incorporates many paradigms, going with an O(n²) mem solution will not work. I guess the solution will be to use a canonical memory layouting algorithm and do a string prefix comparison. And recursive decent into types like functions. Ask again in ten years or so ;) $\endgroup$
    – feldentm
    Commented Sep 1, 2023 at 8:50
  • $\begingroup$ @feldentm Yes, type canonicalisation is a key part of the more-efficient algorithms, though they also take a pretty different approach to this overall. A good overview of the algorithmic situation a decade ago is here (I'm not sure whether/where it ended up in archival publication). $\endgroup$
    – Michael Homer
    Commented Sep 1, 2023 at 9:40

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .