
Type inference in a Damas-Hindley-Milner type system with subtyping is known to be undecidable in general.

What makes it so difficult? What options are there to get around undecidability which could make it work? For example, what are some "minimal" type annotations that would be necessary?


1 Answer


Type inference in a Damas-Hindley-Milner type system with subtyping is known to be undecidable in general.

What makes it so difficult?

You can have type inference for subtypes, but it’s undecidable if you want to preserve all of the nice properties that make Damas & Milner’s Algorithm W so noteworthy:

  1. (Nice) Principal Types
  2. No (Artificial) Annotations
  3. Only Unification

Subtyping interacts with all three. I can’t give a full exposition of all the issues, but I will give an overview of the specific challenges involved in this design space.

1. (Nice) Principal Types

The most important feature of the Damas–Hindley–Milner system, and the feature that I’ll spend the most time on here, is that it has principal type inference. In addition, the inferred types are very compact and easy to read.

1.1. Principal Type Inference

Principal type inference means:

  1. The system has principal types.

    If a term is typeable in a certain environment, then there is a generic type scheme that can be instantiated to give any valid type for that term in that environment.

  2. There is a procedure (W) for finding the principal type.
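
To make the first point concrete, in OCaml (as in any ML):

    (* The identity’s principal type scheme is 'a -> 'a; every valid
       type for it is an instance of that scheme. *)
    let id = fun x -> x           (* inferred: 'a -> 'a *)
    let _ = (id : int -> int)     (* one instance *)
    let _ = (id : bool -> bool)   (* another instance *)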

Systems with subtyping may not have principal types: a term can have multiple valid types, none of which is more general than the others, and all of which are maximal; that is, none is an instance of a more general type.

1.2. Decidable Subtyping

The way Damas & Milner achieve principality is essentially by restricting the subtyping relation to a decidable one. There is a sort of subtype lattice, but it has only two levels:

  1. A universally quantified type variable is a least specific element.

  2. Any concrete type is a most specific one.

(You can also stand on your head and call these “most general” and “least general” if you like—as long as you always remember which way is up!)

Because of these restricted levels, any bound on a type tells us a lot! Either we know exactly what it is, or we know that it must be polymorphic.

In the “less specific than” view of the lattice, any two concrete types have a meet/min/lower-bound/intersection, namely a type variable; and no two concrete types have a join/max/upper-bound/union. So, if you use a term at multiple types in the same context, then it must be fully polymorphic, or else the program is invalid; and you have to use a term consistently across different branches.
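
For example, in OCaml, the following is rejected with a unification error, since there is no join like int ∨ string for the two branches to meet at:

    (* Both branches must unify at one type; in the two-level lattice
       there is no union type to infer for them. *)
    let bad b = if b then 1 else "one"
    (* Error: This expression has type string but an expression was
       expected of type int *)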

This rests on the fact that a universal “forall” quantifier is the same as an infinite intersection over all possible choices of type, that is:

 identity : ∀a. a → a

 =

 identity :
   (unit → unit)
  ∧ (bool → bool)
  ∧ (int → int)
  ∧ (string → string)
  ∧ …

This is really what lets them compress a big gnarly type into a small one: any valid set of constraints can be simplified.

This kinship between universals and intersections means that a lot of results are related between the two. For example, intersection and union subtyping need restrictions in the presence of effects, just like ML’s “value restriction” for universals. And rank-2 universal quantification—where (∀x. …) appears nested only on the left of at most one function arrow—has decidable type inference, as do rank-2 intersection types (see Jim 1996 for more). And while we’re here…

★ Tangent! Types vs. Typings

A principal type is not the same as a principal typing, although the terms are sometimes mixed up.

A principal type is the most general type you can give to a term in a given environment. A principal typing is a pair of the least constrained environment where that term is typeable, and the principal type in that environment.

Simply typed lambda calculus, without named recursive definitions, has principal typings, but Damas & Milner’s system doesn’t. For an example, consider:

let rec loop () = loop () in
  (loop ()) (loop ())

Since this is a nonterminating computation, it can be given any result type, as in:

  • S = ∀a. unit → (a → a)
  • T = ∀b. unit → b

And notice that S ≤ T. So these typings are both valid:

  • loop : S ⊢ (loop ()) (loop ()) : S

  • loop : T ⊢ (loop ()) (loop ()) : T

However, neither is more general than the other: (A ⊢ B) ≤ (C ⊢ D) if C ≤ A (contravariant) and B ≤ D (covariant), but (S ⊢ S) ≰ (T ⊢ T) and (T ⊢ T) ≰ (S ⊢ S) because we run into T ≰ S on each side.

1.3. Nominal Subtyping

In a general subtyping lattice, we may have upper & lower bounds that don’t give us equalities straight away. In a nominal type system, or when dealing with generic type variables, there isn’t necessarily any named type corresponding to the meet or join of any two types. As a result, we may need to keep intersections and unions around to solve later, if and when we get enough information to simplify them.

In the end, when we’ve accumulated all of the constraints we can deduce from the program, the solution isn’t necessarily unique.

We have a set of bounds, which we can check for consistency, at least. For example, if we only allow intersections and not unions, then we can report a type mismatch if there are multiple disjoint bounds. But with union types, even disjoint sets of constraints can be satisfied.
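
As a minimal sketch of that difference, assuming drastically simplified, hypothetical types: an intersection-only checker can reject a variable as soon as it collects two distinct concrete upper bounds, whereas a checker with union types could still accept the same set of bounds.

    type ty = TInt | TString | TVar of string

    let is_concrete = function TVar _ -> false | _ -> true

    (* Intersection-only: a variable bounded above by two different
       concrete types (say int and string) is a type mismatch. With
       union types, the same bounds are satisfiable by int ∨ string,
       so we couldn't report an error this eagerly. *)
    let consistent (upper_bounds : ty list) : bool =
      match List.filter is_concrete upper_bounds with
      | [] -> true
      | t :: rest -> List.for_all (( = ) t) rest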

All of this comes down to a design question: you want to extend the set of valid programs that are typeable, while still rejecting invalid programs, but how to do so is an art, not a single principle.

1.4. Structural Subtyping

You can do a little better here with structural/record types, because you may be able to simplify constraints earlier. You can construct an intersection A ∧ B directly, as the union of the properties of A and B.
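
As a small sketch of that construction, assuming a drastically simplified, hypothetical representation where record types are maps from field names to types: the meet of two record types keeps the union of their fields, and a field present in both must itself have a non-empty meet.

    module Fields = Map.Make (String)

    type ty = TInt | TString | TRecord of ty Fields.t

    (* meet a b = Some (a ∧ b), or None if the intersection is empty. *)
    let rec meet a b =
      match (a, b) with
      | TInt, TInt -> Some TInt
      | TString, TString -> Some TString
      | TRecord fa, TRecord fb ->
          (* Take the union of the fields; a shared field must intersect. *)
          let merged =
            Fields.merge
              (fun _ l r ->
                match (l, r) with
                | Some t, None | None, Some t -> Some (Some t)
                | Some l, Some r -> Some (meet l r)
                | None, None -> None)
              fa fb
          in
          if Fields.exists (fun _ t -> t = None) merged then None
          else Some (TRecord (Fields.map Option.get merged))
      | _ -> None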

This construction is part of how algebraic subtyping works. However, to achieve good inference, it gives up some opportunities for more expressiveness. For example, it relies on a polarity restriction: union types can’t appear in negative positions (inputs), and intersection types can’t appear in positive positions (outputs). So it can infer a function type such as A → (B ∨ C), but you can’t write a function of type (B ∨ C) → D and compose the two directly—instead, you would need to treat the cases B → D and C → D separately, as in the sketch below.
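
In ML terms, that workaround is familiar: since there is no union type (int ∨ string) to use as an input, you introduce an explicit sum type and handle the int → int and string → int cases separately. (The names here are just for illustration.)

    (* A stand-in for the inexpressible union input (int ∨ string): *)
    type int_or_string = I of int | S of string

    let size = function
      | I n -> n                 (* the int -> int case *)
      | S s -> String.length s   (* the string -> int case *)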

An understanding of polarisation in logic, and type inference as a kind of proof search, is helpful in coming up with ways to relax these restrictions. So, to your question:

For example, what are some "minimal" type annotations that would be necessary?

One way of allowing union types as inputs would be to allow “shift” annotations, whereby the programmer explicitly says where they want to switch polarity, to guide the inference process.

2. No (Artificial) Annotations

The second important property of Algorithm W is that it needs no artificial annotations. Every definition’s type is inferable without a type signature, and the only annotation that polymorphism needs is a very natural one: using a variable binding let x = e1 in e2 instead of a function application (λx. e2) e1.
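
For example, in OCaml:

    (* Accepted: the let binding generalises id to ∀a. a -> a,
       so it can be used at both int and string. *)
    let _ = let id = fun x -> x in (id 1, id "one")

    (* Rejected: the lambda binding gives id a single monotype. *)
    (* let _ = (fun id -> (id 1, id "one")) (fun x -> x) *)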

This again is a matter of design and praxis. The design of Haskell has been strongly informed by the goal of always being able to infer the types of programs in the basic language, when they’re unambiguous. But it’s also well established at this point that for “fancy” extensions, type inference is a best effort, and at least some top-level annotations may be required. This isn’t seen as a burden: the sense is that the compiler should be allowed to ask for an annotation if it has a good reason.

This also informs the design of Rust, which requires signatures on top-level definitions—indeed it doesn’t really have syntax for definitions without signatures.

3. Only Unification

Finally, W relies only on basic unification of equality constraints, which is decidable and efficient in practice. Upper and lower bounds are inequality constraints, which require us to use semi-unification.
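
Concretely, a semi-unification problem is a set of inequalities s₁ ≤ t₁, …, sₙ ≤ tₙ between terms; a solution is a substitution θ such that each θ(sᵢ) can be instantiated to θ(tᵢ) by a further substitution chosen separately for each inequality. For example, the bounds a ≤ int and a ≤ bool are solvable: keep a as a variable, and instantiate it to int in one inequality and to bool in the other. The corresponding equalities a = int and a = bool have no unifier.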

For arbitrary programs, this is undecidable (Kfoury, Tiuryn, and Urzyczyn 1993). Then again, humans don’t write arbitrary programs. So even an algorithm that isn’t guaranteed to terminate, or has poor asymptotic performance, can still sometimes be useful and efficient in practice.

Still, your final point:

What options are there to get around undecidability which could make it work?

There has been research into restrictions that make this problem decidable while remaining expressive (see Leiß and Henglein 2005), such as:

  • Limiting to one inequality at a time (uniformity)
  • Limiting bounds to simple pattern matches (left-linearity)
  • Placing various restrictions on recursion (acyclicity)
  • Bounding the number of variables to be solved, or where they may appear (quasi-monadicity)
  • Re your tangent on typings: can you give me a pointer to where the syntax you used is explained? I'm confused by the "valid typings" that start with bold, contain ⊢ and have two occurrences of S. I'm familiar only with "context ⊢ expression : type".
    – Bergi
    Commented Jul 5 at 21:40
  • @Bergi: That’s a bullet point where the stuff to the left of the dash is a label and the stuff to the right is the details. I’ll try to format it differently so it’s less weird-looking.
    – Jon Purdy
    Commented Jul 7 at 2:07
  • Ok, I considered that, but then what does ⊢ mean between two types? Also, now that I'm looking closer, loop loop seems to be missing some unit arguments?
    – Bergi
    Commented Jul 7 at 2:50
  • Good answer. Fun tangential fact: nominal subtyping in languages that have generic types and contravariant conversions (that is, where IComparable<Animal> is convertible to IComparable<Giraffe>) is also undecidable, unless you put certain restrictions on the kinds of types you can make bases of other types. Commented Jul 7 at 4:37
  • Fun fact about that tangential fact: it was this answer on Stack Exchange, about how nominal subtyping was conjectured but not proven to be undecidable, that inspired someone to come up with the proof. :) cstheory.stackexchange.com/questions/18846/… Commented Jul 7 at 4:48
