19
$\begingroup$

There seem to be two conflicting views regarding the status of "type systems" used in dynamically typed languages:

  1. That dynamically typed languages are actually just unityped static languages. From what I can tell, the majority of type theory literature for programming languages (e.g. Types and Programming Languages) focuses only on statically typed languages, where by definition type checking is performed at compile-time. From this viewpoint, it seems reasonable to treat dynamically typed languages as being unityped.

  2. On the other hand, the runtime engines of popular scripting languages like Python and JavaScript check the types of expressions during the program's execution, and raise an exception if there are any type errors. This feature seems to be common in dynamically typed languages, and is colloquially referred to as "strong typing" (although this term is discouraged). In the literature, it seems standard to refer to these runtime types as "type tags".

Even though dynamically typed languages don't have genuine type systems according to the definition in point #1, it seems like many ideas introduced in type theory (e.g. polymorphism, subtyping) are directly applicable to understanding how these dynamic languages behave at runtime. Conversely, it seems reasonable that we could run a statically typed language line-by-line (e.g. in an interpreter) and handle type checking in an identical way to the method described in point #2.
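As a minimal Python illustration of the runtime checking described in point #2, the ill-typed call below is only rejected when it is actually evaluated:

```python
# Runtime type checking in Python: the ill-typed call is only rejected
# when it is actually evaluated, at which point a TypeError is raised.
def add(a, b):
    return a + b

print(add(1, 2))    # fine: 3
try:
    add(1, "two")   # the type error surfaces only at runtime
except TypeError:
    print("TypeError raised at runtime")
```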

So, my specific questions are:

  1. How useful is type theory for understanding dynamically typed programming languages?

  2. Are there any formal theorems showing a direct correspondence between "static types" and "runtime type tags"? Can we just replace the word "type" with "type tag", and perform the type checking at runtime instead of ahead-of-time, and then think about the programming language in terms of static type theory? If so, are there any limitations to this approach, e.g. situations where the analogy between "static types" and "runtime type tags" is no longer valid?

$\endgroup$
3
  • 2
    $\begingroup$ Welcome! This is an interesting question, but to keep it focused enough for our format I suggest narrowing it down to just (1.) and (2.) ─ your (3.) is quite open-ended by itself. $\endgroup$
    – kaya3
    Commented Aug 14, 2023 at 13:08
  • 1
    $\begingroup$ Unlike in Python, it's very rare to get a type error in JavaScript because of implicit type conversion. "3" * 4 is 12 (instead of "3333"), "9"**"2" is 81, and "string" / [1, 2, 3] is NaN. The only ways I can think of to get a type error in JS are trying to access a property of something that isn't an object, and trying to call something that isn't a function :D $\endgroup$ Commented Aug 15, 2023 at 1:16
  • 1
    $\begingroup$ Thank you for the thoughtful answers everyone! I've accepted @Alexis King's answer because disentangling the terminology (as well as introducing the linked paper) was the most helpful for me. But I really appreciate everyone's contributions :) $\endgroup$ Commented Aug 17, 2023 at 14:12

4 Answers

16
$\begingroup$

You are talking about type systems, not type theory

I think it is worth drawing a distinction between type theory and the theory of type systems. Yes, this nomenclature is confusing, and yes, I realize this may seem like a pedantic quibble to many people. But they really are quite different:

  • Type theory is a branch of formal mathematical logic. It is a direct descendant of set theory, and although many working type theorists are computer scientists, many of them are also pure mathematicians and logicians.

  • The theory of type systems, usually shortened to just type systems, is a branch of computer science, and more specifically a branch of the theory of programming languages.

Of course, there are many close relationships between these two disciplines. They use similar notation, and they sometimes study similar things. Metatheoretical tools developed in type theory have been applied in the study of type systems and vice versa. However, type systems did not originate in type theory; to quote Hillel Wayne:

I regularly see the claim that "type theory invented type systems": that we have static typing because people tried to apply type theory to programming. This is untrue. The two concepts developed independently and were unified much later.

For more background, see In Search of Types (pdf).

The formal theory of type systems is a set of models

The formal study of type systems rarely involves studying “real” programming languages. Instead, it studies simplified “model languages” and simpler languages used as compiler intermediate representations. The aphorism “all models are wrong, some are useful” largely applies here: most of these models are tools of thought that we value because they provide some insight or predictive power.

Since type systems, formally, are models, it should come as no surprise that there are often several different legitimate models for the same phenomenon, and they offer different perspectives that can be useful for different things. Bob Harper’s somewhat tongue-in-cheek “unityping” perspective is correct in that it is a legitimate model of dynamically typed programming languages, and sometimes it is even a helpful one! But it is also quite far removed from the informal mental models used by most working programmers writing dynamically typed programming languages.
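The "unityping" model can be sketched very roughly in Python, with tuples standing in for injections into one big sum type (the function names here are made up for illustration):

```python
# The "unityped" view: every value inhabits a single sum type, and each
# primitive operation case-analyses the injection before proceeding.
def inj_int(n):
    return ("Int", n)

def inj_str(s):
    return ("Str", s)

def dyn_add(a, b):
    # the one "type" is inspected by case analysis at every use
    if a[0] == "Int" and b[0] == "Int":
        return inj_int(a[1] + b[1])
    raise TypeError("add applied to a non-Int injection")

print(dyn_add(inj_int(1), inj_int(2)))  # ('Int', 3)
```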

Gradual typing provides another perspective

The theory of gradual type systems provides a very different model from the “unityping” perspective, and I would argue it is much closer to programmers’ mental models. Gradually typed languages often work very hard to support dynamic typing idioms, which requires developing typing rules that capture the rules those idioms abide by.

Since these type systems are not always enforced, they definitely do not provide a perfect model of dynamically typed languages, but they do provide a very useful one.
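Python's PEP 484 annotations are one concrete instance of this gradual discipline: annotated code can be checked statically by an external tool such as mypy, while unannotated code keeps its dynamic behaviour, and nothing is enforced at runtime. A small sketch:

```python
# Gradual typing in the PEP 484 style: annotations are checked by an
# external static checker (e.g. mypy) but are not enforced at runtime.
def double(x: int) -> int:   # a checker verifies calls to this
    return x * 2

def untyped(x):              # implicitly Any: anything flows through
    return double(x)         # a checker would not flag this call

print(double(3))             # 6
print(untyped("ab"))         # "abab": the annotation does not stop it
```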

The formal theory of type systems mostly does not study purely runtime typechecking

From the perspective of programming language theory, a “type system” is more or less fundamentally a static analysis. For this reason, the theoretical definition of what a “type system” is does not line up perfectly with the definition that would likely be used by the working programmer:

  • Many dynamically-typed languages have sophisticated runtime typechecking mechanisms, but a theoretical treatment would mostly consider these a part of the language’s operational semantics. Many of these systems would likely be more usefully modeled by the theory of contract systems.

  • Many programming language researchers would call many things “type systems” that programmers probably don’t think of as type systems. Many forms of static analysis can be usefully viewed in the framework of type systems, but they may not have all that much to do with programmers’ notion of types.
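The contract-system view mentioned in the first bullet can be sketched in Python, with the runtime typecheck written as a check attached to a function boundary (the `requires` decorator below is hypothetical, written only for illustration):

```python
# A sketch of the contract-system view: the runtime typecheck is a
# contract enforced at a function boundary rather than a static rule.
def requires(*types):
    def wrap(f):
        def checked(*args):
            for arg, t in zip(args, types):
                if not isinstance(arg, t):
                    raise TypeError(
                        f"expected {t.__name__}, got {type(arg).__name__}")
            return f(*args)
        return checked
    return wrap

@requires(int, int)
def add(a, b):
    return a + b

print(add(1, 2))  # 3
```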

It is important to underscore that this is not a failure of programming language theory to care about “real” systems, it’s just that researchers use a wide set of tools to model and analyze programming languages, and sometimes substantially different tools are needed to most usefully analyze things that programmers would consider very similar. For this reason, researchers often build models out of several pieces and prove that the different pieces are somehow consistent with each other.

For example, a language like Java would be understood by a programming language researcher as having both a static type system and a dynamic type system. The static type system could be modeled using formal typing rules, and the dynamic type system could be modeled using some form of operational semantics. The researcher could then prove statements about how these two systems cooperate in a consistent way.

For a working programmer, this level of formality is likely not helpful. But understanding the structures in an informal way can still be useful for understanding how programming languages really work. Ultimately, there is no one answer here, since again, many different models can be applied to the same programming language, but that’s all the more reason to learn several different perspectives and adopt whichever one happens to be most useful.

$\endgroup$
0
9
$\begingroup$

How useful is type theory for understanding dynamically typed programming languages?

I would argue that type theory is important to the way that experienced programmers think about code, even when they are writing in a dynamically-typed language. For example, anyone who knows Python knows that this code is wrong:

xs = [1, 2, 3, 4, 5]

for x in xs:
    print(x[0])  # TypeError: 'int' object is not subscriptable

The reason we can see the code is wrong is that we know x comes from xs, which is a list of integers, so x must be an integer, and an integer cannot be subscripted with x[0]. But that is precisely the same reasoning that a type inference algorithm in a statically typed language would perform, and such algorithms are in the domain of type theory. So does it really matter, for our understanding of the language, whether that inference is done by the compiler or by our brains?
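That reasoning can even be mechanised in a toy way. The `infer_elem_type` helper below is hypothetical, and a real inference algorithm is far richer, but it performs the same two steps as the human reader: infer the element type of the list, then ask whether that type supports subscripting, all without running the loop:

```python
# A toy version of the inference described above.
def infer_elem_type(lst):
    kinds = {type(v) for v in lst}
    return kinds.pop() if len(kinds) == 1 else object

elem = infer_elem_type([1, 2, 3, 4, 5])
print(elem.__name__)                 # int
print(hasattr(elem, "__getitem__"))  # False: so x[0] must fail
```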

[...] are there any limitations to this approach, e.g. situations where the analogy between "static types" and "runtime type tags" is no longer valid?

Yes, there are many such situations. Consider the following code in Java:

if(false) {
    Dog dog = new Cat();
}

This is a static type error, since Cat is not assignable to Dog. But because it occurs in unreachable code, it will never cause an error at runtime.

A similar example:

Dog dog = true ? new Dog() : new Cat();

This is also a static type error, because the type of a ? b : c is whatever common type both b and c can be converted to (e.g. Object in this case), and Object is not assignable to Dog. But runtime type checking would not raise an error, since the runtime type tag will be Dog whenever the expression is evaluated.

A different example:

class Foo {
    static int foo(Object o) { return 1; }
    static int foo(Dog d) { return 2; }
}

int x = Foo.foo((Object) new Dog());

In Java the overloaded method Foo.foo is dispatched by the argument's static type, so this program will invoke int foo(Object o) and the result will be 1. But the cast (Object) new Dog() does not change the runtime type of the object, only the static type of the expression, so if the overload were dispatched by the argument's runtime type then int foo(Dog d) would be invoked and the result would be 2.

One more:

List dogs = List.of(new Dog(), new Dog()); // bare List type
List<Cat> cats = (List<Cat>) dogs; // only a warning

Object dog = cats.get(0); // succeeds, returning a Dog
var cat = cats.get(0); // throws ClassCastException

This is an example of heap pollution, which can occur in Java due to bare generic types and type erasure. Here, the same expression cats.get(0) has a different meaning depending on the static type information provided by the surrounding context:

  • When assigned to a variable of type Object, no runtime check is performed, because Object is the erased return type of List.get.
  • But when using a var declaration, the variable's type is inferred as the unerased return type of List<Cat>.get, which is Cat, and therefore a runtime check must be performed because the erased type is not assignable to the expected type.

But the fact that var cat = ...; expects a Cat, and therefore that a check must be performed, can only be determined with respect to the static type of the expression cats.get(0).

Now, none of this shows that type theory can't be applied to languages which only use runtime type tags. Rather, what it shows is that languages with static types and static checking may have semantics which can't be emulated using only runtime type tags and runtime checking.

$\endgroup$
10
  • $\begingroup$ What do you mean by static types can't be emulated by runtime tags? What stops us from interpreting every checking rule as an explicit assertion? $\endgroup$
    – 林子篆
    Commented Aug 14, 2023 at 14:15
  • 1
    $\begingroup$ @林子篆 I gave three examples where that doesn't work. In the first example, the assertion is never checked so it has no opportunity to fail. In the second, the assertion passes despite the static type error. In the third, there is no type error, just different behaviour if static vs. runtime types are used. $\endgroup$
    – kaya3
    Commented Aug 14, 2023 at 14:21
  • $\begingroup$ In your first and second examples, we can still lift the expressions to the proper place with a temporary variable and make the assertion work. It's non-trivial, but a path exists. Beyond that, an error that will never happen is not important; the static system does too much in this sense. For the third, you only checked Java, yet conclude that no such system can do it, which is a problematic conclusion. $\endgroup$
    – 林子篆
    Commented Aug 14, 2023 at 14:28
  • 2
    $\begingroup$ @林子篆 You cannot soundly move the unreachable expression new Cat() out of the block and evaluate it, because it can have side-effects which the program is not supposed to perform. But if you don't evaluate it then you won't see that its runtime type is Cat and not Dog. To determine that this is an error ─ whenever you determine that ─ you must find its type without evaluating it. $\endgroup$
    – kaya3
    Commented Aug 14, 2023 at 14:34
  • $\begingroup$ For the third example, you are misunderstanding something but I don't know what. I have not "only checked Java", I have presented an example from Java to show that a language can have semantics which require determining the static types of its expressions in order to determine their runtime behaviour. $\endgroup$
    – kaya3
    Commented Aug 14, 2023 at 14:35
4
$\begingroup$

From my understanding, a dynamic language's syntax might admit a rule like $x : \text{Expr}, y : \text{Expr} \vdash x + y : \text{Expr}$. One can therefore write 1 + (lambda (x) x), which has no reasonable result for most of us.

A static type system can instead use the rule $x : \text{Int}, y : \text{Int} \vdash x + y : \text{Int}$ as a check, derive the type of the resulting syntax, and prevent the error from the beginning.

A dynamic setup has no information with which to check anything ahead of time, so the idea is to check just before each operation runs. Since the primitive constants are at most countable and the operations are finite, we can rely on each value's type tag.

In this perspective, dynamic checking is static checking delayed until the value is used. However, as kaya3's answer shows, this only works for expressions that have no side effects.

Because side effects include irreducible terms like infinite loops and cannot be reordered, rearranging the code is not acceptable. For such code, obtaining the type requires partial evaluation, which needs state (at least a pushdown automaton) and access to the original form of the syntax, and is hence static. In this sense, there are semantics we cannot recover from runtime tags.
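The delaying just described can be seen concretely in Python (to match the thread's examples): the ill-typed expression raises only when it is evaluated, so code that never runs never fails.

```python
# Dynamic checking is delayed until the value is used: the ill-typed
# body below only raises once the function is actually called.
def ill_typed():
    return 1 + (lambda x: x)  # TypeError, but only when evaluated

f = ill_typed                 # merely referencing it raises nothing
try:
    ill_typed()
except TypeError:
    print("error surfaced only at the use site")
```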


Besides the answer above, a type system might restrict too much, which is inconvenient. Hence, many different systems have been created to allow more reasonable use cases. One type-system way to capture dynamic-typing get/set idioms is called strong update; the naive case is:

let x = ref 1 (* x : ref int *)
x := true     (* x : ref bool *)
!x            (* !x : bool *)

This is interesting: the setter does not care about the previous type, it simply assigns a new one, and the getter accepts whatever type is currently in context. But conditionals involve a more complicated problem:

if ... then
  x := false
else
  x := "hello"
(* what's x type here? *)

Introducing a union type, (Bool | String) ref, might help, and maybe other setups are useful too, but this part should stop here.

$\endgroup$
1
$\begingroup$
  1. How useful is type theory for understanding dynamically typed programming languages?

First of all, a type system is mostly about defining rules. Let's take for example the definition of Type system as given by Wikipedia:

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type (for example, integer, floating point, string) to every "term" (a word, phrase, or other set of symbols). [...] This checking can happen statically (at compile time), dynamically (at run time), or as a combination of both. Type systems have other purposes as well, such as expressing business rules, enabling certain compiler optimizations, allowing for multiple dispatch, and providing a form of documentation.

Note that the correctness of your program depends on whether or not it respects the language rules; you can check them at runtime or statically or not at all. Of course, practically, it is important that properties about a program are checked or enforced. But you tend to find a mix of static and dynamic checking/dispatch in most languages when it comes to types.

That being said, type systems are designed to be checkable, at runtime or ahead of time: features are added or removed to make sure that a static type-checking decision procedure is sound, terminating, etc., which is where type theory comes in.

If you want the program to be dynamically typed then a lot of type theory that is related to static typing is not going to be useful for you (higher-kinded type systems etc.). If later you want to guarantee some properties of your program statically, you'll need data-flow analyses or something else entirely.

If you take Python, for example, it defines types, like integers and strings, and the rules under which they operate. So to me, the language defines a type system: you have to understand classes and subtyping relationships, you have to understand if and how a value of one type will be interpreted as another type, etc. And these are the rules of your language whether you decide to interpret or compile it.

But you cannot easily describe such a system with a formal type theory designed for statically typed languages.
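Those language-defined rules can nonetheless be observed directly at runtime; a small illustration:

```python
# Some of Python's own typing rules, observable at runtime:
print(isinstance(True, int))   # True: bool is a subtype of int
print(True + 1)                # 2: bools behave as ints in arithmetic
print(type(1 + 0.5).__name__)  # float: ints are coerced in mixed arithmetic
```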

  2. Are there any formal theorems showing a direct correspondence between "static types" and "runtime type tags"?

I don't think so but also I think both terms are not comparable. To me "type tags" are just a particular approach used to perform type checking at runtime: for example, you can also allocate one memory region for integers and another arena for floats: type checking at runtime is then a matter of checking in which memory region the value belongs (might be silly, this is not the point).
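The memory-region idea can be sketched as follows (all names are hypothetical, with Python lists standing in for memory arenas):

```python
# "Type tags" are only one implementation strategy; here is the
# (deliberately silly) alternative above: the region a value lives in,
# not a tag on the value, identifies its type.
int_arena, str_arena = [], []

def alloc(arena, v):
    arena.append(v)
    return (arena, len(arena) - 1)  # a "pointer": (region, offset)

def typecheck(ptr):
    region, _ = ptr
    return "int" if region is int_arena else "str"

p = alloc(int_arena, 42)
print(typecheck(p))  # "int"
```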

If the question is: can we always check types at runtime instead of statically, then I guess the answer is yes (embed the compiler in the runtime, and then it's done dynamically?).

Can we just replace the word "type" with "type tag", and perform the type checking at runtime instead of ahead-of-time and then think about the programming language in terms of static type theory?

This question is not very clear to me.

If so, are there any limitations to this approach, e.g. situations where the analogy between "static types" and "runtime type tags" is no longer valid?

As said above, "static types" does not make much sense to me; there are types that are checked statically or dynamically, and "runtime type tags" are one way of doing the check at runtime.

$\endgroup$
1
  • 1
    $\begingroup$ I’m not entirely convinced this answers the question. Sure, formal type systems are about defining rules, but that is very vague, and they are really about defining very specific sorts of rules. I don’t think this answer really makes the case that the rules used to formally specify type systems are adequate for capturing Python’s notion of types. Additionally, you say that you think it’s always possible to check types at runtime by embedding the compiler, but I don’t think that answer is really in the spirit of the question. $\endgroup$
    – Alexis King
    Commented Aug 14, 2023 at 22:44
