11
$\begingroup$

Suppose a language has a conventional list data structure that contains some dynamic number of values indexed sequentially. Some operations to access a value from the list require that a certain index exists, or that at least a certain number of values are included, and there will be an error at run time otherwise; a special case is needing just that the list be non-empty.

How can the language guarantee in advance that these operations will be safe, and reject any programs that would crash? What are the costs of adopting that solution?


To illustrate, a couple of examples of statically unsafe lists in different paradigms, firstly in Java:

List<String> l = new ArrayList<String>();
int i = 5;
l.get(i); // Throws IndexOutOfBoundsException at run time

then Haskell:

head [] -- *** Exception: Prelude.head: empty list

Both of the above are typechecked and compiled without issue, but will crash at run time; in real programs, these lists would be being provided as parameters, fields, etc, and the index could be a non-literal dynamic value. Ideally, these crashing programs would be rejected statically, while programs that only retrieve items that do exist would be allowed.

$\endgroup$
4
  • 1
    $\begingroup$ Is this possible in the general case? What if it depends on an external value, such as user input or network data? $\endgroup$
    – Bbrk24
    Commented Jun 14, 2023 at 3:41
  • $\begingroup$ The sensible options include dependent types & returning an optional. Here is a silly idea to add to the list. Index using index mod size. It meets the criteria of preventing out of bounds access and retrieving only items that exist but there are some obvious correctness issues. $\endgroup$ Commented Jun 15, 2023 at 0:30
  • $\begingroup$ @Bbrk24 of course not, you need to somehow check that the external value is in range, which must be done at runtime. $\endgroup$ Commented Jun 27, 2023 at 15:33
  • $\begingroup$ Actually it is possible to define such a language. Dependent types do it. It just has the consequence that code where the compiler cannot prove it is rejected. Which is actually why such behavior isn't very popular. $\endgroup$
    – feldentm
    Commented Aug 18, 2023 at 16:07

3 Answers 3

9
$\begingroup$

Dependent Types

A Dependent Type is one whose members can be described by a predicate depending on other values.
In your case, it is Index (xs: List a) = { i ∈ ℕ | i < |xs| }.
As you can deduce from the i ∈ ℕ clause, this is a strict subtype of .

Here's an example programme description using it:

List a = enum
    Cons: a -> List a -> List a
    Empty: List a

length: List a -> Natural
length l = f l 0 where
    f (Cons _ xs) n = f xs (n + 1)
    f Empty n = n

Index (xs: List a) = (i: Natural) when i < length xs

get: (xs: List a) -> Index xs -> a
get (Cons x _) 0 = x
get (Cons _ xs) n = get xs (n - 1)

main = do
    let xs = Cons "h" (Cons "42" Empty)
    putStrLn (get xs 0) //ok
    putStrLn (get xs 5) //Compiler Error

A question that arises then is that of get "knowing" it will never get Empty as input, at which point one will notice that length: List a -> ℕ has not a descriptive enough type.
Indeed: length's correct type is (Empty -> 0) & (Cons -> ℕ⁺), which explicitates the fact that Index Empty is not a type existing in the Universe.
(At which point one may notice how Cons/:: relates to NonEmpty a).
How smart a language and implementation has to be to automatically understand and report on those problems then is the realm of Automated Theorem Proving.

Naturally, when a value is not literal, one has to prove it indeed inhabits the type at hand.

let n: ℕ = userInput in
get xs n

is not a valid programme, for is a supertype of { i ∈ ℕ | i < |xs| }, meaning explicit type narrowing has to be involved:

let n: ℕ = userInput in
if n < |xs| then
    get xs n
else
    //deal with error

which should be reminiscent of C++'s dynamic_cast, and is already existing practice (you don't just use unsanitised user input as-is, do you?).
This also relates to the realm of Automatic Type Narrowing as featured in Kotlin, TypeScript, and, less automatic, Ceylon.

An other example is that of Min and Max: ∀(a: Comparable). MinMax (x: a, y: a) = { x, y }, which may be used to describe functions such as

MinMax = ∀a. Comparable a => (x: a) -> (y: a) -> (x | y)

max: MinMax
max x y = if x > y then x else y

min: MinMax
min x y = if x > y then y else x

where x and y are runtime values used as singleton types in a union type, where it is guaranteed that for each pair of x and y, min and max both will return precisely either that x or that y, and nothing else.
Naturally the result of min and max is a strict subset of a by virtue of { x ∈ a | f(x) } <: a.

Languages implementing those include, among others, Idris, Agda, Coq, F*, and ATS.

$\endgroup$
3
  • $\begingroup$ In what language are the code examples written? I see that they are marked as Haskell, but they look similar to Haskell but not the same. $\endgroup$
    – user570286
    Commented Jul 11, 2023 at 19:49
  • $\begingroup$ @user570286 Pseudo-code I made up. when in types should be reminiscent of LiquidHaskell, and the enum types are GADTs $\endgroup$
    – Longinus
    Commented Jul 11, 2023 at 23:06
  • $\begingroup$ @user570286 Also Raku because of course Raku supports this. $\endgroup$
    – Longinus
    Commented Jul 11, 2023 at 23:11
4
$\begingroup$

Let's get the easy solution out of the way first: you can just have these operations like get(int) and head() return an optional value, and then it's statically checked that you handle both cases where the element does or doesn't exist, the same way it's checked for an option type. Some notable languages which do this are Rust (get), OCaml (nth_opt) and F# (tryItem).

It's perhaps a bit cumbersome to force the user to handle an option every time they use a fundamental list operation; but in practice it's often not that big a deal, because most of the time you want to access the elements in a list, you do so through an iterator rather than the get method directly.


Other solutions could be based on theorem-proving; the goal is to statically prove that 0 ≤ i < n as a precondition for calling get(i), and reject the method call if this precondition can't be proved. There are different ways of going about this:

  • You can include a sub-language in which the programmer can state and prove such propositions; normally, this is based on Hoare logic or a similar logic. This approach gives languages like F* or tools like KeY.
  • You can have the compiler try to automatically prove them based on the conditions in branching statements; this would be something like control-flow type narrowing with dependent types. I don't know if any existing languages do this.

See also: What options are there for adding formal verification to a general purpose programming language?

$\endgroup$
1
  • $\begingroup$ I think I've heard of a language that uses integer range types in one of Discord PL communities. Ranges can be inferred on arithmetic operations as well as branches, so it could fill in the "automatic proof" category. $\endgroup$
    – Bubbler
    Commented Jun 14, 2023 at 4:27
3
$\begingroup$

A MimimumLengthList<int>/MaximumValueInteger<int> trait

I think this is basically the same as dependent types but maybe easier to understand

The operation a[5] requires a: MimimumLengthList<6>. There are a few ways to get a MimimumLengthList:

  1. The list has a known size
list: a[6] = [1,2,3,4,5,6];
a[5]; // Ok
  1. You assert the list is a mimimum length list:
fn f(a: MimimumLengthList<6>) { return a[5]; //ok }
  1. An explicit check
let a = json_parse(input());
if (a.len() > 5)  {// this is a type assertion, special cased by the type checker
    return a[5];
}

Last case is similar to what languages that check bounds at runtime would do, but now you need to explicitly check and can choose other actions to perform in the false case besides just throwing an error.

Maximum Value Integer

You may sometimes want to do:

x: int = input();
if (x<5) {
    print([1,2,3,4,5][x]);
}

For this purpose, you can make a trait MaximumValueInteger which just represents an integer asserted to be less than a given value.

Combining the two

You could safely define a function like this:

fn get(a: MinimumLengthList<5>, b:MaximumValueInteger<4>){a[b]}
a([1,2,3,4,5,6,7,8], 2);

However, conditionals get a bit more complex:"

if (a < len(b)) {
    b[a]
}

In this case, it's hard for the type checker to actually assert what type a and b is since neither of them is known at compile time. Yet both are safe. You could implement this as a kind of generic only during the type checker phase, something like:

if (a < len(b))<T>: a:MaximumSizeInteger<T>, b:MimimumLengthList<T> {
     
}

I made up syntax for conditional type assertion here since I'm not aware of any languages that have it.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .