Algorithm for the locally nameless representation used by Lean

Question

I have heard that Lean uses the locally nameless representation for handling binders, yet if I input an expression that has a bound variable in it, the bound variable is not displayed using its de Bruijn index (example [click at the end of the #check line]). Is there a detailed description of the algorithm that Lean uses for this? How is the input parsed and represented internally and how does the internal representation get translated for display? What happens if the original names can no longer be used for display, for example, if using them would produce an expression no longer alpha equivalent to the internal one? How are new names chosen in that case?

Displaying is a completely different matter. You can, for example, store an optional suggested name alongside de Bruijn indices so that it displays that name for readability. — Trebor, Commented Jun 21, 2022 at 9:18
I completely rewrote my answer if you want to take another look. — Jason Rute, Commented Jun 21, 2022 at 13:33
@Trebor A small nitpick: It appears the names are stored with the binder not the variable (which uses de bruijn indices) and don’t appear to be optional (at least in Lean 3, it might be different in Lean 4). — Jason Rute, Commented Jun 21, 2022 at 16:01
...Actually, maybe I should be a bit more careful in what I say. There might be some macro magic which lets you associate optional names to things that I don't know about. Also, in ∀ p : Prop, p → p the inner variable (which is unused) is automatically assigned a name of ᾰ while one can instead write the → explicitly as a binder such as ∀ p : Prop, ∀ h : p, p. So maybe that is what you mean by optional. — Jason Rute, Commented Jun 21, 2022 at 16:23

Andrej Bauer · Accepted Answer · 2022-06-21 18:00:00Z

I implemented the spartan type theory as an educational tool for people who are interested in implementing their own formal systems. If you look at how the type of expressions expr is defined,

type expr =
  | Bound of index (** de Bruijn index *)
  | Atom of atom (** primitive symbol *)
  | Type (** the type of types *)
  | Prod of (Name.ident * ty) * ty (** dependent product *)
  | Lambda of (Name.ident * ty) * expr (** lambda abstraction *)
  | Apply of expr * expr (** application *)

you will see that Prod and Lambda take a Name.ident, i.e., they remember what the name of the bound variable was, even though de Bruijn indices are used (that's the Bound constructor). When expressions are printed out (see print_expr) an auxiliary function print_binders figures out what strings it should use to print out de Bruijn indices. It tries to use the string that is stored in Lambda and Prod, if possible, otherwise it finds another name (see Name.refresh which find such a name).

In addition, here are my blog posts on "How to implement type theory" (part 1, part 2, part 3). Note however that the source code for these tutorials is somewhat messed up. The tutorials themselves still explain some basic ideas.

I know you want to implement first-order logic, but that is irrelevant. Many things that are explained in the above material will be useful in your endavors.

Jason Rute · Accepted Answer · 2022-06-21 13:32:56Z

I'm not sure what you mean by algorithm, but it sounds like you want to know how an expression is stored internally in Lean. Let's take your example:

∀ (x : ℕ), x ≥ 0

Even if you turn off all the notation, Lean will display it with variables since the input and output of Lean is variable based. But you are correct that internally it also uses de Bruijn indices. To see this, you can do the somewhat complicated looking:

#eval ```(∀ (x : ℕ), x ≥ 0).to_raw_fmt.to_string

The triple-quotes around your code turn it into a lean expression, so it has type expr instead of type Prop. The .to_raw_fmt.to_string just prints the expression to a string representing the underlying constructors for the expression.

(pi x default [macro annotation (const nat [])] (app (app [macro annotation (const ge [])] (var 0)) [macro prenum]))

Unfortunately, this still shows a lot of macros (which are sort of promises to elaborate that part of the expression later). This can be fixed by putting in a more elaborated formula to begin with:

#eval ```(∀ x : nat, nat.ge x nat.zero).to_raw_fmt.to_string
-- (pi x default (local_const nat nat (const 4._.98 [])) (app (app (local_const nat.ge nat.ge (const 4._.99 [])) (var 0)) (local_const nat.zero nat.zero (const 4._.100 []))))

The var 0 is how the variable x in x ≥ 0 is represented. It has a de Bruijn index of 0 pointing it to the binder pi x. (Notice the variable name x is also there, but is stored with the binder instead of the variable).

If you would like to understand this more, you can look at the code for the inductive expr datatype in Lean 3, which has good comments. (Here is the Lean 4 version which is cleaner.)

This is also documented in the paper A Metaprogramming Framework for Formal Verification which covers the expression datatype in more detail.

The general way one accesses such low level details (including some of the tricks I used like the triple back quotes) is through meta-programming. For Lean 3, see Tutorial: tactic writing in Lean. For Lean 4, see (work in progress) Lean 4 metaprogramming book.

Stack Exchange Network

Algorithm for the locally nameless representation used by Lean

2 Answers 2

Not the answer you're looking for? Browse other questions tagged
implementation
binders
or ask your own question.

Linked

Hot Network Questions

Algorithm for the locally nameless representation used by Lean

2 Answers 2

Not the answer you're looking for? Browse other questions tagged implementationbinders or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
implementation
binders
or ask your own question.