Performance of de Bruijn indices?

Question

Can any of you refer me to documented results of experiments on the performance of provers using de Bruijn indices to represent references to bound variables versus mentions of names or other simple alternatives? I am particularly interested in simple type theory and higher-order logic, but results with implementations of other logics might also be indicative.

Similarly, any characterization of costs of internal operations used to perform frequent tasks such as unification, substitution, replacement, etc. would be valuable.

If you're asking with the intenion of implementing bound variables, then bindlib is definitely an implementation worth paying attention to. The repository has links to papers describing it, and section 1.2 of this overview paper has a discussion of various approaches (including de Bruijn indices and HOAS). — Andrej Bauer, Commented Feb 28, 2022 at 7:33
@Andrej Thank you for the relevant and helpful comments. Though not claiming to be an answer, this looks like very useful information. — Cris P, Commented Feb 28, 2022 at 18:10

Michael Norrish · Accepted Answer · 2022-03-01 03:00:03Z

HOL4 has long provided two implementations of its kernel. The first, due originally to Konrad Slind, uses de Bruijn indices for bound variables and named free variables (“locally nameless” if you like). The other is more like HOL Light’s kernel (see its definition of the term type) and uses names (coupled with types) for all variables. Because the kernel API is sufficiently abstract, all of HOL4’s library of proofs and tactics work on top of both kernels (our build and regression tests run against both).

(The HOL4 term data type's API is specified in this file. There is no SML datatype visible, precisely because it can take on two different forms.)

The original kernel was augmented in the late 1990s by Bruno Barras to support explicit substitutions, which are in turn used to implement efficient “call-by-value” reduction. (This system is described in his TPHOLs paper, Programming and Computing in HOL.) This facility, which we call EVAL, is heavily used in a number of applications.

This facility seems to give the index-based kernel an “unfair” advantage in practice, based on timings observed for builds of the whole system. The difference is not hugely significant but it’s definitely there. Equally, it is certainly possible to construct scenarios where indices will perform worse: if you have to pull an abstraction term apart and extract its body, while keeping the abstract data type’s invariants intact, you have to perform a linear traversal of the body replacing the indices with new free variables.

Perhaps it is possible to write a performant version of Bruno’s code for a named-variable-kernel, but it hasn’t been done.

Finally, for what it’s worth, Isabelle uses indices for bound variables, and exposes this choice through its API (see the definition of the term datatype). This means this implementation choice is basically baked in to the design. However, when working with this term type, Isabelle programmers can pull abstractions apart in constant time (but must then maintain context information to tell them what the indices mean). The eventual conversion to “certified term” (the type underpinning theorems) must also check that there aren’t any dangling indices and perform type-checking.

Great information, and thank you for sketching use of indices in HOL4 and Isabelle. The concept of "call-by-value" reduction sounds very interesting indeed as well. Could you clarify a bit the phrase "keeping the abstract datatype's invariants intact"? I took the previous mention of "an abstraction term" to mean a "lambda term". Was that incorrect on my part? — Cris P, Commented Mar 1, 2022 at 2:26
The invariants include that the term values are always well-typed, and that all values are either abstractions (yes, a lambda-term), applications, variables or constants. In particular, there is no fifth possibility corresponding to being an index. — Michael Norrish, Commented Mar 1, 2022 at 2:51
I've added links into the answer to point at ML code implementations of terms in the three systems. — Michael Norrish, Commented Mar 1, 2022 at 3:00

Stack Exchange Network

Performance of de Bruijn indices?

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
performance
higher-order-logic
bound-variables
or ask your own question.

Hot Network Questions

Performance of de Bruijn indices?

1 Answer 1

Not the answer you're looking for? Browse other questions tagged performancehigher-order-logicbound-variables or ask your own question.

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
performance
higher-order-logic
bound-variables
or ask your own question.