6
$\begingroup$

Proof assistants allow one to state that

$$A(BC) = (AB)C$$

with $A$, $B$, $C$ compatible matrices.

Is there a formal system that uses this sort of equation to choose, among the interpretation strategies of some target language (say, for evaluation), the one with the lowest cost?
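As a concrete instance of why the choice matters, here is a back-of-the-envelope flop count in Lean 4 (the dimensions are made up for illustration):

```lean
-- Cost, in scalar multiplications, of multiplying an m×n matrix by an n×p one.
def mulCost (m n p : Nat) : Nat := m * n * p

-- Hypothetical shapes: A : 10×1000, B : 1000×5, C : 5×100.
#eval mulCost 10 1000 5 + mulCost 10 5 100      -- (AB)C : 55000
#eval mulCost 1000 5 100 + mulCost 10 1000 100  -- A(BC) : 1500000
```

Both parenthesizations are provably equal, yet one evaluation order is roughly 27× cheaper here.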

$\endgroup$
5
  • 1
$\begingroup$ For folks who downvote: it might be useful to know why. $\endgroup$
    – nicolas
    Commented Sep 11, 2022 at 11:14
  • 2
    $\begingroup$ Since the question doesn't explicitly state its relation with coq, agda etc, I took the liberty to remove the tags. If it indeed has a relationship, please edit your question. $\endgroup$
    – Trebor
    Commented Sep 11, 2022 at 11:16
  • 2
$\begingroup$ opt-einsum helps you find a fast way to multiply matrices or contract tensors. However, it has nothing to do with proof assistants. $\endgroup$
    – CrabMan
    Commented Sep 11, 2022 at 11:51
$\begingroup$ @CrabMan indeed. It can find such a (valid and) fast way because its library of equivalences is valid. Being able to enrich that library would allow for faster solutions. I wonder if the ability to have those equivalences be first-class has been put to use to build efficient EDSLs, by adding a "cost function" and a solver. $\endgroup$
    – nicolas
    Commented Sep 11, 2022 at 12:11
  • $\begingroup$ My abstraction of a compiler is it maps an input language to an output language by applying incremental transformations which conserve semantics; from there, optimization generalizes to a search over these transformations to minimize a cost function computed on the output. Theorem Provers are used to validate compilers and their transformations but don’t seem to me to fit into the compilation process itself. Unless you want to allow transformations which may be incorrect, accepting the result only if a prover can validate correctness between states around the unreliable transformations. $\endgroup$
    – user1168
    Commented Sep 15, 2022 at 18:48

2 Answers

5
$\begingroup$

I would say that this paper, with code in Coq, meets your requirement: Verified tensor-program optimization via high-level scheduling rewrites

First two sentences of the abstract

We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional array language. Optimizations rely on user scheduling using series of verified, semantics-preserving rewrites.

You can find the code on GitHub.

Edit: reading the question again, this does not prove that the rewritten program is faster, only that it computes the same result as the original. So it might not be an answer to your question.

$\endgroup$
3
$\begingroup$ This is a nice read, and one step in the direction of the question. More specifically, I was wondering if such rules could be supplied to a compiler which itself had some optimization metric to optimize (representing some heuristics, or the one true model of runtime cost, or something in between). Then a linear algebra library, for instance, would define some operations for the user, along with some additional equations for the compiler; another linear algebra library could define no new operations but only extra equations, presumably resulting in higher performance. $\endgroup$
    – nicolas
    Commented Sep 13, 2022 at 6:41
  • $\begingroup$ specifically, in this paper, rewrites are manually specified, and they are outside of the compiler pipeline. I should amend my question probably $\endgroup$
    – nicolas
    Commented Sep 13, 2022 at 6:48
  • 1
$\begingroup$ You might want to have a look at Halide, a language where you do rewrites based on linear algebra. Look at their publications; they have some on automation. In the future, I'm expecting some work on automation that is also verified. $\endgroup$
    – tom
    Commented Sep 13, 2022 at 21:34
3
$\begingroup$

I think your question can be interpreted in many different ways. Since you tagged it with coq, lean, and agda (initially), I assume you are interested in dependent type theory. DTT is special in that it is a language for both mathematics and computation, and therefore it can be used to build both proof assistants (Coq, Lean 3/4, Agda) and fully-fledged functional programming languages (Lean 4, Idris, Agda).

Kernel computation

Now, if we consider DTT from the point of view of a programming language, there are two places where code is "run". The first is internally in the proof assistant. If you prove 1 + 1 = 2, one way to do this in the CIC implementations of DTT is by reducing both sides to the same normal form. In Lean, for example, this is the rfl term or the refl tactic; in Coq, it is the reflexivity tactic. One doesn't even need closed terms: x : nat |- 1 + x = S x is provable by reflexivity in Coq, since 1 + x = S x by definition. It is the same mechanism that lets you type-check something like x : Fin (1 + 1), y : Fin 2 |- x = y, since for x = y to type-check, the types of x and y must be the same, and that means 1 + 1 must be definitionally equal to 2.
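In Lean 4 syntax, the same ideas look like this (note that Lean's Nat.add recurses on its second argument, so the definitional fact is the mirror image of Coq's):

```lean
-- Closed terms: both sides reduce to the same normal form.
example : 1 + 1 = 2 := rfl

-- Open terms: x + 1 unfolds to Nat.succ x by definition
-- (in Coq it is 1 + x = S x, since plus recurses on the first argument).
example (x : Nat) : x + 1 = Nat.succ x := rfl

-- Definitional equality of types: Fin (1 + 1) and Fin 2 are the same
-- type up to reduction, so x = y type-checks.
example (x : Fin (1 + 1)) (y : Fin 2) : Prop := x = y
```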

My understanding is that the reflexivity tactics, at least naively, reduce the term in a deterministic way, unfolding definitions. So when evaluating 1 + (2 + 3), there is some freedom: one can evaluate (2 + 3) first, or unfold 1 + (...) to S (...) first (and I don't know the details of how it is done in practice). But one can't reassociate the term to (1 + 2) + 3 during the computation.

I'm also brushing something under the rug here about numbers, but it is worth mentioning since you care about fast code. In a proof assistant, when you write 3, that is syntactic sugar for either (a DTT representation of) the unary SSS0, the binary 11, or the decimal 3, depending on the proof assistant and its settings. Working with the latter two can make arithmetic much faster.

Now, this reflexivity code is run in the "kernel", and it is possible to optimize the kernel. For example, Coq's kernel is more optimized than Lean's (especially Lean 3's), letting it run much heavier computations. One could even go so far as to optimize standard math operations by implementing, say, 1 + (2 + 3) with native big integers instead of via the base definition of +. However, every optimization comes at a cost to the assurance that your kernel is correct. A bug in a kernel could make it impossible to prove some things or, worse, possible to prove a contradiction.

Now, keeping with computation in proofs, it is of course possible to redefine any operation to be faster. For example, the naive definition of the Fibonacci numbers is slow in any programming language, but you can optimize the computation. However, base DTT and its common axiomatic extensions, like classical logic and univalent foundations, are compatible with functional extensionality. Therefore, all implementations of fib are equivalent from the point of view of what you can prove about them in the logic. In particular, you couldn't prove that a dynamic-programming implementation of fib is faster than the naive version. You could only define fib and fast_fib and prove they are equal (or equivalent, if you don't have FE) functions. Then, to prove fib 20 = 6765, you could first rewrite fib to fast_fib and then do reflexivity.
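A minimal Lean 4 sketch of the two definitions (the names fib/fastFib are illustrative; the equality proof is only stated in a comment):

```lean
-- Naive, exponential-time definition.
def fib : Nat → Nat
  | 0     => 0
  | 1     => 1
  | n + 2 => fib n + fib (n + 1)

-- Linear-time version carrying a pair of successive Fibonacci numbers.
def fastFib (n : Nat) : Nat := (go n).1
where
  go : Nat → Nat × Nat
    | 0     => (0, 1)
    | n + 1 => let (a, b) := go n; (b, a + b)

#eval fastFib 20  -- 6765

-- Inside the logic the two are merely provably equal; nothing lets you
-- even state that fastFib is faster:
-- theorem fib_eq_fastFib : ∀ n, fib n = fastFib n := by ...
```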

To avoid having to manually rewrite all functions with their faster versions, there are other tricks: instead of using the built-in reflexivity tactic, you use a tactic which evaluates your expression for you in an efficient manner. In Lean 3, there is norm_num, which applies verified rewrite rules that evaluate arithmetic expressions on a binary representation instead of a unary one. It is much faster than Lean's built-in refl. As such, one could certainly have a norm_num-like tactic for other computations, like matrix arithmetic.
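norm_num also exists in Lean 4's Mathlib; a minimal example (assuming Mathlib is available):

```lean
import Mathlib.Tactic

-- Evaluated by verified binary-arithmetic rewrite rules,
-- not by unary unfolding of the definition of *.
example : 1234 * 5678 = 7006652 := by norm_num
```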

Compiled/VM computation

In Lean and Agda, it is possible to run code directly, as if these were programming languages. I believe Lean 4 and Agda are both specifically designed as user-friendly functional programming languages to, say, rival Haskell. Again, if you make a definition fib, the VM or compiler will run the code mostly as you wrote it. However, there are a few more possibilities for optimization. I'll just talk about Lean 4, since that is what I know. Lean 4 lets you replace any definition with another for the purpose of computation. This doesn't affect the logic, so you can't prove False, but it can still get you into trouble if the replacement doesn't behave exactly like the pure Lean function. (This often happens around edge cases like 0.)

There are a few ways to do this in Lean 4. For one, Lean 4 has a foreign function interface which lets you, say, replace a particular Lean function with a C version. I know all of Nat arithmetic in Lean 4 is compiled to fast big-integer libraries. Similarly, you can replace a Lean function with another, faster Lean function, and I think (though I can't find the reference in the manual at the moment) that you also have the option to prove this faster function equivalent to the original, increasing the trust in your compiled code. (Lean 4 also has a number of other optimizations, such as in-place memory reuse and pointer-equality checks, which are possible because the language is so pure.)
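A sketch of the replace-for-computation mechanism using Lean 4's @[implemented_by] attribute (here the agreement of the two definitions is merely asserted, not proved):

```lean
-- Tail-recursive version intended for compiled execution.
def fibFast (n : Nat) : Nat := go n 0 1
where
  go : Nat → Nat → Nat → Nat
    | 0,     a, _ => a
    | n + 1, a, b => go n b (a + b)

-- The logical definition stays naive; the compiler runs fibFast instead.
-- Nothing forces the two to agree unless you prove it yourself.
@[implemented_by fibFast]
def fib : Nat → Nat
  | 0     => 0
  | 1     => 1
  | n + 2 => fib n + fib (n + 1)
```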

Extracted code

Finally, many ITPs, including Isabelle/HOL as well as the DTT-based ones, offer ways to talk about and extract code. For example, CompCert is a C compiler verified in Coq. I'm not really familiar with this area, but it would likely give you methods of talking about code run-time without violating functional extensionality, and also let you extract code to other programming languages like C or OCaml. But again, I don't know much about this.

$\endgroup$
2
  • $\begingroup$ "In particular, you couldn't prove that a dynamic programming implementation of fib is faster than the naive version. You could only define fib and fast_fib and prove they are equal (or equivalent if you don't have FE) functions" Yes, so this means that the question would apply to a target language defined somewhere (possibly an EDSL ?) $\endgroup$
    – nicolas
    Commented Sep 11, 2022 at 12:20
  • 1
    $\begingroup$ Yes, some sort of DSL would work if you had a model for it and the runtime of the base operations. Again, it wasn’t really clear where you were going with your question. $\endgroup$
    – Jason Rute
    Commented Sep 11, 2022 at 13:29
