24
$\begingroup$

In compiler theory, there are multiple related, and often mixed, concepts that all help you compile code less often [1]:

Already in the early times of FORTRAN (as of FORTRAN II [1, p. 384]), independent compilation of modules was used to speed up recompilation. This allowed a change in one module to require only the recompilation of that module and linking against the previously compiled other modules of the program. Skipping the compilation of all modules except the changed one was a significant improvement over compiling everything again. This did come at the cost of possible link-time issues. To be able to link the program together, the modules needed to be up-to-date with the data layout defined in COMMON. This was done manually and only checked by the programmer.

Mesa introduced separate compilation, which solved this issue by moving the static checks of cross-module dependencies to compile-time [7]. When a Mesa module is compiled, the result includes a symbol file that can be used during the compilation of other modules that depend on that module. Other languages such as ML and C use interface files that are written by the programmer. Separate compilation brought back type correctness for statically typed programs, while preserving fast recompilation from independent compilation.

To further speed up recompilation, we can save intermediate results during compilation. If parts of the program do not change, then the intermediate results of those parts can be reused. The term separate compilation applies to compilation where intermediate results are saved per file [7]. For sub-file level tracking of changes and intermediate results, the term incremental compilation is used [17, 18]

Even when not generating executable code, these concepts still apply to proof assistants. When making a change in a formal development, we would like to spend as little time as possible on re-checking other parts of the project, especially when processing time is measured in hours, not minutes.

Unsound independent compilation is obviously undesirable for proof assistants, and sound incremental compilation is also surprisingly tricky to pull off because it requires the compiler to faithfully keep track of internal dependencies, which may require far-reaching restructuring of the implementation. A basic amount of separate compilation on the other hand can probably be found in most formal systems: when mutual dependencies are not allowed between different files, the sensible unit of compilation is a single file, and changing one unit should not lead to units that do not depend on it being re-checked. This can be implemented by a build system as basic as make. It gets more interesting when we want some changes not to affect (the entire closure of) dependents, roughly sorted from "simple" to "hard":

  • non-semantic changes such as whitespace, comments
  • changing a proof without changing the statement (only simple when proof terms are not exported)
  • file-internal changes, such as local notations
    • changing a definition that should be regarded as opaque in other files, e.g. the definition of real numbers may not be material once the relevant statements have been proved about it (problematic if you want definitions to compute)
  • changing or introducing some declaration that is not used in other files (very hard to do soundly in the presence of proof search or complex name resolution)

Are there proof assistants that tackle some of these, or other, challenges beyond the basic make approach? Are there plans to do so? Are there other issues that make this topic inherently more complex for proof assistants?

[1] Smits, Jeff, Gabriël DP Konat, and Eelco Visser. "Constructing Hybrid Incremental Compilers for Cross-Module Extensibility with an Internal Build System." The Art, Science, and Engineering of Programming 4.3 (2020): 16-1.

$\endgroup$

3 Answers 3

7
$\begingroup$

Lean (in any version) at the time of writing does not really implement any sort of recompilation avoidance beyond the make approach. In particular, the produced .olean "signature" files contain all information possibly useful for downstream consumers, including declaration locations and docstrings, so the signature file will change from basically any edit. It is also not sufficient to only rebuild direct dependencies and short-circuit the build if their output didn't change because Lean's import is transitive, so any downstream file might be affected by a change.

We want to improve this in Lean 4, both for theorem proving and "regular" programming (the latter of which avoids some of the issues mentioned in the question text). There is an RFC about providing some means for opaque proofs, declarations, and imports (making them non-transitive), but there is no design or even draft of a detailed solution to achieve these goals yet. Which is the main reason for posing this question. We are also moving towards putting information not directly relevant for compilation into separate files.

There is, however, a flag for unsound independent compilation in the Lean 3 community edition, which AFAIK is used to e.g. test changes in a tactic on an example in a far-away file without having to recompile all the files in between.

$\endgroup$
5
$\begingroup$

Agda has a very rudimentary form of caching of the current file, where it does not have to recheck the initial part of a file that has not changed since the last reload. See https://github.com/agda/agda/issues/834 for the discussion around this feature when it was originally added by Andrea Vezzosi.

$\endgroup$
1
  • $\begingroup$ Interesting! I suppose this is the "offline" equivalent of Lean's language server backtracking its state to before a textual change. I didn't even think of this kind of recompilation avoidance when writing the question, but it sure is as important in practice. A common Lean feature request is to provide incremental checking not only on the command but also tactic level, which Isabelle and Coq can do, but Lean still can't. $\endgroup$ Commented Feb 14, 2022 at 13:12
5
$\begingroup$

For metamath, we store a detailed set of proof steps for each theorem or lemma along with what we are proving. As you describe in your discussion of systems which don't export their proofs, it is the statement of the theorem (hypotheses, conclusions, and disjoint variable constraints) that a proof assistant or human needs to be able to use it in writing a future proof. For example, you can change whitespace, how a theorem is proved, and even a few carefully chosen changes like renaming theorems, with a text editor (and no need to apply the metamath tools which would be needed for, say, removing a hypothesis or other bigger changes).

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.