
Fifty years ago, few would have imagined that the process of verifying the correctness of a known proof of a mathematical theorem might be so costly that the mathematical community would hesitate to do it more than once. Of course, nowadays, there is a growing list of theorems whose proofs require an enormous amount of computation. In many cases, these theorems amount to calculating the first few terms of an infinite sequence, and maybe not many people would care to have those computations formally verified. In other cases, however, there is an attractive theorem statement that many might want to see a formal proof of. The Kepler conjecture is an obvious example; perhaps less well known to the proof-assistant community is the ternary Goldbach problem, solved by Harald Helfgott, who reduced the problem to a feasible finite computation. A complete proof therefore requires doing this computation; to give some idea of how big this computation is, let me quote a paragraph from Helfgott's preprint:

In December 2013, I reduced $C$ to $10^{27}$. The verification of the ternary Goldbach conjecture up to $n \le 10^{27}$ can be done on a home computer over a weekend, as of the time of writing (2014). It must be said that this uses the verification of the binary Goldbach conjecture for $n \le 4 \cdot 10^{18}$ [OeSHP14], which itself required computational resources far outside the home-computing range. Checking the conjecture up to $n \le 10^{27}$ was not even the main computational task that needed to be accomplished to establish the Main Theorem—that task was the finite verification of zeros of $L$-functions in [Plab], a general-purpose computation that should be useful elsewhere.

Proofs like these seem to pose a significant knowledge management problem if we envisage a future in which the majority of interesting mathematical results are archived in some machine-checkable library. It seems that if we want to check the proof of a new theorem that relies on an existing theorem whose proof is extremely expensive, then we are faced with the choice of either redoing the expensive computation, or taking it on faith that someone else checked it and that we haven't made a bookkeeping error somewhere along the way. If there are only a few expensive theorems then maybe there isn't a problem, but if expensive theorems proliferate, and we want to rely on our records of whether they were checked or not, then it seems that the chances of mistakenly thinking that something was proved (when it really wasn't) start to become non-negligible. This is especially true if we have software "upgrades" every couple of years that could introduce bugs, or at least confusion over the exact meanings of various terms and theorems.

With the above remarks as background, I have a twofold question.

  1. Do existing systems have a plan for managing this kind of issue? I see that there is another question about recompilation avoidance that seems related, but doesn't seem to be quite the same.

  2. Typically, how much computational overhead is there when we take a conventional computation and redo it in a proof assistant? (Here, I'm not talking about the human effort in doing the re-programming, although I recognize that that is also a major issue.) The answer is probably, "It depends," but I'm wondering if there is any rule of thumb. In this regard, I found the Ph.D. thesis of Muhammad Taimoor Khan, Formal Specification and Verification of Computer Algebra Software, to be very interesting. For certain types of computation in Maple, one can apparently answer this question fairly easily by running Khan's software. Are there other packages available that allow someone with conventional programming experience but little experience with proof assistants to write "normal-looking code" (or something close to it) that carries out a formally verified long computation, and thereby time how much longer it takes than conventional code does?
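
To make the kind of timing comparison I have in mind concrete, here is a tiny Lean 4 sketch (the function `slowCount` is just a made-up stand-in for a real computation, and for large inputs the kernel route may need its resource limits raised):

```lean
-- A toy "expensive" computation: how many numbers below n are divisible by 7?
def slowCount (n : Nat) : Nat :=
  (List.range n).foldl (fun acc k => if k % 7 == 0 then acc + 1 else acc) 0

#eval slowCount 1000   -- conventional, compiled execution: 143

-- Checked by the trusted kernel via symbolic reduction: maximal trust, but this
-- route scales poorly as the input grows.
example : slowCount 1000 = 143 := by decide

-- Checked by running compiled code: roughly conventional speed, but the compiler
-- and runtime become part of what we trust.
example : slowCount 1000 = 143 := by native_decide
```

Comparing how the second and third variants behave as the input grows is essentially the overhead question I am asking about.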

  • Welcome, Timothy! Commented Mar 26, 2022 at 7:06
  • I think this question should not be closed, but for some reason it is off-putting and is collecting close votes. Please note that the OP is not an active member of the proof-assistant community, so cut him some slack regarding his style of asking questions. I changed the title to what the question really seems to be about (per Guy Coder's comment). Commented Mar 26, 2022 at 9:24
  • I would really hate it if this site adopted the "close vote avalanche" routine, whereby people vote to close just because there are already a couple of close votes. If you're voting to close, please explain why you're doing so. Commented Mar 26, 2022 at 9:25
  • As for your question 1, I think there is a related issue regarding continuous integration/regression testing. The Coq repository on GitHub, for instance, recompiles a lot of libraries every time a new pull request is issued, so as to ensure its impact is innocuous. But as the libraries grow larger and more expensive to check, this becomes more and more costly and slow. I do not think anybody has the faintest idea what a solution might be, but it's definitely a growing practical problem! Commented Mar 26, 2022 at 13:48
  • @GuyCoder Interesting that people have different notions of "proof length." I'm probably 100 years behind the times, but I tend to think of proof length as the number of symbols that would be needed if every step of the formal proof were written out in full, whereas I guess "length" on this SE site means something like the amount of disk space needed? Commented Mar 26, 2022 at 20:14

2 Answers


You are touching upon some questions that we are currently investigating in our research group, but we do not have definite answers yet, so let me just make a couple of comments regarding the trade-offs.

Consider a computationally expensive verification of some mathematical fact, e.g. something that requires lots of large-number arithmetic, symbolic computation, Gröbner basis computations, SAT solving, etc. In all such cases we use specialized algorithms that are implemented with traditional software tools and made to run fast. How could we incorporate them into a proof assistant? There are several options, depending on how we trade speed for trust:

  1. We can incorporate external libraries into a proof assistant and trust them. Many proof assistants trust that number arithmetic is correctly implemented, say in the GMP library. Trusting Mathematica sounds insane, but incorporating Mathematica into Lean is worthwhile, and one can do things to increase the trust.

  2. We can incorporate external libraries into a proof assistant and formally verify them to be correct. This would be the ideal solution, but I am not aware of any such efforts. It sounds difficult to do.

  3. We can reimplement algorithms inside a proof assistant, together with proofs that they perform correctly. These can then be extracted to efficient code or compiled to native code on the fly (a toy sketch of this approach follows below). Of course, one then has to trust a larger chunk of the proof assistant.
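
To make the third option concrete, here is a toy Lean 4 sketch (the names `sumSpec` and `sumFast` are invented for illustration): a tail-recursive implementation is proved, once and for all, to agree with an obviously-correct specification, so the fast version can be used wherever the slow one could.

```lean
/-- The obviously-correct specification: 0 + 1 + ⋯ + n. -/
def sumSpec : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumSpec n

/-- A tail-recursive version, as one would write it for speed. -/
def sumFast (n : Nat) : Nat := go 0 n
where
  go (acc : Nat) : Nat → Nat
    | 0     => acc
    | n + 1 => go (acc + (n + 1)) n

theorem sumFast_go_eq (acc n : Nat) : sumFast.go acc n = acc + sumSpec n := by
  induction n generalizing acc with
  | zero      => simp [sumFast.go, sumSpec]
  | succ n ih => simp only [sumFast.go, sumSpec, ih]; omega

/-- The fast implementation agrees with the specification on all inputs. -/
theorem sumFast_correct (n : Nat) : sumFast n = sumSpec n := by
  simp [sumFast, sumFast_go_eq]
```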

The third solution will always lag behind the first one, and so one will always be tempted to trade trust for speed. But is this such a bad idea? I would argue that we should allow the use of non-trusted software components, but also devise methods that track their provenance, so that we can tell which results depend on the correctness of which software.
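
For what it is worth, one building block already exists: Lean 4 records which extra axioms a proof depends on, so a result that was checked by running compiled code rather than by the kernel is visibly marked as such. A small illustration (extending this kind of bookkeeping to external solvers and libraries is the open part):

```lean
-- A proof that bypasses the kernel by running compiled code picks up an extra
-- axiom, and `#print axioms` reports it.
theorem checked_by_kernel   : 2 ^ 16 = 65536 := by decide
theorem checked_by_compiler : 2 ^ 16 = 65536 := by native_decide

#print axioms checked_by_kernel    -- does not depend on any axioms
#print axioms checked_by_compiler  -- depends on axioms: [Lean.ofReduceBool]
```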

  • Thanks for this answer! I wonder if you can elaborate on the last point about tracking provenance. How do you envisage doing this? It reminds me of the headaches I have when I build a complicated piece of software and have to track all the dependencies. If we go down this road, then won't we quickly reach the point where, in practice, almost every "formally proven theorem" actually depends on hundreds of untrusted pieces of software? Every time someone succumbs to the temptation to trade trust for speed, it will be difficult to reverse course. Commented Mar 26, 2022 at 12:53
  • When I read this, I immediately thought of the rigorously tested database library SQLite (apparently one of the most used pieces of software in the world): "Note that SQLite has, over the previous decade, encountered bugs in each of GCC, Clang, and MSVC. Compiler bugs, while rare, do happen, which is why it is so important to test the code in an as-delivered configuration." CPUs have also been known to have bugs. Maybe the provenance problem is much deeper than the OP suggests; indeed, it sounds intractable to me. Commented Mar 26, 2022 at 13:05
  • I don't know how to address provenance, other than by starting to experiment with it. But I don't think we have a large-scale problem with trading trust for speed, as only a small proportion of formalized results are computationally intensive. And it's not clear how to trade trust for speed in a general situation. Commented Mar 26, 2022 at 13:14
  • @AndrejBauer "Only a small proportion of formalized results are computationally intensive." That is true today, but the proportion seems to be growing steadily. Maybe we should wait until we get to that bridge before crossing it, but my instinct is to plan ahead when a major future problem is foreseeable. BTW, I tried to raise this issue 15 years ago on the Foundations of Mathematics mailing list but couldn't find the right language to express my concern. It seems I'm still having trouble making myself understood. Commented Mar 26, 2022 at 20:28
  • That's OK, an accepted answer can be un-accepted in favor of another one, can't it? Although accepting an answer is an indication by the OP that they got what they were looking for, which discourages further answers. Commented Mar 27, 2022 at 15:55

This is not as general as Andrej's answer, but more of a data point that may be of interest. We've been looking at how to construct formal proofs about neural-network-enhanced systems in proof assistants. This runs into exactly the problem you describe: verifying the network can be very expensive (hours or days), which disrupts the interactive use of the proof assistant.

As described in Figure 1 of our recent paper, we've taken the external bookkeeping approach, i.e. we create a cache that stores the result of verifying the network. The proof assistant then consults the cache instead of reverifying the network each time. The cache maintains consistency by also storing the location and the hash of the network, which can be used to detect when the proof has become invalid (for example, due to further training of the network).
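
In miniature, the consistency check looks something like the following Lean 4 sketch (the names and the toy digest are made up for illustration and are not our actual implementation; a real cache would use a cryptographic hash):

```lean
structure CacheEntry where
  networkPath : System.FilePath
  digest      : UInt64    -- fingerprint of the network file at verification time
  verified    : Bool      -- result of the expensive verification run

-- Toy fingerprint of a file's bytes; a real implementation would use SHA-256 or similar.
def digestBytes (bs : ByteArray) : UInt64 :=
  bs.foldl (fun h b => mixHash h b.toUInt64) 7

-- Reuse the cached verdict only if the network on disk still matches the recorded
-- digest; otherwise the proof is stale and the network must be reverified.
def cachedVerdict (e : CacheEntry) : IO (Option Bool) := do
  let bytes ← IO.FS.readBinFile e.networkPath
  if digestBytes bytes == e.digest then
    return some e.verified
  else
    return none
```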
