
Critical bugs are periodically found in Coq, and I assume in other proof assistants as well. We are still happy to mostly trust the proof assistants, partly because these critical bugs are relatively rare, but also because most of the time the bugs are obvious in the sense that it's hard to rely on them without noticing something is off. Or is it? Have there been "proofs" that were later discredited due to such critical bugs?

Related: this question about type-in-type, which does list some examples of the phenomenon I'm pointing at, although in that case people were knowingly working in an inconsistent setting.


3 Answers


My experience is largely confined to the HOL world: I know of no evidence of bugs in the kernels of the various HOL systems that have led to accidental proofs of false theorems. Some bugs may have led to proofs of valid theorems via invalid inferences, typically bugs to do with variable capture (sketched below). It is also noteworthy that most of the bugs in the HOL kernels that compromised the logic were found by inspection or retrospection by the implementers, not by end-users: end-users tend to find the bugs where the kernel crashes when it shouldn't, not the conceptual errors. Most users aren't malicious.
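
By way of illustration, here is a minimal Lean 4 sketch of the classic capture bug in naive substitution. The Term type and substNaive are invented for this example; this is not code from any HOL kernel:

    inductive Term where
      | var : String → Term
      | app : Term → Term → Term
      | lam : String → Term → Term
    deriving Repr

    -- Naive substitution of `s` for the free occurrences of `x`.
    -- The bug: descending under a binder `lam y b` with `y ≠ x` never
    -- renames `y`, so free occurrences of `y` in `s` get captured.
    def substNaive (x : String) (s : Term) : Term → Term
      | .var y   => if y = x then s else .var y
      | .app t u => .app (substNaive x s t) (substNaive x s u)
      | .lam y b => if y = x then .lam y b else .lam y (substNaive x s b)

    -- Substituting the free variable `y` for `x` in `λ y. x y` wrongly
    -- yields `λ y. y y`: the binder captures the substituted variable.
    -- A kernel with this kind of bug can perform an invalid inference
    -- without anything visibly going wrong.
    #eval substNaive "x" (.var "y") (.lam "y" (.app (.var "x") (.var "y")))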

The CakeML work has resulted in an implementation of HOL (and now Isabelle/HOL) in an environment that has been verified to very demanding standards. So I think we can look forward to a world where we can claim a high degree of confidence in proof assistants.


This is not an exact answer to your question, but I know of an example of someone intentionally proving a known-false result to demonstrate a bug in a well-known proof assistant.

This blog post mentions this proof of 1 = 0 in Metamath.

The proof of this pseudotheorem is given below (source).

$( Prove that zerisone. Contradicts tzero obviously. $) 
zerisone $p |- = 0 S 0 $=
  ( vary quant_ex tvar tzero tsucc binpred_equals wff_atom wff_quant exiy ax-mp
  exnlspec ) ABACDEZFGHDLFGAIAKJ $.
  • It is standard practice to expose a bug in a proof assistant by using the bug to prove false.
    – Rob Arthan, commented Aug 31, 2022 at 22:25

Does this count? I've heard stories like the following from various AI labs:

  1. Train a model to predict tactic proofs, either by trial and error or by imitating human data.
  2. Realize later (hopefully before the results were finalized and a paper was published) that the model was sometimes using tactics it shouldn't have, such as CHEAT_TAC in HOL Light or sorry (or one of its many synonyms) in Lean; see the sketch below.
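
By way of illustration, here is a minimal Lean 4 sketch of the failure mode and one way to detect it (the theorem name is made up for this example):

    -- With `sorry`, this "proof" of a false statement elaborates with
    -- only a warning, so a pipeline that merely checks that elaboration
    -- succeeds would count the statement as proved.
    theorem one_eq_zero : 1 = 0 := by sorry

    -- The kernel records the escape hatch as a dependency on `sorryAx`,
    -- so asking which axioms a theorem uses exposes it:
    #print axioms one_eq_zero  -- 'one_eq_zero' depends on axioms: [sorryAx]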

  • It's interesting! But not really an example of what I was asking, since it seems to me to be a bug with the AI model, not with the proof assistant itself.
    – Ana Borges, commented Jul 25, 2022 at 15:45
