2
$\begingroup$

I want to contribute to the repository MiniF2F (https://github.com/facebookresearch/miniF2F/tree/main) which has formal formulation of problems asked in Math Olympiads in different languages like Lean, Metamath, etc. However, there are no formulations in Coq. Since, I'm not an expert in Lean and since there are around 240+ theorem statements in the repository. It will be very helpful if someone can tell me how hard it will be to convert these formulations from Lean to Coq. I also want to know if mathcomp will be sufficient to express such formulations.

Lean formulations can be found here: https://github.com/facebookresearch/miniF2F/blob/main/lean/src/test.lean

$\endgroup$
1
  • $\begingroup$ By hand or automatically? $\endgroup$ Commented Aug 5, 2023 at 9:07

1 Answer 1

2
$\begingroup$

First, I would like to say that translating the MiniF2F benchmark to Coq is important and someone should do it, so thank you for raising awareness of this.

There is no good automatic way to translate Lean to Coq automatically. It has to be done by hand. And just like translation of natural language, it helps to know the target language really well. If you are translating proofs, there is a helpful cheatsheet, but for MiniF2F, it is more important to translate just the statements. Indeed, you don't even need to know Lean in this case, since you could just as easily start with the natural language problem statements found in the informal directory. Indeed, one probably doesn't want to be slavish to the Lean formulation, but instead use idiomatic Coq.

Now, let's discuss MiniF2F in particular (which is not just Olympiad problems, but instead a range of high-school-level math competition problems designed for benchmark theorem proving AI). Many people have discussed translating it to Coq, but so far it hasn't been done. Here are the previous discussions:

The big blockers seem to be the following:

  • There is more than one way to translate all these into Coq. For example, Coq has many definitions of the real numbers. Which one is best to use? While math-comp is very comprehensive, it also requires using ssreflect for proofs.
  • It isn't clear if one should provide proofs. The good point of adding proofs is that it makes clear that the translation is correct (and the Lean MiniF2F is plagued with Lean formulations which are wrong making the problem impossible or trivial). It also is helpful to know you have included the right number of Coq imports to make the problem easily solvable. Besides being a lot more work to add the proofs, a bad point of having them is that if the proof is public, pre-trained models might accidentally train on the proof leading to data contamination. Also if you start adding lemmas in your proof (as some of the work translating MiniF2F to Coq has done), it could make it harder to automatically parse the file for benchmarking.

Further complexity is that this was never a community project. It was a project of OpenAI, but all those people left and I think it is abandoned. Then it was picked up by Facebook, but again it is not clear anyone is left at Facebook to maintain it. It probably needs to be forked again. Also, the people most interested in the translation (those like myself working on AI for theorem proving), and those most capable of translating it (hardcore Coq users) don't have much overlap, making it difficult to get the work done.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.