
The chess computer which beat the human world champion in 1997 had a huge database of openings built into it. However, my understanding of DeepMind's AlphaZero is that it is capable of generating these openings by itself.

Perhaps a theorem-proving analogy would be the following. The research groups trying to develop systems which do mathematics automatically -- is it true that right now they would have to teach such a system a proof of the infinitude of primes first (as "training data" or whatever)? And we're not yet at the stage where such a system can figure out a proof of this 2000+ year old result by itself? Or am I out of date?

  • I updated my answer with an explanation of why I think one version of this problem is much harder than AlphaZero, but not hard enough to seem impossible.
    – Jason Rute
    Commented May 17, 2022 at 1:31

1 Answer


I doubt you are out of date. I think that, given a reasonable interpretation of this question, this is well beyond current (2022) systems.

Nonetheless, I will also argue that your question needs some clarification as to what "by itself" means. Rhetorically, if GPT-f or the Isabelle Hammer could solve the infinitude of primes proof, would this count?

GPT-f (in Lean or Metamath) is a trained model based on human proofs. Assuming the training data didn't include a proof of the infinitude of primes, but still included much of Lean's (or Metamath's) library, would that count? There are probably other proofs about the prime numbers and other theorems about sets being infinite. Also, GPT-f takes advantage of Lean's tactics, which themselves took a lot of mathematical knowledge to create. Is all this too much knowledge for it to count as coming up with the proof "by itself"?
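
For a sense of scale, here is roughly what this looks like when the full library is available. The lemma name and statement below are from memory and may not be exactly right -- treat them as an assumption -- but mathlib contains something essentially equivalent to the target theorem, so the "proof" can be a one-liner:

    -- If I recall mathlib's (Lean 3) name correctly -- it may differ:
    -- nat.exists_infinite_primes : ∀ n : ℕ, ∃ p, n ≤ p ∧ p.prime
    example (n : ℕ) : ∃ p, n ≤ p ∧ nat.prime p :=
    nat.exists_infinite_primes n

This is exactly why "assuming the training data didn't include the proof" is such a slippery caveat.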

Reducing the knowledge slightly are the systems found in the papers Reinforcement Learning of Theorem Proving and Learning to Reason in Theories Without Imitation. These systems don't use any proof data; they only use theorem statements, definitions, and (sometimes) built-in tactics. Would this count?

Second, the Isabelle Hammer is a mostly hand-engineered tool (with a bit of machine learning), but a very powerful one. Nonetheless, a lot of its engineering is based on our understanding of first-order logic and of common mathematical objects like the integers. Also, it again can use existing definitions and previous theorems in the library. Does this count as "by itself"?

And just to be clear, I don't know for sure whether either GPT-f or the Isabelle Hammer can do this. I doubt it, but one would have to check. Nonetheless, my point was more of a thought experiment, since I think even if they could, it wouldn't count.

Now one could theoretically design a system which starts from "nothing" and tries to prove theorems. Here "nothing" could mean a few things:

  1. It could start from some axiom system like Lean's axioms, as well as just the definitions needed to state the infinitude of primes, and it has the infinitude of primes as a goal. It can't use human-written tactics; it has to use just the base logic. (See the Lean sketch after this list.)
  2. Same as (1) but now it is not given any definitions or goals. It just has to come up with interesting math on its own.
  3. Same as (2), but now we don't even have a logical framework. It has to develop mathematics on its own.
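
For concreteness, here is a minimal Lean sketch of what scenario (1) might look like. The names divides, is_prime, and infinitude_of_primes are my own, chosen for illustration (they are deliberately not mathlib's), and the sorry marks exactly what the agent would have to produce from the base logic alone:

    -- Just enough definitions to state the goal, written from scratch.
    def divides (m n : ℕ) : Prop := ∃ k, n = m * k

    def is_prime (p : ℕ) : Prop := 2 ≤ p ∧ ∀ m, divides m p → m = 1 ∨ m = p

    -- "Infinitely many primes": every natural number is exceeded by some prime.
    theorem infinitude_of_primes : ∀ n : ℕ, ∃ p, is_prime p ∧ n < p :=
    sorry -- the agent's entire task in scenario (1)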

I am confident we can't do any of those in 2022. And I question whether we will ever be able to do (2) or (3) without somehow cheating and imparting mathematical knowledge. Imagine, for example, how humans learned math throughout history. I'm not an expert on this, but I imagine many real-world concerns came into play: division of property for inheritance (both geometric land plots and discrete objects and animals), engineering, understanding physics, money and accounting, and so on. Even the idea of primes might have been linked to our understanding that some things are made of multiple other things, and then it is natural to ask how many basic things there are.

Now, we could train an agent in a realistic multi-agent simulation environment with physics, an economy, geometry, etc., in the hope that it learns math, but to code that up we would be putting in our own math knowledge. Alternatively, we could feed a model a lot of real-world data (scraped from, say, the Internet), but again I would worry that this data already contains a lot of math knowledge.

My personal thought on this particular challenge is that the infinitude of primes should be replaced with a harder problem, and that eventually we should work up to an AI which can develop unmistakably new mathematics or solve difficult unsolved problems. Then there is no trouble throwing in data, existing math knowledge, explicit goals, and real-world data, and we don't have to worry as much about whether we are cheating. We just have to worry about building the future AI mathematician.


Update: I want to explain why I think version (1) of the problem posed above (the easiest case) is much more difficult than chess, Go, or Atari games, but also feels possible with technologies on the horizon.

Recall that in version (1) the learning agent knows the axiomatic rules of a theorem prover and knows the goal (prove that there are infinitely many primes), including all the definitions needed to state that goal. However, it doesn't have access to other theorems of mathematics, other definitions, or advanced tactics. At first this feels just like the AlphaZero setting, where we start with the rules of chess and the goal of winning. But there are a few key differences, and in my opinion one of the most important is that two-player games like chess and Go make it easy to develop a curriculum of progressively harder problems. The agent plays itself, and to make progress it only needs to learn just enough to beat itself. Combine that with a good learning tool (neural networks), a good search algorithm (Monte Carlo tree search), and enough computation, and it can advance from random play to grandmaster level.

Theorem proving doesn't have this. Theorem proving is a one-player game, also known as a puzzle. However, even for many puzzles it is possible to develop good curricula. To solve Sokoban, many agents just start by solving small random levels, progressively getting better on harder levels. Similarly for the Rubik's cube. Moreover, the Rubik's cube has a nice trick: you can start from a solved cube, scramble it, and perform every step of that scramble in reverse to obtain a solution. In this way it is possible to get lots of training data at many levels of difficulty (distance from the solution).
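
To make the reverse-scramble trick concrete, here is a minimal Python sketch. Everything passed in (solved_state, moves, apply_move, inverse) is a placeholder for a concrete puzzle implementation, not any particular library's API:

    import random

    def reverse_curriculum_examples(solved_state, moves, apply_move, inverse, k, n):
        """Scramble the solved state k times, then label each intermediate
        state with the move that undoes the last scramble step."""
        examples = []
        for _ in range(n):
            state, applied = solved_state, []
            for _ in range(k):
                m = random.choice(moves)
                state = apply_move(state, m)
                applied.append(m)
            # Undoing the scramble in reverse order solves the puzzle, so every
            # intermediate state comes with a known good next move.
            for m in reversed(applied):
                examples.append((state, inverse(m)))
                state = apply_move(state, inverse(m))
        return examples

A curriculum is then just a schedule over k: train on k = 1, then 2, then 3, and so on as the learner improves.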

Theorem proving seems different. Sure, if one already has a vast library of theorem statements (as in much of the previous work on AI and theorem proving), then it is possible to use those theorems as a curriculum for reinforcement learning. The model will first solve the easiest problems by random chance, learn from those problems, and then be able to solve harder problems. We can even improve this, as many papers do, by learning from partially solved proofs. The real question, however, is whether any of this is still possible without a bank of starting problems to get any traction.
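
Here is a hedged sketch of that curriculum loop; prover, attempt_proof, and train are hypothetical placeholders standing in for a learned model, a model-guided proof search, and a training step, not any specific system's API:

    def curriculum_rl(prover, theorem_bank, attempt_proof, train, n_rounds):
        """Each round: attempt every unsolved theorem, learn from the proofs
        that were found, and try again with the improved prover."""
        solved = {}
        for _ in range(n_rounds):
            new_proofs = []
            for thm in theorem_bank:
                if thm in solved:
                    continue
                proof = attempt_proof(prover, thm)   # model-guided proof search
                if proof is not None:
                    solved[thm] = proof
                    new_proofs.append((thm, proof))
            if not new_proofs:                       # no traction: nothing easy enough
                break
            prover = train(prover, new_proofs)       # learn from this round's successes
        return prover, solved

Without the theorem_bank, the very first round produces nothing to learn from, which is exactly the problem.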

There is, however, still a path forward. The agent could just randomly explore the space of proofs, combining axioms and rules until it proves something. Then it could learn from those proofs. As it gets better, it can find harder and harder proofs. But unlike the Rubik's cube, it is hard to find a random proof, and harder still to find an "interesting" one. Indeed, now we are getting into hard problems in reinforcement learning, such as learning to explore an open world with few or no explicit goals. In such exploration, the agent must be driven by "curiosity", but what makes an experience "novel" enough that it should be remembered, learned from, or sought out in the future? And in mathematics, what makes a mathematical fact interesting?
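
As a thought experiment, the goal-free version might look something like the sketch below. The inference_rules objects and the novelty_score function are hypothetical placeholders; the entire open problem is hidden inside novelty_score:

    import random

    def explore_proof_space(axioms, inference_rules, novelty_score, n_steps, threshold):
        """Randomly combine known facts via inference rules, and remember only
        the derived facts that score as novel enough to learn from."""
        known = list(axioms)
        remembered = []
        for _ in range(n_steps):
            rule = random.choice(inference_rules)
            premises = random.sample(known, rule.arity)
            fact = rule.apply(premises)        # returns None if the rule doesn't apply
            if fact is None or fact in known:
                continue
            if novelty_score(fact, known) >= threshold:
                known.append(fact)
                remembered.append(fact)        # candidate lemmas / training data
        return remembered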

Now one might see the benefit of a more holistic approach. Maybe a theorem is "interesting" if it is useful for proving other theorems? Maybe agents shouldn't just prove theorems, but also conjecture them? Maybe agents shouldn't just focus on exploring the space of theorems, but also on other mathematical concepts like computation? (Recall, a lot of math has been developed just to perform computations practically.) And soon we may want to venture outside the world of pure math. (This is why I'm not sure whether Euclid's theorem is a good test problem or not, since its simplicity requires avoiding real-world knowledge for fear of tainting our model with facts about the prime numbers and arithmetic.)

But even if we stay inside the world of pure math, we still probably need our agent to develop its own definitions. For example, Euclid's proof uses the product of the first n primes, which is a special case of an iterated function. What on Earth could make an algorithm find and explore such a concept? And why is that concept any more "interesting" or "natural" or "useful" for the task at hand than any other definition in the infinite space of all definitions? If the agent just starts enumerating definitions, it will get all sorts of minutiae like f(x) = x+(1+x)+1. We need a way to pare this down to the most "useful", "interesting", "fruitful" ideas.
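
For instance, the auxiliary notion the proof needs looks something like the following from-scratch Lean definition (my own illustration, not a library definition): an iterated (folded) product, applied to a list of primes as p₁ * p₂ * ⋯ * pₙ.

    -- The product of a finite list of naturals: one small instance of a fold.
    def prod_list : list ℕ → ℕ
    | []        := 1
    | (x :: xs) := x * prod_list xs

An agent enumerating definitions blindly has no obvious reason to prefer this over f(x) = x+(1+x)+1.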

One work which has shown a lot of promise in this direction is DreamCoder, which isn't for proving theorems but for writing code. Not only does it learn to write code with no human code examples, it does so by building its own library of functions. It is very careful to keep only the best, most useful functions in the library, to avoid an overload of choice. And in doing so it learned to derive the map, fold, and unfold operators from pure simply typed lambda calculus. This is pretty remarkable! However, even in this case DreamCoder was only able to do this because it was given a carefully designed curriculum of problems (given as input-output pairs, where the agent was supposed to find a function matching those pairs). (In fairness to DreamCoder, it used a really small curriculum, but it was a curriculum nonetheless.)
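
As a rough illustration of the flavor of abstraction involved (in Python rather than DreamCoder's typed lambda calculus): once a fold is in the library, map stops being a new insight and becomes a one-line reuse of it.

    def fold(f, acc, xs):
        # Left fold: combine the elements of xs into acc, one at a time.
        for x in xs:
            acc = f(acc, x)
        return acc

    def my_map(f, xs):
        # With fold already in the library, map is just a small reuse of it.
        return fold(lambda acc, x: acc + [f(x)], [], xs)

    assert my_map(lambda x: x * x, [1, 2, 3]) == [1, 4, 9]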

So the real question is: can we develop a good enough notion of "interesting", "useful", "fruitful" mathematics to let us start from "nothing" and make headway by building our own curriculum of mathematics, getting closer and closer to Euclid's theorem? And if we can do it, is it worth the effort?

  • I guess another possible criterion is: take an arbitrary undergraduate textbook (e.g. real analysis); we are allowed to use all our mathematical knowledge to build a system, and then we see if it can solve any of the exercises without further intervention. We are not allowed to know these exercise questions beforehand (when developing the system). Commented May 14, 2022 at 9:01
  • Currently I'm not aware of such a system (I doubt GPT-f and Sledgehammer are anywhere near this; otherwise I would be very interested -- as an undergraduate taking open-book exams). Commented May 14, 2022 at 9:13
  • This is an interesting paper which posits that an agent's experiences of "beauty" and "interestingness" arise inherently from its incremental discoveries of how to compress observed information better and better. I wonder if this kind of reward function could be used to achieve (2): arxiv.org/pdf/0709.0674.pdf
    – swami
    Commented Oct 11, 2022 at 16:42
  • This is the second time today I am reminded of AM and Eurisko. I'm not sure how advanced their heuristics would be compared to a modern theorem prover. AM identified prime numbers as interesting from its base axioms and re-discovered things like Goldbach's conjecture. I don't know how well it reasoned about infinite sequences like the primes. Commented Jul 3, 2023 at 14:51
