
I asked Bard to "Give me a challenging mathematical riddle to which the answer is 23"

It answered: "Here is a challenging mathematical riddle to which the answer is 23: I am the sum of the first 7 positive odd numbers. What number am I? The answer to this riddle is 23. The first 7 positive odd numbers are 1, 3, 5, 7, 9, 11, and 13. Their sum is 23."

This is clearly untrue, isn't it? The first 7 positive odd numbers actually sum to 49, not 23.

  • Perhaps because the foundation models of most leading AIs were trained on "tokens" (string fragments), not numeric types?
    – knb
    Commented Aug 27, 2023 at 20:01
  • Models don't really understand math. If they produce a correct answer, it's because that's the most probable answer given the data they were trained on. We've had a similar question for ChatGPT. This should apply to all models that are just trained to predict the most likely output.
    – nbro
    Commented Aug 28, 2023 at 8:32
  • Does this answer your question? Why is ChatGPT bad at math? It's not really a duplicate, but I think the questions are related enough that they can be closed as such, given that the answers will most likely all be similar or refer to the same reasons.
    – nbro
    Commented Aug 28, 2023 at 8:32

1 Answer


You are not alone in having unrealistic expectations of a technology labelled as "AI" that appears to have very general capabilities regarding text. After all, it will always respond to any input, and the tone of the responses from most of the fine-tuned LLMs (of which Bard is one) gives an air of confidence.

The latest releases of chatbots and related technology are all based on large language models (LLMs), where the core training process is to predict the next word (or, more precisely, the next token, which can be just part of a word) given all the text that has already been seen. This trained model is then leveraged to emulate chatting or other text-generating processes. For each use case, the model can be fine-tuned for the type of content and utility, and can also be "pre-prompted" to establish the nature of the content. The fine-tuning and pre-prompting don't radically change the core nature of an LLM as a text prediction engine, though.
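To make that loop concrete, here is a minimal sketch using the Hugging Face transformers library with the small open "gpt2" model as a stand-in (Bard's actual model, decoding strategy and serving stack are not public, so treat this purely as an illustration of the mechanism, not as how Bard works):

```python
# Minimal sketch of next-token prediction with an open model.
# Assumes the Hugging Face `transformers` library; "gpt2" stands in for any LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The sum of the first 7 positive odd numbers is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: at each step, append the single most probable next token.
# (Real chatbots sample and are heavily fine-tuned, but the core loop is the same.)
with torch.no_grad():
    for _ in range(5):
        logits = model(input_ids).logits          # scores for every vocabulary token
        next_id = logits[0, -1].argmax()          # most likely continuation
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Whatever the loop emits is simply the most statistically plausible continuation of the prompt according to the training data; nothing in it performs arithmetic.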

During training, typically on immense amounts of data, far more text than any single person could ever read, the LLM does learn some approximations of logic and facts, in as much as they help it predict the next token. Some of these learned capabilities are robust, useful and effective. Some are less so. The more complex a real-world constraint or fact is, the less likely it is that the LLM will have modelled it internally with an accurate approximate function.

The riddle task you set in the question exposes the limitations of those approximate internal models. The LLM is capable of generating a grammatically correct riddle, plus an explanation, and of meeting the constraint that a specific answer appears in the text. It is not capable of analysing its own output and recognising that it has made an incorrect statement. That is, it has either no model, or only a weak approximate model, of the constraints on the mathematical statements in the riddle. From what you have written I would say "weak model" here, because the number produced by resolving the puzzle is still the correct order of magnitude, and that is probably not a coincidence.
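For contrast, the claim in Bard's riddle fails the most elementary external check. A few lines of Python, using only the numbers from the quoted answer, are enough:

```python
# Check Bard's claim that the first 7 positive odd numbers sum to 23.
odds = [2 * k + 1 for k in range(7)]   # [1, 3, 5, 7, 9, 11, 13]
total = sum(odds)
print(odds, "sum to", total)           # ... sum to 49

# The sum of the first n odd numbers is always n squared,
# so the true answer is 7**2 = 49, not 23.
assert total == 7 ** 2
```

Since the sum of the first n odd numbers is always a perfect square, no choice of n could ever make this riddle's answer 23. The model satisfied the surface constraint of mentioning 23 as the answer without any mechanism for checking the arithmetic behind it.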

Expect future iterations of AI chatbots and assistants to get better at this kind of thing; in my opinion this will happen purely because it is an expectation many people will have of "useful AI". Simply scaling the LLMs will only help a little, so future work will probably look at adding further models or integrations that can fact-check outputs, or better understand and model requests (beyond representing them as streams of tokens).
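As a purely hypothetical illustration of what such an integration could look like (this is a toy I sketched for this answer, not how Bard or any production system works), a post-hoc checker might scan generated text for verifiable numeric claims and test them with ordinary code:

```python
# Toy post-hoc fact-checker for one narrow kind of claim:
# "the first N positive odd numbers ... is M".
import re

def check_sum_claims(text: str) -> list[str]:
    """Find claims about sums of the first N odd numbers and verify them."""
    problems = []
    for n, m in re.findall(r"first (\d+) positive odd numbers.*?is (\d+)",
                           text, flags=re.S):
        n, m = int(n), int(m)
        if n ** 2 != m:                         # the true sum is n squared
            problems.append(f"claims {m}, but the sum is {n ** 2}")
    return problems

bard_output = ("I am the sum of the first 7 positive odd numbers. "
               "What number am I? The answer to this riddle is 23.")
print(check_sum_claims(bard_output))            # ['claims 23, but the sum is 49']
```

A real system would need far more general claim extraction and verification than a regex over one sentence pattern, which is exactly why this remains an open problem rather than a solved add-on.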

In the meantime, the standing recommendation is to fact-check results from LLMs. Do not assume any output is factually correct; always validate anything that you need to be correct or true.

  • Surely if any AI 'Chatbot' is to become popular it should be trained specifically in Mathematics, in addition to the huge amounts of assorted data? Commented Aug 28, 2023 at 13:27
  • @bensvensson ChatGPT is already extremely popular, so no, that does not appear to be a requirement. In addition, the training process for LLMs cannot so easily be adapted to feed in factual information. It doesn't learn in anything like the same way as humans. Things that might seem easy to you from outside of understanding the system are often anything but - xkcd.com/1425 is relevant Commented Aug 28, 2023 at 13:31
  • Ah, splendid, I forgot how pertinent xkcd can be, so thanks for that. While I'm no computer whizz, couldn't there be multiple 'engines', as it were, involved in answering queries? There would be one overseeing the entire process, and separate ones which specialise in specific topics. This would also allow for a fact-checker... Commented Aug 28, 2023 at 13:35
  • @bensvensson Yes, that might be a valid architecture moving forward, and I touch on that in the penultimate paragraph. Commented Aug 28, 2023 at 13:36
