29

As opposed to How does ChatGPT know math?, I've been seeing some things floating around the Twitterverse about how ChatGPT can actually be very bad at math. For instance, I asked it "If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?"

The conversation went like this:

Mithical: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?
ChatGPT: If it takes 5 machines 5 minutes to make 5 devices, then it would take 100 machines 100 minutes to make 100 devices.
Mithical: That's not right. Check carefully and try again.
ChatGPT: I apologize if my response was not correct. Can you please provide more information or clarify your question? I may have misunderstood the scenario you described.
Mithical: It takes a machine 5 minutes to make a device. How long would it take 100 machines to make 100 devices?
ChatGPT: If it takes a single machine 5 minutes to make a single device, then it would take 100 machines 5 minutes to make 100 devices. Therefore, it would take a total of 500 minutes for 100 machines to make 100 devices.

Now, unless I'm misunderstanding something here, the answer is 5 minutes. ChatGPT first makes the intuitive mistake of answering 100 minutes, which a human might make as well, and then goes on to (correctly, as far as I understand) say it's 5 minutes... but concludes in the same response that it's 500 minutes.
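For reference, here is the arithmetic spelled out as a quick Python check (the variable names are mine, not part of the original riddle):

    # 5 machines make 5 devices in 5 minutes,
    # so each machine makes 1 device in 5 minutes.
    machines, minutes, devices = 5, 5, 5
    rate_per_machine = devices / machines / minutes   # devices per machine per minute (0.2)

    # 100 machines working in parallel on 100 devices:
    time_needed = 100 / (100 * rate_per_machine)
    print(time_needed)  # 5.0 minutes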

Why is this AI so bad at math? Computers are generally supposed to be good at math. Why does this model make such simple logical mistakes?

  • Computers are good at maths because the rules of mathematics are hard-coded into them. What you're asking here is why a learned model is bad at maths -- this is an apples-to-oranges comparison. Just because the model runs on a computer does not mean that it is comparable to the computers you refer to which are good at maths (presumably things like programming languages where, as I say, maths is hard-coded into them).
    – David, Dec 9, 2022 at 9:49
  • Computers' ALUs (arithmetic logic units) are extremely good and fast at arithmetic operations. AI models are not ALUs. They are good at guessing, but they cannot perform standard computer algorithms, and are not guaranteed to give correct results.
    – LatinSuD, Dec 9, 2022 at 15:09
  • This is generally wrong. Computers do not do math. They do calculations and execute algorithms. Math is making theoretical abstract statements and proving them. Logic and automatic proof systems do a part of this if presented with strictly encoded input, but again this is the computer following an algorithm; there is no innate understanding.
    – Commented Dec 17, 2022 at 10:15
  • I found that it can find the sine of some irrational number with very good precision, up to 10 digits and more if requested, but cannot correctly multiply two 4-digit numbers... This is weird.
    – Anixx, Dec 26, 2022 at 14:32
  • @LatinSuD The issue isn't that AI models aren't ALUs; this is irrelevant, IMO. A human isn't an ALU either, and a human can solve this problem easily, while an ALU has no clue how to solve this problem. The issue is that the AI model cannot internalize the problem in a way that makes it easily solvable, for example by translating it into a system of equations.
    – Commented Jan 5, 2023 at 20:01

4 Answers

27

ChatGPT is able to create well-formed sentences containing phrases that fit the input. It has rules extracted from its data, but those are not rules of understanding; they are rules of 'likely phrases'.

In its simplest variant, every smartphone already has that mechanism. You type on your screen and the phone offers you the three words that you would statistically use most often after the previous one. On the phone that is not of very high quality. Keep choosing the most likely words for a message and the resulting sentence is something like this (I just started a message and picked a stream of 'middle options'):

"I am going on a trip with my friends to get to the full house for the weekend so we will need a time for a quick dinner with us at all and we can go over and there for the weekend if we want you guys out there"

You can see how many of the word pairs have a bit of a backwards statistical likelihood relation: 'time for', 'for a', 'time' -> 'quick', 'quick dinner', 'with us', 'with' -> 'at all'.

ChatGPT is fundamentally the same, but much better than what the phone can do. Its statistics are much 'deeper'. It is the phone's autocompletion on steroids. But 'well-formed' and 'fitting' do not equate to 'reliable' or 'trustworthy'. More data and training will not help; the output will become even more 'well-formed' and 'fitting', but not more reliable.
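To make the "autocompletion on steroids" picture concrete, here is a minimal sketch of a bigram (previous word -> next word) predictor in Python. The toy training text and the function name are purely illustrative and are not how ChatGPT actually works; they only show the "pick a likely next word" idea:

    from collections import Counter, defaultdict
    import random

    # Count which word tends to follow which in some text.
    text = ("i am going on a trip with my friends to the city for the weekend "
            "so we will need a time for a quick dinner with us all")
    words = text.split()
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1

    def autocomplete(word, length=12):
        out = [word]
        for _ in range(length):
            options = followers.get(out[-1])
            if not options:
                break
            # pick among the most common followers ("middle options" flavour)
            out.append(random.choice([w for w, _ in options.most_common(3)]))
        return " ".join(out)

    print(autocomplete("i"))
    # e.g. "i am going on a trip with my friends to the weekend so we"
    # Fluent-looking, but there is no understanding behind it.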

'Well-formed' and 'fitting' do strongly suggest reliability and intelligence to humans, but that is more because we humans judge quickly and superficially. It's not so much that ChatGPT is intelligent, but more that we are less so than we think.

What is noteworthy is that many humans make the same error on the 5-machines/5-minutes/5-devices question, answering 100 minutes for 100 machines and 100 devices. The difference is that you can explain to humans where their quick-and-dirty answer goes wrong. That is because they do have the understanding that ChatGPT lacks.

  • This isn't entirely true. ChatGPT can generate correct C code that compiles and runs, based on my questions. If it makes a mistake, I merely point out that a mistake was made by saying something abstract like "But then I have to know the type ahead of time," and it finds the mistake on its own, and offers a correction that will compile. It makes deductions that are far beyond "make plausible English sentences."
    – Commented Jan 10, 2023 at 7:19
  • @SO_fix_the_vote_sorting_bug I believe that's because to ChatGPT, a C language question is still just a "language learning" question. It's still a souped-up phone prediction algorithm, just one that has seen a lot of C code and knows "what comes after this". Incidentally, ChatGPT is banned from Stack Overflow precisely because its answers often look superficially correct, but can be fundamentally wrong, because it doesn't actually have an understanding. Or a compiler to check itself with.
    – JamieB, Mar 21, 2023 at 18:59
  • What is the 5/5/5 100/100/100 error?
    – Commented Jun 7, 2023 at 17:31
  • @SO_fix_the_vote_sorting_bug In my experience, this is not true in general. Language models generate plausible code that is sometimes correct, sometimes wrong in subtle ways, and sometimes ridiculously wrong. I've had ChatGPT make blatant errors, and when corrected, apologize and confidently give a new answer, which is also totally wrong.
    – LarsH, Jun 19, 2023 at 10:49
  • Generally this is a good answer, but not "It's not so much that ChatGPT is intelligent, but more that we are less so than we think." Fitting answers suggest intelligence to humans because they are strongly correlated in human speech and writing. We've not had experience before with entities that could produce fitting output without intelligence or understanding.
    – LarsH, Jun 19, 2023 at 10:52
11

(Check out my closely related answer to a similar question here.)

Why is ChatGPT bad at math, while it is very good at other stuff?

This comes down to the age-old issue of learning vs. understanding. On a high level, your question is very philosophical. The model learns everything present in the data; however, throw it a slightly different mathematical problem that it has not seen before, and it fails to answer spectacularly easy questions.

Some of the best examples are papers that try to build mathematics AIs. They feed the model an endless amount of data on simple addition, but give it a slightly larger number that it has never seen before, and it cannot solve it.

These AIs learn their data well, but they cannot extract strict 'rules' from the data. Addition is a simple 'rule', but the model cannot comprehend this rule. Neural networks cannot learn rules, simply because they are not designed to do so. They multiply and add up inputs, weights and biases; they are not designed to learn 'if/else' logic.

An additional train of thought which I very much like is the following: think of how precise a super small multi-layer perceptron would have to be to implement simple 2-digit addition (given 4 inputs). Should be doable, right? Now think of how you would have to learn the weights of a model with 1,000,000 parameters to do the exact same thing, using a large dataset and a NN optimizer. It would just start learning how to approximate the answers, but it would never converge to a state where it learns the rule of addition.
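As a rough illustration of this point (my own toy experiment, not from any particular paper), one can train a small MLP on two-number addition over a limited range and then test it outside that range. Typically it approximates the answers well where it has seen data and drifts badly outside that range; exact numbers vary per run:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Train on sums of numbers drawn from [0, 100).
    X_train = rng.integers(0, 100, size=(20_000, 2))
    y_train = X_train.sum(axis=1)

    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    model.fit(X_train, y_train)

    # Test inside and far outside the training range.
    X_in = rng.integers(0, 100, size=(1_000, 2))
    X_out = rng.integers(500, 1000, size=(1_000, 2))

    err_in = np.abs(model.predict(X_in) - X_in.sum(axis=1)).mean()
    err_out = np.abs(model.predict(X_out) - X_out.sum(axis=1)).mean()

    print(f"mean abs error in range:     {err_in:.2f}")
    print(f"mean abs error out of range: {err_out:.2f}")
    # The network approximates addition where it has data, but it has not
    # extracted the rule "a + b", so the error usually grows dramatically
    # outside the training range.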

4

ChatGPT is good at math and can understand the logic. It can derive new conclusions on its own and generate value that was not there before. To make use of ChatGPT, you have to provide it with the right context and questions. This applies to math, but also to many other areas.

I started with the same task: 5 machines/5 minutes/5 devices. At first, it was able to provide the correct answer. Then I changed '5' to '3' (3 machines/3 minutes/3 devices) and it failed. But then I provided it with the context, asked the question differently, and it was able to understand the problem, and even to create a general rule for solving it.

Here is how (link to the chat)...

Initial question, answered correctly. Most likely this was added to its training data in the meantime; the original question was posted about a month ago.

B: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: If it takes 5 machines 5 minutes to make 5 devices, it would take 1 machine 5 minutes to make 1 device.

If it takes 1 machine 5 minutes to make 1 device, it would take 100 machines 5 minutes to make 100 devices.

Then I changed the question in a trivial way (3 instead of 5). At first it failed to get it:

B: If it takes 3 machines 3 minutes to make 3 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: It is not possible to determine how long it would take 100 machines to make 100 devices based on the information provided. The relationship between the number of machines and the time it takes to produce a certain number of devices is not necessarily linear. In order to determine how long it would take 100 machines to make 100 devices, we would need to know the efficiency of the machines and how they work together.

But then:

B: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: If it takes 5 machines 5 minutes to make 5 devices, it would take 1 machine 5 minutes to make 1 device.

If it takes 1 machine 5 minutes to make 1 device, it would take 100 machines 5 minutes to make 100 devices.

B: using the same logic as in the previous answer please analyze similar problem and provide the answer.

If it takes 3 machines 3 minutes to make 3 devices, how long would it take 100 machines to make 100 devices?

ChatGPT: If it takes 3 machines 3 minutes to make 3 devices, it would take 1 machine 3 minutes to make 1 device.

If it takes 1 machine 3 minutes to make 1 device, it would take 100 machines 3 minutes to make 100 devices.

Then I went further and was able to get more from ChatGPT: it provided new value in the form of a general rule for this class of problems. Many humans would stop short at this step ;D ...

B: using the same logic, please formulate general rule to solve this type of a problem

ChatGPT: Here is a general rule that can be used to solve this type of problem:

Let X be the number of machines, Y be the number of minutes it takes for X machines to make X devices, and Z be the number of devices that need to be made.

To determine the number of minutes it would take Y machines to make Z devices, divide Z by X and multiply the result by Y.

For example, if X = 5, Y = 5, and Z = 100, the number of minutes it would take 5 machines to make 100 devices can be calculated as follows: (100/5) * 5 = 20 minutes.

This rule can be expressed in the following formula:

(Z/X) * Y = number of minutes it would take Y machines to make Z devices

I hope this helps! Let me know if you have any questions or if you would like further clarification on this rule.

Its final formula is correct, "(Z/X) * Y = number of minutes it would take Y machines to make Z devices".

But no one is perfect; it made 2 small mistakes there.

1st, in "To determine the number of minutes it would take Y machines to make Z devices, divide Z by X and multiply the result by Y.", it reuses Y, which it had already defined as the number of minutes. It should have used another letter, like W, to avoid confusion.

2nd, the example "(100/5) * 5 = 20 minutes" is just wrong; the result is 100 minutes.

Nevertheless, those small mistakes don't make any difference. It can now solve an advanced variant of the question (challenge any human with this :D).

B: If it takes 3 machines 3 minutes to make 3 devices, how long would it take 7 machines to make 110 devices?

ChatGPT: If it takes 3 machines 3 minutes to make 3 devices, it would take 1 machine 3 minutes to make 1 device.

If it takes 1 machine 3 minutes to make 1 device, it would take 7 machines 3 minutes to make 7 devices.

To determine the number of minutes it would take 7 machines to make 110 devices, we can use the formula (Z/X) * Y = number of minutes it would take Y machines to make Z devices, where X is the number of machines, Y is the number of minutes it takes for X machines to make X devices, and Z is the number of devices that need to be made. Plugging in the values, we get:

(110/7) * 3 = 45 minutes

Therefore, it would take 7 machines 45 minutes to make 110 devices using the production system described.

Again, not perfect: it rounded 110/7 down to 15 instead of keeping roughly 15.71, which would give about 47 minutes (see the quick check below). But OK, it still makes sense. You can give it further instructions, e.g. not to round calculations for this type of problem, and it will comply going forward.
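For reference, here is the corrected arithmetic as a quick Python check (the helper name is mine, not from the chat):

    # Time in minutes for `machines` machines to make `devices` devices,
    # when each machine makes one device in `minutes_per_device` minutes
    # (which is the Y from the "X machines, X devices, Y minutes" setup).
    def production_time(devices, machines, minutes_per_device):
        return (devices / machines) * minutes_per_device

    print(production_time(100, 5, 5))  # 100.0 minutes, not 20
    print(production_time(110, 7, 3))  # ~47.14 minutes, not 45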

My take on this: we should all learn how to interact with this thing; it's going to be very important in the near future!

  • You need to repeat these experiments a lot. Remember that what you are getting are samples of what the model predicts the next bit of text will be. It does a kind of maths, up to some limit, in order to predict the next text, because in order to represent many similar problem-and-answer examples in its training data it discovered a procedural thing it could perform over timesteps that allowed a better probability of getting the right text. But it also learned from examples like your second one that sometimes this is not possible. The output is stochastic, and 3 examples is not nearly enough.
    – Commented Jan 6, 2023 at 14:32
  • Based on my personal experience with ChatGPT Plus, this is not the issue. ChatGPT told me that 2 x 0.85 = 2.55. Then when I corrected it, ChatGPT recalculated it to 1.7, but it failed to calculate (1000 + 10) / (2 x 0.85) and gave a result of 588.24, while the real result should be 594.117647. So as for now, I consider ChatGPT not suitable for any calculations.
    – Commented Mar 5, 2023 at 8:50
  • The optimism in this answer is brightening and reminds me of the early enthusiasm that welcomed the ELIZA chatbot in the sixties. If you really want to believe that ChatGPT can do math, then sure, ChatGPT can do math. It can also write your horoscope.
    – Stef, Mar 31, 2023 at 15:22
  • $\begingroup$ "ChatGPT is good at math and can understand the logic." This assertion is not supported by the data. You might as well say that a textbook on calculus can understand integrals. $\endgroup$
    – LarsH
    Commented Jun 19, 2023 at 14:22
3

ChatGPT's GPT-4 model does not fall for this trap anymore due to more extensive training.

I tested with two prompts:

Prompt: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 50 machines to make 50 devices?

GPT-4: This is an example of a classic problem that's meant to test understanding of rates. The key to solving it is recognizing that each machine can produce one device in five minutes.

Given that, if you have 50 machines working simultaneously, they can produce 50 devices in 5 minutes. This is because each machine is working independently to produce its own device, so the overall rate of production increases linearly with the number of machines. Thus, the answer is 5 minutes.

Prompt: And if it takes 3 machines 3 minutes to make 3 devices, how long would it take 70 machines to make 70 devices?

GPT-4: The principle is the same in this case. Each machine can produce one device in three minutes. So, if you have 70 machines working simultaneously, they can produce 70 devices in 3 minutes. This is because each machine is working independently to produce its own device, so the overall rate of production increases linearly with the number of machines. Thus, the answer is 3 minutes.

As I explained at the question How does ChatGPT know math?, the apparent ability of LLMs (in particular the most capable GPT-4 model) to solve certain mathematical problems is due to the amount of training data and the number of parameters. These ML models do not have explicit rules for solving mathematical problems.

The following 2022 paper describes how such capabilities of transformer-based language models appear (i.e. emerge) once a certain threshold of parameter count and training compute (FLOPs) is exceeded: https://arxiv.org/pdf/2206.07682.pdf

  • Welcome to AI-SE. I'm not seeing how this late answer contributes beyond what was already stated in the other answers. Although GPT-4 might not fall for the same mistake, that is not the intent of the question, which is much more high-level.
    – Commented May 18, 2023 at 9:41
  • @RobinvanHoorn The question was "Why is this AI (ChatGPT) so bad at math?" I added the current response of GPT-4, which shows progress on the specified case, while explaining that LLMs don't have mathematical reasoning rules; they were not built for such problems. What is very interesting from the article that I referenced is the fact that the LLM approach, given enough parameters, can develop even the ability to imitate mathematical reasoning in many cases.
    – LeRobert, May 19, 2023 at 15:41
