CriticGPT: OpenAI wants to find bugs in ChatGPT's code with a critical GPT-4 model

OpenAI has presented an AI model that is designed to find errors in code that ChatGPT outputs. Both CriticGPT and ChatGPT are based on GPT-4.

(Image: ChatGPT app on a smartphone; Tada Images/Shutterstock.com)

This article was originally published in German and has been automatically translated.

CriticGPT is designed to review ChatGPT's output and find errors in it. Both applications are based on GPT-4. According to OpenAI, people who review ChatGPT's code with CriticGPT's help outperform those working without it 60 percent of the time. The critique function is to be incorporated into the chatbot's training as part of reinforcement learning from human feedback (RLHF).

RLHF stands for reinforcement learning from human feedback. Until now, the method has relied on people rating the chatbot's answers, and that feedback flows back into the model. It is a kind of feedback loop in which the AI, to put it in very human terms, seeks positive feedback, i.e. wants to be praised, and therefore improves. However, human feedback varies widely, and positive feedback is not necessarily correct. Now AI-generated feedback is to enter this loop as well: CriticGPT. OpenAI hopes this will produce better and more consistent feedback. Humans are still asked to confirm the errors CriticGPT flags.
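To make the loop more tangible, here is a deliberately simplified Python sketch: a stand-in "critic" scores candidate answers, and a toy policy is nudged toward the answers that score well. All names (critic_score, ToyPolicy) and the scoring rule are invented for illustration; OpenAI has not published CriticGPT's internals, and real RLHF trains a reward model on preference data rather than reweighting a fixed list of answers.

```python
import random

# Candidate answers the "chatbot" could give; one is correct, two contain
# the kind of subtle coding mistakes a critique model is meant to flag.
CANDIDATES = [
    "def add(a, b): return a + b",   # correct
    "def add(a, b): return a - b",   # subtle bug: wrong operator
    "def add(a, b): print(a + b)",   # bug: prints instead of returning
]

def critic_score(answer: str) -> float:
    """Stand-in for a critique model: 1.0 if no issue is flagged, else 0.0."""
    return 1.0 if "return a + b" in answer else 0.0

class ToyPolicy:
    """Keeps a preference weight per candidate and samples proportionally."""
    def __init__(self, answers):
        self.weights = {a: 1.0 for a in answers}

    def sample(self) -> str:
        answers, weights = zip(*self.weights.items())
        return random.choices(answers, weights=weights, k=1)[0]

    def reinforce(self, answer: str, reward: float, lr: float = 0.5) -> None:
        # Nudge the policy toward answers that received good feedback.
        self.weights[answer] += lr * reward

policy = ToyPolicy(CANDIDATES)
for _ in range(200):
    answer = policy.sample()
    reward = critic_score(answer)   # in real RLHF: human (or AI-assisted) feedback
    policy.reinforce(answer, reward)

# After a few hundred rounds the correct implementation dominates.
print(max(policy.weights, key=policy.weights.get))
```

The sketch only illustrates the shape of the feedback loop; in OpenAI's setup the critiques are shown to human trainers, whose judgments then shape the reward signal, rather than feeding rewards into the model directly.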

Initially, CriticGPT is intended only for code. However, OpenAI wants to extend it to other skills. Even with CriticGPT, ChatGPT's suggestions are not always correct – the 60 percent figure OpenAI cites shows that the improvement is far from complete. "With the progress we are making in reasoning and model behavior, ChatGPT's errors are becoming rarer and more subtle," says the OpenAI blog post. This makes it increasingly difficult for purely human feedback providers to spot errors. OpenAI believes that AI models are gradually accumulating more knowledge than any single person.

OpenAI also describes CriticGPT's limitations in the blog post. For example, the model has so far been trained only on relatively short ChatGPT answers; longer answers are yet to come. OpenAI also states clearly that AI models hallucinate, which can still lead to errors despite CriticGPT and reviewed answers. In addition, real-world mistakes can be spread across many parts of an answer, whereas the errors CriticGPT detects are of a different kind: it focuses on individual sources of error that can be isolated. Some answers are also so complex that no expert model can evaluate them correctly.

There have long been attempts to have AI models check each other in order to detect hallucinations and confabulations. Factually incorrect output is a major problem with current AI chatbots. Meta's head of research Naila Murray, for example, says that this is one of the reasons why large language models cannot be used in critical areas such as credit scoring or the justice system.

(emw)