OpenAI announces new free model GPT-4o and ChatGPT for desktop

Screenshot of Mira Murati presenting at OpenAI's Spring Updates 2024 event.
Credit: OpenAI on YouTube/screenshot by author

Today at OpenAI’s Spring Updates event, chief technology officer Mira Murati announced GPT-4o (short for GPT-4 Omni), a powerful new multimodal foundation large language model (LLM) that will be made available to all free ChatGPT users in the coming weeks, along with a ChatGPT desktop app for macOS (with Windows to follow) that will let users access ChatGPT outside the web and mobile apps.

“GPT-4o reasons across voice, text, and vision,” Murati said. That includes accepting and analyzing realtime video captured by users on their ChatGPT smartphone apps, though this capability is not yet publicly available.

“This just feels so magical, and that’s wonderful, but we want to remove some of the mysticism and allow you to try it out for yourself,” the CTO added.

The new model responds in realtime audio, can detect a user’s emotional state from audio and video, and can adjust its voice to convey different emotions, similar to rival AI startup Hume.

One demo during the presentation had a presenter ask ChatGPT on their phone, powered by GPT-4o, to tell a story in an increasingly dramatic and theatrical voice, which it did correctly and quickly. The model also stopped talking when interrupted and listened to the user before continuing.

OpenAI posted demo videos and examples of GPT-4o’s capabilities on its website here, noting: “It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.”

The company explained how GPT-4o differs from its prior models and how this enables new experiences:

“Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”
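To make the contrast concrete, here is a rough sketch of that three-stage pipeline using OpenAI’s Python SDK. The whisper-1 and tts-1 model names, the voice, and the file names are illustrative assumptions, not OpenAI’s published Voice Mode internals; GPT-4o collapses all three calls into a single model:

# Sketch of the old three-model Voice Mode pipeline described above.
from openai import OpenAI

client = OpenAI()

# 1. A simple model transcribes the user's audio to text.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. The main model takes text in and puts text out. Tone, background
# noise, and multiple speakers are already lost at this point.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. A third simple model converts the text reply back to audio.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply.choices[0].message.content
)
speech.write_to_file("answer.mp3")

Every hop in that chain adds latency and strips information, which is what the 2.8-second and 5.4-second averages reflect.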

It can even be used to generate multiple views of a single image that can, in turn, be transformed into 3D objects.

However, OpenAI did not state that it would open source GPT-4o or any of its newer AI models. That means that while users can try the new foundation model and its capabilities on OpenAI’s website and through its apps and application programming interface (API), they won’t have access to the underlying weights to customize the model to their own liking, something critics, including co-founder turned rival Elon Musk, have pointed to as an example of OpenAI straying from its founding mission.

A new model brings more power and capabilities to free ChatGPT users

The features offered by GPT-4o stand to be a significant upgrade for free ChatGPT users, who until now have been stuck on the text-only GPT-3.5 model, lacking GPT-4’s ability to analyze images and documents uploaded by users.

Now, free ChatGPT users will get a significantly more intelligent model, plus web browsing, data analysis and chart creation, access to the GPT Store to use custom GPTs created by third parties, and even memory, so the chatbot can store information about users and their preferences when they type it or ask aloud.

In a demo during the event, OpenAI presenters showed how ChatGPT powered by GPT-4o could be used as a realtime translator app, automatically listening to and translating a speaker’s words from Italian to English and vice versa.
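The demo ran over voice, but the underlying two-way behavior can be sketched in text against the Chat Completions API. This is a minimal illustration, and the system prompt is an assumption, not OpenAI’s demo prompt:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            # Assumed prompt: steer the model to act as a two-way translator.
            "content": (
                "You are a realtime translator. When the user speaks "
                "Italian, reply with the English translation; when they "
                "speak English, reply with the Italian translation."
            ),
        },
        {"role": "user", "content": "Ciao, come va la presentazione?"},
    ],
)
print(response.choices[0].message.content)  # English translation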

In a blog post announcing GPT-4o, OpenAI noted that: “ChatGPT also now supports more than 50 languages across sign-up and login, user settings, and more.”

In addition, OpenAI wrote: “GPT-4o is much better than any existing model at understanding and discussing the images you share.”

Furthermore, it can be used to create consistent AI art characters, something that has eluded most AI art generators to date.

OpenAI also noted that while it would eventually be available to free ChatGPT users, GPT-4o would first roll out to paying subscribers:

We are beginning to roll out GPT-4o to ChatGPT Plus and Team users, with availability for Enterprise users coming soon. We are also starting to roll out to ChatGPT Free with usage limits today. Plus users will have a message limit that is up to 5x greater than free users, and Team and Enterprise users will have even higher limits.

On X, OpenAI’s company account posted that while “text and image input” are rolling out today in OpenAI’s API, the voice and video capabilities will be available in “the coming weeks.”

In the API, GPT-4o will be available at half the price and 2x the speed of GPT-4 Turbo, along with 5x higher rate limits (the number of calls third-party developers can make in a given period), according to OpenAI co-founder and CEO Sam Altman’s posts on X during the event.
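As a minimal sketch of the text-and-image input OpenAI says is available in the API now (the image URL below is a placeholder, not a real asset):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # Text and image parts travel in the same message.
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)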

On X, OpenAI researcher William Fedus confirmed that the mysterious “gpt2-chatbot” that was spotted by users on the LMSys arena online was indeed GPT-4o in disguise.

Posting over on his personal blog, Altman wrote that OpenAI’s mindset about building AI had changed: “Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.”

Read the full blog post here and below:

“GPT-4o
There are two things from our announcement today I wanted to highlight.

First, a key part of our mission is to put very capable AI tools in the hands of people for free (or at a great price). I am very proud that we’ve made the best model in the world available for free in ChatGPT, without ads or anything like that.

Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.

We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to (hopefully) billions of people.

Second, the new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change.

The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.

Talking to a computer has never felt really natural for me; now it does. As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before.

Finally, huge thanks to the team that poured so much work into making this happen!”

Desktop ChatGPT app for macOS first, Windows later this year

In its blog post, OpenAI stated that the new ChatGPT desktop app would be a staggered release: macOS first, with Windows to follow at an undetermined point before the end of the year.

“We’re rolling out the macOS app to Plus users starting today, and we will make it more broadly available in the coming weeks. We also plan to launch a Windows version later this year.”

One interesting note about the desktop app: it will allow ChatGPT, if you so choose, to view a live capture of your screen and analyze your workflow.

Murati said during the event that more than 100 million people are already using ChatGPT and more than 1 million custom GPTs have been created by users in the GPT Store.

The event concluded after just 26 minutes, short by tech keynote standards, and the live demos included some awkward moments, with presenters interrupting ChatGPT’s voice responses to redirect it or to correct it when it mistakenly analyzed things they had not asked about.

Still, with the technology coming soon to users, it will be interesting to see how it is embraced, and whether people view it as meaningfully different, offering a better, more capable, and more natural experience than GPT-4 Turbo and ChatGPT’s most recent prior versions.