
‘Not an imitation’: OpenAI pauses ChatGPT voice that sounded like Scarlett Johansson

VentureBeat/Ideogram



Just days after showing off an upgraded version of ChatGPT that can listen and respond like a human, answering within milliseconds, OpenAI is taking a step back by pausing the AI assistant’s much-talked-about “Sky” voice.

The company said it is pausing Sky amid concerns that it sounded too much like actor Scarlett Johansson in the movie “Her,” in which she voices an AI operating system that becomes the lead character’s girlfriend.

The company explicitly denied that the voice imitates Johansson and said it belongs to a “different professional actress using her own natural speaking voice.”

For now, it remains unclear when, or if, the company will restore the Sky voice. The other four voice options – Breeze, Cove, Ember and Juniper – remain available to ChatGPT users.




After OpenAI’s blog post and the original publication of this article, journalist Yashar Ali posted on X a statement that he said was confirmed to be from Johansson’s representative. In it, Johansson said OpenAI CEO Sam Altman had approached her before ChatGPT’s new voices and the GPT-4o model were announced a week ago, asking her to lend her voice to the project, and that she declined.

Nonetheless, Johansson stated: “When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.” The full statement is available on Ali’s X account.

OpenAI’s ChatGPT or ‘Samantha’ from Her?

OpenAI launched voice capabilities in ChatGPT in September 2023. The feature worked well but suffered from noticeable latency because it chained three separate models: one to transcribe audio to text, GPT-3.5 or GPT-4 to take in that text and produce a response, and a third to convert that response back to audio. Because the model producing the answer only ever saw text, it couldn’t directly observe tone, multiple speakers or background noises, and it couldn’t respond with laughter or other emotions.

To change this, OpenAI last week announced GPT-4o, a unified multimodal AI that reasons across text, voice and vision in real time with GPT-4-level intelligence. The company released several demos showcasing how the model can enable ChatGPT to respond in about 320 milliseconds, matching the response time of humans, and serve as a personal assistant, taking on the likes of Siri and Alexa.

The news made major headlines, but soon after the video demos appeared, many users started pointing out that the Sky voice in GPT-4o’s new Voice Mode sounded too much like “Samantha,” the personal assistant voiced by Scarlett Johansson in the movie “Her.” The speculation intensified when Sam Altman posted the word “her” on X soon after the new voice mode was revealed publicly. An entire Reddit thread discusses how the voice of GPT-4o matches that of Johansson in “Her,” down to tone, giggles and laughs.

Naturally, the comparisons with Johansson led many to wonder how OpenAI could have created such a similar voice, with some guessing that the company may have used AI to replicate the actress’s tone and speaking style.

Now, in response, OpenAI has chosen to hit the pause button on the Sky voice as it works to address the concerns and questions around it. The company clarified that the voice in question is not an imitation of Johansson but that of a paid voice actress selected through an extensive recruiting process spanning five months.

“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents,” the company wrote in a blog post.

It added that the five voice actors, including the one behind Sky, were chosen from a pool of 400 applicants with the help of talent agencies, casting directors and industry advisors. These groups first came up with a set of criteria for ChatGPT’s voice – covering aspects like diversity, timelessness, approachability and warmth – and then used these parameters to sift through the applications and hand-pick the most suitable options. The finalized actors were then called to OpenAI HQ for recording sessions, leading to the launch.

“We spoke with each actor about the vision for human-AI voice interactions and OpenAI, and discussed the technology’s capabilities, limitations, and the risks involved, as well as the safeguards we have implemented. It was important to us that each actor understood the scope and intentions of Voice Mode before committing to the project,” the company said, adding that each actor is paid top-of-the-market rates for lending their voice to OpenAI.

What happens to ChatGPT’s voice now?

While the Sky voice is paused, the other four remain available. Once GPT-4o’s new Voice Mode rolls out to ChatGPT in the coming weeks, users will be able to choose any of these voices to interact with the AI.

As for Sky, it remains unclear what changes the company plans to make to put an end to claims that the voice replicates the one in the movie “Her” and resembles Johansson’s. It also remains unclear what this would mean for the unnamed actor behind the voice.

The company has said only that it continues to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o, and that it will add more voices in the future to better match users’ diverse interests and preferences.