OpenAI breach is a reminder that AI companies are treasure troves for hackers

12:49 PM PDT • July 5, 2024

Image Credits: Bryce Durbin / TechCrunch

There’s no need to worry that your secret ChatGPT conversations were obtained in a recently reported breach of OpenAI’s systems. The hack itself, while troubling, appears to have been superficial — but it’s a reminder that AI companies have in short order made themselves into one of the juiciest targets out there for hackers.

The New York Times reported the hack in more detail after former OpenAI employee Leopold Aschenbrenner hinted at it recently in a podcast. He called it a “major security incident,” but unnamed company sources told the Times the hacker only got access to an employee discussion forum. (I reached out to OpenAI for confirmation and comment.)

No security breach should really be treated as trivial, and eavesdropping on internal OpenAI development talk certainly has its value. But it’s far from a hacker getting access to internal systems, models in progress, secret roadmaps, and so on.

Still, it should scare us, and not necessarily because of the threat of China or other adversaries overtaking us in the AI arms race. The simple fact is that these AI companies have become gatekeepers to a tremendous amount of very valuable data.

Let’s talk about three kinds of data OpenAI and, to a lesser extent, other AI companies created or have access to: high-quality training data, bulk user interactions, and customer data.

It’s uncertain exactly what training data they have, because the companies are incredibly secretive about their hoards. But it’s a mistake to think they are just big piles of scraped web data. Yes, they do use web scrapers or datasets like the Pile, but it’s a gargantuan task shaping that raw data into something that can be used to train a model like GPT-4o. A huge number of human work hours are required to do this, and it can only be partially automated.


Some machine learning engineers have speculated that of all the factors going into the creation of a large language model (or, perhaps, any transformer-based system), the single most important one is dataset quality. That’s why a model trained on Twitter and Reddit will never be as eloquent as one trained on every published work of the last century. (And probably why OpenAI reportedly used questionably legal sources like copyrighted books in their training data, a practice they claim to have given up.)

So the training datasets OpenAI has built are of tremendous value to many parties, from competing companies to adversary states to regulators here in the U.S. Wouldn’t the Federal Trade Commission (FTC) or courts like to know exactly what data was being used, and whether OpenAI has been truthful about that?

But perhaps even more valuable is OpenAI’s enormous trove of user data — probably billions of conversations with ChatGPT on hundreds of thousands of topics. Just as search data was once the key to understanding the collective psyche of the web, ChatGPT has its finger on the pulse of a population that may not be as broad as the universe of Google users, but provides far more depth. (In case you weren’t aware, unless you opt out, your conversations are being used for training data.)


In the case of Google, an uptick in searches for “air conditioners” tells you the market is heating up a bit. But those users don’t then have a whole conversation about what they want, how much money they’re willing to spend, what their home is like, manufacturers they want to avoid, and so on. You know this is valuable because Google is itself trying to coax its users into providing this very information by substituting AI interactions for searches!

Think of how many conversations people have had with ChatGPT, and how useful that information is, not just to developers of AIs, but also to marketing teams, consultants, analysts … It’s a gold mine.

The last category of data is perhaps of the highest value on the open market: how customers are actually using AI, and the data they have themselves fed to the models.

Hundreds of major companies and countless smaller ones use tools like OpenAI’s and Anthropic’s APIs for an equally large variety of tasks. And in order for a language model to be useful to them, it usually must be fine-tuned on or otherwise given access to their own internal databases.

This might be something as prosaic as old budget sheets or personnel records (e.g., to make them more easily searchable) or as valuable as code for an unreleased piece of software. What they do with the AI’s capabilities (and whether they’re actually useful) is their business, but the simple fact is that the AI provider has privileged access, just as any other SaaS product does.

These are industrial secrets, and AI companies are suddenly right at the heart of a great deal of them. The newness of this side of the industry carries with it a special risk in that AI processes are simply not yet standardized or fully understood.


Like any SaaS provider, AI companies are perfectly capable of providing industry-standard levels of security, privacy, and on-premises options, and generally speaking of providing their service responsibly. I have no doubt that the private databases and API calls of OpenAI’s Fortune 500 customers are locked down very tightly! They must certainly be as aware as anyone, or more so, of the risks inherent in handling confidential data in the context of AI. (Whether to report this attack was OpenAI’s choice to make, but not doing so doesn’t inspire trust in a company that desperately needs it.)

But good security practices don’t change the value of what they are meant to protect, or the fact that malicious actors and sundry adversaries are clawing at the door to get in. Security isn’t just picking the right settings or keeping your software updated — though of course the basics are important too. It’s a never-ending cat-and-mouse game that is, ironically, now being supercharged by AI itself: Agents and attack automators are probing every nook and cranny of these companies’ attack surfaces.

There’s no reason to panic — companies with access to lots of personal or commercially valuable data have faced and managed similar risks for years. But AI companies represent a newer, younger, and potentially juicier target than your garden-variety, poorly configured enterprise server or irresponsible data broker. Even a hack like the one reported above, with no serious exfiltrations that we know of, should worry anybody who does business with AI companies. They’ve painted the targets on their backs. Don’t be surprised when anyone, or everyone, takes a shot.

