OpenAI sued, again, for scraping and replicating news stories

The Intercept, Raw Story, AlterNet want damages and to have their content removed from models


Three digital publishers sued OpenAI in two separate lawsuits filed on Wednesday, claiming it stole their copyrighted articles to train ChatGPT.

ChatGPT was trained on huge swathes of text scraped from the internet, including lots of journalism. News publishers, however, aren't happy that OpenAI used their articles to train its models without permission or compensation, and the New York Times has already sued OpenAI over the issue.

The Intercept, Raw Story, and AlterNet are the latest media organizations to sue OpenAI for copyright infringement. The Intercept filed one case; Raw Story and AlterNet, which are owned by the same entity, filed the other. The same law firm, Loevy & Loevy, is running both cases.

In its case, The Intercept has also gone after Microsoft, which backs OpenAI and uses the super lab's technology.

Both lawsuits accuse the defendants of copyright infringement and of violating the Digital Millennium Copyright Act, which prohibits removing authors' names and the titles of their work to hide IP theft.

"When they populated their training sets with works of journalism, Defendants had a choice: they could train ChatGPT using works of journalism with the copyright management information protected by the DMCA intact, or they could strip it away," the court documents in the case initiated by Raw Story and AltNet state[PDF].

"Defendants chose the latter, and in the process, trained ChatGPT not to acknowledge or respect copyright, not to notify ChatGPT users when the responses they received were protected by journalists' copyrights, and not to provide attribution when using the works of human journalists."

Similar DMCA violation claims, made by writers in a previous lawsuit against OpenAI, have not succeeded.

Attorneys representing The Intercept, Raw Story, and AlterNet said it's not clear which text OpenAI and Microsoft use to train their models, but pointed to three datasets - WebText, WebText2, and Common Crawl - that they believe include the plaintiffs’ content. The lawyers believe that articles from all three publishers have been scraped, and argued that ChatGPT generates content that mimics "significant amounts" of copyrighted journalistic materials "at least some of the time."

"Based on the publicly available information described above, thousands of Plaintiffs' copyrighted works were included in Defendants' training sets without the author, title, and copyright information that Plaintiffs conveyed in publishing them," court documents [PDF] from The Intercept's legal team state.

Both sets of plaintiffs are seeking damages and an injunction forcing the AI chatbot developers to remove all copies of their copyrighted works. They have also asked judges in the US District Court for the Southern District of New York to allow a jury trial.

The Register has asked OpenAI and Microsoft for comment. ®
