OpenAI sued, again, for scraping and replicating news stories

The Intercept, Raw Story, AlterNet want damages and to have their content removed from models

Katyanna Quach Thu 29 Feb 2024 // 01:15 UTC

Three digital publishers sued OpenAI in two separate lawsuits filed on Wednesday, claiming that it stole their copyrighted articles to train ChatGPT.

ChatGPT was trained on huge swathes of text scraped from the internet, including lots of journalism. News publishers, however, aren't happy that OpenAI used their articles to train its models without permission or compensation, and the New York Times has already sued OpenAI over the issue.

The Intercept, Raw Story, and AlterNet are the latest media organizations to sue OpenAI for copyright infringement. The Intercept filed one case; Raw Story and AlterNet, which are owned by the same entity, filed the other. The same law firm, Loevy & Loevy, is running both cases.

In its case, The Intercept has also gone after Microsoft, which backs OpenAI and uses the super lab's technology.

Both lawsuits accuse the defendants of copyright infringement and violating the Digital Millennium Copyright Act, which prohibits removing the names of authors and titles of their work to hide IP theft.

"When they populated their training sets with works of journalism, Defendants had a choice: they could train ChatGPT using works of journalism with the copyright management information protected by the DMCA intact, or they could strip it away," the court documents in the case initiated by Raw Story and AlterNet state [PDF].

"Defendants chose the latter, and in the process, trained ChatGPT not to acknowledge or respect copyright, not to notify ChatGPT users when the responses they received were protected by journalists' copyrights, and not to provide attribution when using the works of human journalists."

Similar DMCA violation claims, made by writers in a previous lawsuit against OpenAI, have not succeeded.

Attorneys representing The Intercept, Raw Story, and AlterNet said it's not clear which text OpenAI and Microsoft use to train their models, but pointed to three datasets - WebText, WebText2, and Common Crawl - that they believe include the plaintiffs’ content. The lawyers believe that articles from all three publishers have been scraped, and argued that ChatGPT generates content that mimics "significant amounts" of copyrighted journalistic materials "at least some of the time."

"Based on the publicly available information described above, thousands of Plaintiffs' copyrighted works were included in Defendants' training sets without the author, title, and copyright information that Plaintiffs conveyed in publishing them," court documents [PDF] from The Intercept's legal team state.

Both sets of plaintiffs are seeking damages and an injunction forcing the AI chatbot developers to remove all copies of their copyrighted works. They also want judges in the US District Court for the Southern District of New York to allow a jury trial.

The Register has asked OpenAI and Microsoft for comment. ®
