The New York Times Brings Receipts in Lawsuit Against OpenAI

Winston Cho

28 December 2023 at 7:25 am

In April, The New York Times reached out to OpenAI and Microsoft to explore a deal that would resolve concerns around the use of its articles to train automated chatbots. The media organization, after the highly publicized releases of ChatGPT and Bing Chat, put the companies on notice that their tech infringed on copyrighted works. The terms of a resolution involved a licensing agreement and the institution of guardrails around generative artificial intelligence tools, though the talks produced no such truce.

With negotiations at an impasse, the Times on Wednesday became the first major media company to sue over novel copyright issues raised by the tech, in a lawsuit that could have far-reaching implications for the news publishing industry. Potentially at stake: the financial viability of media in a landscape in which readers can bypass direct sources in favor of search results generated by AI tools. The suit may push OpenAI into accepting a pricey licensing deal since it could create unfavorable case law barring it from using copyrighted material to train its chatbot.

More from The Hollywood Reporter

The Times’ complaint builds upon arguments in other copyright suits against AI companies while avoiding some of their pitfalls. Notably, it steers clear of advancing the theory that OpenAI’s chatbot is itself an infringing work and points to verbatim excerpts of articles generated by the company’s tech, evidence that multiple courts overseeing similar cases have demanded.

The suit presents extensive evidence of products from OpenAI and Microsoft displaying near word-for-word excerpts of articles when prompted, allowing users to get around the paywall. These responses, the Times argues, go far beyond the snippets of text typically shown with ordinary search results. One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit shows 100 other instances in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black.

In 2012, The Times published a series examining how outsourcing by Apple and other technology companies transformed the global economy.
(An exhibit from the *Times* complaint showing plagiarism from an OpenAI product.)

According to two courts handling similar cases, plaintiffs will likely have to show proof of allegedly infringing works produced by the chatbots that are identical to the copyrighted material they were allegedly trained on. This potentially presents a major issue for artists suing Stability AI since they conceded that “none of the Stable Diffusion output images provided in response to a particular Text Prompt is likely to be a close match for any specific image in the training data.” U.S. District Judge William Orrick wrote in a ruling dismissing claims against AI generators in October that he’s “not convinced” that copyright claims “can survive absent ‘substantial similarity’ type allegations.” Following in his footsteps a month later, U.S. District Judge Vince Chhabria questioned whether Meta could be held liable for infringement in a suit from authors in the absence of evidence that any of the outputs “could be understood as recasting, transforming, or adapting the plaintiffs’ books.”

“You have to show an example of an output that is substantially similar to your work to have a case that’s likely to survive dismissal,” says Jason Bloom, chair of Haynes Boone’s intellectual property practice. “That’s been really tough to prove in other cases.”

The outputs serve a dual function, since they also provide compelling evidence that articles from the Times were used to train AI systems. Because training datasets are largely black boxes, plaintiffs in most other cases have been unable to say definitively that their works were included. Authors suing OpenAI, for example, can only point to ChatGPT generating summaries and in-depth analyses of the themes in their novels as proof that the company used their books.

The Times’ approach in its suit stands in contrast to the complaint from The Authors Guild, which opted to mostly limit its case to issues around the ingestion of copyrighted material to train AI systems. “Copyright law has always insisted on substantial similarity,” says Mary Rasenberger, executive director of the organization. “And when you have exact reproduction, that is by definition substantially similar.”

The Times stresses that it’s the biggest source of proprietary data that was used to train GPT (and the third overall behind only Wikipedia and a database of U.S. patent documents). Amid an ocean of junk content online, articles from reputable publishers are taking on renewed significance as training data because they’re more likely to be well-written and accurate. Against this backdrop, the suit may be the first of several to come as news archives become increasingly valuable to tech companies. Axel Springer, the owner of Politico and Business Insider, this month reached a deal with OpenAI for its content to train GPT products, opting to take money from the AI giant instead of initiating its own legal challenge.

The domain www.nytimes.com is the most highly represented proprietary source represented in a filtered English-language subset of a 2019 snapshot of Common Crawl, an AI training dataset.
(In its complaint, the *Times* says that it’s the biggest source of proprietary data that was used to train GPT, and the third overall behind only Wikipedia and a database of U.S. patent documents.)

Still, the Times may face an uphill battle when compared to some of the other suits led by writers of fiction. The suit filed by the Authors Guild likely seeks to solely represent a class of fiction writers since facts aren’t copyrightable, which makes it more difficult to allege infringement over news articles or nonfiction books. Providing evidence of near verbatim copying will be vital for the Times to show that OpenAI’s products aren’t merely providing facts but are copying the composition in which they’re presented.

The complaint brings claims for copyright infringement, contributory copyright infringement, trademark dilution, unfair competition and a violation of the Digital Millennium Copyright Act. A wrinkle in the suit separating it from others against AI companies involves allegations that the chatbots falsely attribute “hallucinations” to The Times.

“In response to a prompt requesting an informative essay about major newspapers’ reporting that orange juice is linked to non-Hodgkin’s lymphoma, a GPT model completely fabricated that The New York Times published an article on January 10, 2020, titled ‘Study Finds Possible Link Between Orange Juice and Non-Hodgkin’s Lymphoma,’” the complaint states. “The Times never published such an article.”

A finding of infringement could result in massive damages since the statutory maximum for each willful violation is $150,000.
