OpenAI’s Sora Shines a Spotlight on The Need for ‘Ethically Sourced’ AI | Commentary

Peter Csathy

18 April 2024 at 9:00 am

OpenAI’s cinematic-quality AI video generator Sora, and the power of what it represents, shook Hollywood just weeks ago. Its shocking quality certainly elevates the issue of what AI means for future Hollywood productions. But Sora also, once again, puts the spotlight on the fundamental issue of AI “training” on copyrighted works without consent.

Of course, when asked, OpenAI — like most generative AI companies — never comes right out and says that’s what it does. The company simply says that it trains Sora on “publicly available” works. While that sounds innocuous enough, it really isn’t. If it were, why would the company be so cagey about it? When directly asked whether Sora trained on YouTube videos, OpenAI’s CTO Mira Murati deflected. “I’m actually not sure about that,” she said.

We now know that it’s what we’ve suspected all along. “Publicly available” simply means that the food OpenAI uses for training its AI (because that’s what it is to OpenAI’s voracious AI pet) is content accessible online, much of which is, of course, copyrighted. Thanks to some intrepid journalistic digging by The New York Times, it’s now clear that OpenAI trained its GPT-4 large language model (LLM) on transcripts of more than one million hours of YouTube videos, all without payment or consent.

And here’s a big tell about how OpenAI itself really feels about what it’s doing. The name for its internal speech-recognition program that takes YouTube videos and transcribes them into text for training purposes is Whisper, as in, let’s keep things on the down low. I’m no linguist, but it certainly seems like an admission of some sort to me. (OpenAI did not respond to a request for comment from TheWrap.)

Apparently, even YouTube, the copyright-infringing OG, agrees. YouTube doesn’t precisely couch the issue in those terms, of course, perhaps because Google is reportedly training its own AI using YouTube videos. Earlier this month, CEO Neal Mohan said that if Sora trained on YouTube videos without consent, that would violate its terms of service. That’s a rich claim coming from YouTube, since YouTube built its initial base by enabling users to upload any videos they wanted (including copyrighted videos like SNL’s notorious “Lazy Sunday” that blew the lid off the platform) without securing licenses or compensating rights holders. One could say that the U.S. copyright laws and notices were the relevant “terms of service” at the time.

Given that inconvenient truth, some would say YouTube’s position sounds a bit, shall we say, hypocritical. But putting that aside for the moment, Mohan has a point. Why should OpenAI, or any other LLM developer, be able to feed off the works of others in order to build its value as a tool (or whatever you call generative AI)? And even more pointedly, where are the creators in this equation?

Apart from trying to dodge those specific questions, OpenAI, predictably, tries to turn the tables on Google, contending that what it does is effectively no different from what Google itself did when it vacuumed up the entire Internet world (millions of copyrighted works) for its “books” project in order to make them searchable online. At the time, the Second Circuit Court of Appeals blessed Google’s actions as a “fair use,” in a seminal case that is always cited by those in tech who feel that creative works should be considered fodder for some kind of higher calling of infinite progress.

But Google showcased only snippets of those books in its search results – not the whole enchilada. There was no market substitution here. Once a user found the copyrighted work through search, they still would need to actually go out and buy the real thing. That’s a fundamentally different proposition than Sora’s. Sora doesn’t call attention to other copyrighted works and build new channels of monetization for them. Sora, instead, competes directly with them (at least it will when it becomes widely available).

Anyway, if OpenAI were so confident in the righteousness of its position, why be so cagey about it? Because it isn’t. Generative AI tech without content is essentially useless. We know that, and they know that. That means artists and creators of those creative works should be compensated. You can’t re-use my article here without my permission simply because it’s been posted. And that basic fact doesn’t change simply because you’ve sucked millions of works into your training vortex. It’s not just about the outputs generated by AI (that’s a separate copyright matter). It’s about the inputs as well.

At a minimum, it’s hard to argue that OpenAI’s opacity about what’s really going on shouldn’t be confronted head on. All of us (creators and consumers alike), for a whole host of reasons, deserve to know precisely what OpenAI uses in its training data sets.

That kind of transparency is precisely what President Joe Biden’s Executive Order about AI calls for. Congress finally took Biden’s hint when just last week U.S. Rep. Adam Schiff introduced “The Generative AI Copyright Disclosure Act.” Following the European Union’s own historic legislation on the subject, Schiff’s act would require anyone that uses a data set for AI training to send the U.S. Copyright Office a notice that includes “a sufficiently detailed summary of any copyrighted works used.”

Essentially, this is a call for “ethically sourced” AI and transparency so that consumers can make their own choices. Think of it like nutritional labeling on food products for consumer safety reasons. “Trust and safety” logically should apply here too, and artists certainly agree. Two weeks ago, leading musicians like Billie Eilish penned an open letter urging the tech community to knock it off and stop training its LLMs without consent or compensation. So the heat is most definitely on, and it’s up to the creative community to keep the issue on the front burner.

So let’s first pull back the curtain on what’s really going on in the AI sausage factory via demands for transparency. Then we can all confront the copyright legal issues head on, with a reality we all understand. To infringe, or not to infringe (because it’s fair use)? That is the question, and it’s a question winding through the federal courts right now that will ultimately find its way to the U.S. Supreme Court.

And when it does, my prediction is that even this wacky court will ultimately find a way to protect artists in the most basic of ways. Following its surprising (to many) recent decision in the Andy Warhol Prince copyright case, in which it defined a new kind of direct-harm-to-creator exception to fair use, it will rule in favor of creators. It will reject Big Tech’s efforts to train their LLMs on copyrighted content without consent or compensation, properly finding that AI’s raison d’être in those circumstances is to build new systems to compete directly with creators. In other words, market substitution.

Simply because something is “publicly available” doesn’t mean that you can take it. It’s both morally and legally wrong. I’m an IP lawyer and welcome a healthy debate on that subject. But for god’s sake, be transparent about what you’re doing.

Reach out to Peter at peter@creativemedia.biz. For those of you interested in learning more, sign up to his “the brAIn” newsletter, visit his firm Creative Media at creativemedia.biz, and follow him on Threads @pcsathy.

The post OpenAI’s Sora Shines a Spotlight on The Need for ‘Ethically Sourced’ AI | Commentary appeared first on TheWrap.
