The participants in last week’s final of TV show America’s Got Talent included a Lebanese dance troupe, a pole dancer, the singer from Hootie & the Blowfish and… two tech entrepreneurs who “build Artificial Intelligence tools and software to create hyper-realistic synthetic media at scale”, according to their social media biographies.
Tom Graham and Chris Umé are the emerging godfathers of so-called deepfake technology, a mind-bending illusion by which a person in a video is digitally altered so they appear to be someone else. Thus the America’s Got Talent final saw four largely unknown singers line up on stage and, thanks to futuristic AI trickery embedded in the cameras in front of them, become deepfake versions of Elvis Presley and judges Simon Cowell, Heidi Klum and Sofía Vergara on the big screen behind them.
This all happened in real time, as if the extraordinarily lifelike quartet were singing Devil in Disguise live on stage. Cowell, Klum and Vergara (the real versions) looked on with mouths agape from their judges’ seats in the stalls. And you thought televised talent shows were all about line dancing dogs. Think again – reality TV has become virtual reality TV.
“What a nonsense that I might win AGT, right?” says Graham, a 38-year-old Harvard-trained lawyer, as he recalls his improbable brush with prime time stardom (Graham and Umé didn’t perform per se, but they appeared on stage as the brains behind the operation). “It was actually really fun because we are not in the business of being performers and so there was not that kind of pressure. Backstage we were less nervous and there was less riding on it, as it were. All the crew and the other performers are really nice.”
Metaphysic, the company Graham co-founded with Chris and his brother Kevin Umé, ended up coming fourth in the final (the winner was the Lebanese dance group, Mayyas). But they made a big splash. At their first audition, back in June, they persuaded the 2018 finalist Daniel Emmet to sing Chicago’s soft rock ballad You’re the Inspiration as a deepfake Cowell. The actual Cowell, meanwhile, watched through his fingers.
Not known as a man lacking in ego, the real Cowell asked afterwards whether it was “inappropriate to fall in love with a contestant”. Graham insists that Cowell didn’t know he was going to be deepfaked before that first audition: “He looked genuinely surprised.” And what did Cowell say to him backstage afterwards? “He was amazed at this stuff. He really liked the originality of it. And then we talked about cricket,” says Graham, an Australian.
Before the interview goes down a deepfake rabbit hole, I ask Graham to explain how it all works. It’s all fairly bonkers, but here’s the science. Metaphysic develops software that automates the process of recreating what someone looks like. It “trains” – Graham’s word – its software by showing it hundreds of thousands of images of the target person, such as Elvis. The software’s very sophisticated algorithms (known as deep neural networks) then learn to mimic and anticipate precisely how that person moves.
While we are used to seeing computer-generated imagery (CGI) and visual effects (VFX) in films, these involve some sort of manual human input as effects are painted in or touched up. There is no such human meddling here. Metaphysic’s algorithm “builds its own understanding [via] unsupervised learning about what Elvis’s face looks like in different circumstances,” Graham explains. “It builds a sort of nebulous kind of childlike brain understanding of what Elvis’s face would look like at any particular point.”
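The "childlike brain" Graham describes is, in the open-source deepfake tools at least, usually built as a shared encoder that strips a face down to expression and pose, plus a separate decoder per identity that redraws that person's face from the code. A toy sketch of the structure (every function here is a trivial stand-in, not Metaphysic's code, and real systems use deep neural networks trained on hundreds of thousands of frames):

```python
# Toy sketch of the classic deepfake architecture: one shared encoder
# learns a person-agnostic face representation; one decoder per identity
# reconstructs that person's face from it. The "networks" here are
# placeholder arithmetic so the structure is visible.

def encoder(image):
    # Compress an "image" (a list of pixel values) into a small latent
    # code capturing expression/pose. A real encoder is a deep network;
    # this placeholder just averages the pixels.
    return sum(image) / len(image)

def make_decoder(identity_offset):
    # Each identity gets its own decoder, trained only on that person's
    # footage. The offset stands in for learned identity-specific detail.
    def decoder(latent, size):
        return [latent + identity_offset] * size
    return decoder

decode_actor = make_decoder(identity_offset=0.0)    # the on-stage singer
decode_elvis = make_decoder(identity_offset=10.0)   # hypothetical Elvis model

frame = [4.0, 5.0, 6.0]      # one frame of the live actor
latent = encoder(frame)      # expression and pose, identity stripped out

# The "swap": encode the actor's frame, decode it with Elvis's decoder,
# so the actor's expression appears on Elvis's face.
fake_elvis_frame = decode_elvis(latent, size=len(frame))
print(fake_elvis_frame)
```

The design choice worth noticing is that the encoder is shared: because it never learns identity, swapping which decoder you feed it is what turns the actor into Elvis.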
So sophisticated is the technology that the deepfake Elvis on screen can mimic whatever the actor in front of the camera is doing in real time, as millions saw on America’s Got Talent. This is because the camera updates the deepfake’s movements up to 30 times a second.
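Thirty updates a second implies a hard budget of roughly 33 milliseconds per frame for the whole capture-swap-display pipeline. A minimal sketch of such a live loop, with placeholder function names (assumptions, not a real broadcast pipeline):

```python
import time

FPS = 30
FRAME_BUDGET_S = 1.0 / FPS  # ~0.033 s per frame at 30 updates a second

def swap_face(frame):
    # Placeholder for the real per-frame work: detect the actor's face,
    # encode expression and pose, decode as the target identity, and
    # composite the result back into the frame.
    return f"deepfaked({frame})"

def run_live(frames):
    # Process each frame and count any that blow the per-frame budget;
    # in a live show, an over-budget frame means the on-screen face stutters.
    dropped = 0
    for frame in frames:
        start = time.perf_counter()
        swap_face(frame)
        if time.perf_counter() - start > FRAME_BUDGET_S:
            dropped += 1
    return dropped
```

The point of the sketch is simply the constraint: however sophisticated the neural network, on live television it gets about a thirtieth of a second per frame.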
That’s how it works. But why does it work? Surely we, as human viewers, know what we’re viewing is fake? Not so, says Graham. This is because we have leapt over “the uncanny valley”, a phrase invented during the Japanese robotics boom of the Seventies. The uncanny valley theory posits that robots, dolls or CGI images that look fairly human freak us out because, well, they’re not quite human enough. But so good is the technology these days that viewers have moved beyond the doubtful stage. “As soon as the faces become indistinguishable from real people, you jump out of the uncanny valley and become very comfortable with them,” says Graham.
He says the possibilities in the entertainment world are endless. When it comes to music, Graham can see a time when famous performers will duet with their younger selves on stage. Some members of Metaphysic’s team were part of the Industrial Light & Magic team that worked on ABBA’s Voyage ‘Abba-tar’ show. While that technology is slightly different (the ABBA show used 3D models of bodies, whereas Metaphysic uses AI algorithms), the potential for virtual concerts is the same. “Elvis could have a concert in Las Vegas. You could have it structured in a way that there’s a young version of Elvis, and an old version and a medium version,” says Graham.
There is in fact no need for actors to be on the stage at all, as they were on America’s Got Talent: the show could be pre-recorded and screen-only. Indeed, “screen-only is probably appealing to people producing these things because the actual day-to-day production is less intensive if you’re putting it on 300 days a year in Vegas.” Mix and match is an option too: a live backing band with pre-recorded hyperreal vocalists on screen.
Deepfakes will enter the film world too. Could we get to the point, I ask Graham, when the star of a film isn’t actually in the film at all? When rather than them appearing, a deepfake version appears instead? “Absolutely,” he says. However, he adds that actors tend to be lovers of their craft so it’s more likely that they’d use deepfake technology as an adjunct to their performance. “You could imagine that actors could be playing younger versions of themselves in a film,” Graham says.
The technology would do away with the prosthetics and make-up used in the on-screen de-ageing process, and it could be applied to the whole body: a 35-year-old person moves differently to a 70-year-old, a difference that a fake nose and clever make-up can’t rectify. Martin Scorsese’s gangster epic The Irishman nailed the de-aged faces of the likes of Robert De Niro using technology but not always the characters’ body movements. This will change. But creating an Academy Award-winning performance using deepfakes remains “a long, long way off”, Graham says.
Some big announcements are imminent. Metaphysic is working on “all the things that we just spoke about” involving “absolutely the most well-known people on Earth”, Graham says. But he remains tight-lipped about specifics. Expect news reasonably soon. This technology was only really developed around 18 months ago, so projects are still in the production stage. (It should be said that this so-called multi-layered metaverse technology is also tipped to transform life away from the entertainment sphere: it will allow virtual family gatherings and work meetings, Graham predicts.)
Talking of the best-known people on Earth, it was arguably the world’s most famous actor who kicked this whole deepfake thing off. Umé was behind the Tom Cruise deepfake that became a TikTok phenomenon last year. A series of videos by an account called DeepTomCruise showed the megastar doing all manner of goofy things, from threatening to wrestle an alligator to dancing to George Michael’s Freedom in his underwear to preparing for a film premiere with Paris Hilton.
The account, whose subject looks, speaks and acts exactly like Tom Cruise, has attracted over 250 million views and 3.6 million followers. Except, of course, it’s not him. Graham explains that DeepTomCruise actually slightly pre-dated Metaphysic. It was an art project between Chris Umé and an actor called Miles Fisher, who ‘plays’ Cruise. Graham saw their first video (“I was fooled for a couple of seconds”) and “immediately gave Chris a call and we started Metaphysic a month later.”
Didn’t Cruise mind being deepfaked? He’s famously private and protective of his image. Apparently not. “Immediately we went and contacted Tom Cruise’s team through whatever channels we could, and ultimately they came back and said that they didn’t have an opinion either way,” Graham says. “We offered to give them the [opportunity] to stop, to sign over the rights, anything they wanted really, because we were flabbergasted by the response it had received. And I believe that their response was a fair indication that they were OK with it.” There remains a channel of communication open between Metaphysic and the Cruise team so the latter can ask them to stop should they so wish.
This hints at a far wider debate in the world of deepfakes – that of ethics. If it’s now possible to create a fake but lifelike version of anyone using technology, aren’t there huge implications for fraud, identity and intellectual property theft and all kinds of exploitation, from financial to sexual? Graham says that the technology comes with responsibility and that Metaphysic has a “big focus” on ethics and – in particular – consent. The rule, he says, must be that individuals own and control their own hyperreal likeness, meaning they also own and control the data involved in training and creating that likeness.
In other words, people must have complete control over their digital DNA. Metaphysic has launched a platform called Every Anyone which enables individuals to own their own data when they interact with third-party companies who operate in the metaverse. “Ultimately I believe that biometric data that represents us as individuals will be regulated [by governments],” Graham says. Which, for the avoidance of doubt, would be a good thing.
The technology continues to develop at speed. Graham says the quality of the deepfake Cowell improved hugely between the first AGT audition in June and last week’s final. This is because the algorithm matures as it is trained. It keeps learning: the more of the real Cowell it is shown, the better the deepfake version becomes.
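Why more footage makes a better fake can be illustrated with a toy averaging example (a stand-in for the real training process, not Metaphysic's method): the model's estimate of some facial attribute settles ever closer to the true value as the number of "frames" it has seen grows.

```python
# Toy illustration: each training "frame" is a noisy observation of some
# true facial attribute; averaging more frames drives the error down.
# Real training fits millions of parameters, but the same more-data,
# less-error dynamic applies.
import random

random.seed(0)
TRUE_VALUE = 5.0  # hypothetical attribute of the real face

def estimate(n_frames):
    # Average n noisy observations of the attribute.
    samples = [TRUE_VALUE + random.gauss(0, 1) for _ in range(n_frames)]
    return sum(samples) / n_frames

for n in (10, 1000, 100_000):
    print(n, "frames -> error", abs(estimate(n) - TRUE_VALUE))
```

With ten frames the error is noticeable; with a hundred thousand it is tiny, which is the toy version of why the deepfake Cowell sharpened up between June and the final.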
So where will this end? Could I, for example, pick a Rolling Stones concert from any era and watch it virtually as an audience member (assuming the Stones consented to the project)? “Yeah, absolutely. We could, for instance, work with the Rolling Stones and create three concerts from different eras. The next step is that you could be consuming it through a virtual reality headset.”
The immersive potential of this technology sounds genuinely mind-bending. But not as mind-bending as what Graham says next. “Then the next step is that we could put you on stage. So you become a back-up singer. Or a drummer. We’re creating a hyperreal space, so if we can create a hyperreal version of Keith Richards we can create one of you too.”
You mean I could duet on stage with Keith Richards? “Yes. Suddenly experiences can be more personalised in a profound way. You really could be on stage with the Rolling Stones.”
Cowell described Metaphysic as “probably the most incredible, original act” he’d had on America’s Got Talent. Original? Perhaps not, given that it mimics real people. But incredible? It’s hard to argue with that.