
The Origin and Impact of Deepfake Technology

Published Jun 8, 2023, 3:30 PM

Deepfakes create a dangerous situation -- how can we trust that a recording is the real McCoy when machines can make such convincing fakes? From the history of deepfake technology to the liar's dividend, we learn about the evolution of the tech and the problems it creates. 

Welcome to TechStuff, a production from iHeartRadio. Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and how the tech are you?

In April twenty twenty three, the lawyers for Tesla CEO Elon Musk argued that submitted recordings of their client from twenty sixteen might have been deepfakes. The ongoing case is an emotionally charged one. In twenty eighteen, a man named Walter Huang died in a car accident. He was riding in a Tesla Model X that was engaged in Autopilot mode at the time of the crash. His family contends that the Tesla's safety systems failed and the vehicle steered itself into a concrete median, and the family's lawyer submitted a recording of Elon Musk as evidence that Huang was led to believe his vehicle had greater capabilities than it actually possessed. In this recording, Elon Musk said of Tesla vehicles, quote, a Model S and Model X at this point can drive autonomously with greater safety than a person right now, end quote. In response, Musk's lawyers said the recording could be faked.

Now, not to waffle about this, but if we're speaking solely on a technical level, the recording could be faked. And by that I mean there are technologies that are sophisticated enough to create a fake recording. But just because something could be faked doesn't mean it actually was faked. And the judge in the Tesla case, Evette Pennypacker, who has an amazing name, said that this argument is a truly dangerous one. The judge said that it implies quote that because mister Musk is famous and might be more of a target for deepfakes, his public statements are immune, end quote. In other words, if you're notable enough or notorious enough, you have a carte blanche excuse for anything that you are recorded as saying, because maybe someone just targeted you and created a fake version to discredit you.

In twenty eighteen, Danielle Citron and Robert Chesney wrote a paper in which they predicted this sort of situation. They dubbed it the liar's dividend: when there is a proliferation of technology that can create misinformation or outright disinformation, the liars out there reap the benefits, because what is the truth anyway? When you can't trust the evidence, everything falls apart. This is just one of the many challenges deepfake technology presents. There are potentially harmless or perhaps even beneficial uses of this technology, but it doesn't take much imagination to come up with ways to cause harm.

Let's talk for a second about the entertainment industry. With deepfake technology, it becomes possible to create videos and audio recordings that simulate celebrities, which potentially allows a director to cast a film with people who otherwise would be very much unavailable. Using sufficiently sophisticated deepfakes, you could create a movie that combines a cast of modern and classic film stars. Maybe you want the Marx Brothers running around with Will Ferrell. Maybe you want Lon Chaney Junior to show up in your modern werewolf movie. Or maybe you're doing something slightly less extreme, maybe you're using the technology to generate a younger version of your current star, a la Harrison Ford in the upcoming Indiana Jones and the Dial of Destiny film. So it doesn't have to be sinister, but it does bring into question concepts like the right to personality, or the right to identity, or the right to publicity.
Presumably filmmakers wouldn't want to move forward on any project with a computer-generated simulation of a real film star without permission from that person or their family. But it's possible to do it, and depending on the movie, maybe they do go ahead without securing permission first. Maybe it's an edgy parody film, and the buzz around their decision to do this could end up being a boost to marketing. People would say, how dare they do this, and then go buy tickets to see the fallout of it.

For actors, there's a real concern that this technology could rob them of work. That if they turned down a role, the filmmaker could just get a computer-generated version of them in there. Or that they could, you know, appear in projects that they don't actually agree with. And perhaps most importantly for many actors out there, that this could all happen without compensation for the original actor. I know that it can be tough to feel sympathetic toward big Hollywood stars, but keep in mind the vast majority of working actors out there are not raking in huge movie deals. They're just as worried about AI biting into their work as the rest of us are.

Then there's the world of audio performance. Earlier this year, a TikTok user with the handle Ghostwriter nine seven seven wrote and produced a song called Heart on My Sleeve. But Ghostwriter nine seven seven didn't provide the vocals for this track. Instead, they used AI-generated deepfake vocal simulations of the artists Drake and The Weeknd. The songwriter then posted the release on multiple platforms and it quickly went viral. Universal Music Group sprang into action right away and claimed copyright infringement. And I am no legal expert, but in my mind that's a weak argument. After all, the song itself was an original. It was not a cover. It had not been stolen from someone's discography. You cannot copyright the sound of a voice. Universal Music Group doesn't own the vocal quality of Drake or The Weeknd, and I'm sure those artists would be concerned to learn otherwise. And even if the agreement between the label and the artists did go all Ursula from The Little Mermaid and claim ownership of the voices themselves, there's not really a legal foundation to use that as a deterrent against deepfakes.

Universal Music Group did argue that the deepfake voices used tons of recorded material to train on to sound like those artists. That is most certainly the case. We'll dive into deepfake techniques a bit later in this episode, but it often boils down to machine learning and using a lot of training material to educate a model about what it is you want it to do. The more material you can submit in training, the better. And Universal Music Group said, quote, the training of generative AI using our artists' music, which represents both a breach of our agreements and a violation of copyright law, end quote, before going on to suggest that allowing Heart on My Sleeve to exist is akin to powering up Skynet so that the Terminators will become real. I'm exaggerating only a little bit. And again, I am not a copyright expert, but it's hard for me to imagine how training an AI model on music is in itself a violation of copyright law. After all, every musician, every artist, heck, every person who has been around other people has been influenced by the work of other people. Sometimes you can actually hear the influences in music. You might hear an artist play and say, oh, that reminds me of Johnny Cash, or something like that.
The history of art is one in which succeeding generations iterate on the works of those who came before them. Sometimes they make drastic departures from the generations that came before them, but even that is in response to the influence of the earlier art. So, if you make the argument that training AI on specific works is wrong, how do you differentiate that from someone who gets their start playing song covers, or maybe writing their own stuff, but with musical influences from identifiable artists? Because art is not created in a vacuum. Obviously, using AI is different. It can lead to the creation of a near-perfect simulation of the original artist. But the method of training the AI isn't really that different from a budding musician voraciously devouring the entire discography of their favorite artists before emulating those artists in their own work. It is a sticky wicket, no question about it, and we're in the early stages of figuring out how to handle it, which is particularly unfortunate since the technology is already here.

But how did we get here? Well, an exhaustive history of deepfake technology would require a full series of episodes about the history of artificial intelligence and machine learning in general and computer vision in particular, as well as text to speech and lots of other related technologies. But for our purposes, we'll simply acknowledge that countless computer scientists and programmers have spent endless hours advancing computer technology with the goal of finding ways to make machines quote unquote understand data. This is easier said than done. So let's take images as an example, as that will factor heavily in our discussion today. We humans can glance at a photo and we can immediately identify what is an object versus just a background. So if you have a red mug placed in front of a white cinder block wall, we can see what is a mug and what is a wall. But we have to teach computers how to do that, and when you're talking about technologies that generate moving images, it becomes even more complicated.

So, for lack of a clear beginning, I am somewhat arbitrarily going to start in nineteen ninety seven. Now, a couple of things happened that year that would be important for us to talk about, and one was not quite deepfake technology, but it did illustrate some potential ethical issues we had to think about. And that was a commercial that aired during a big old American football game. You know, the one that happens every year. You know, the one I can't call by name for, you know, legal reasons. Anyway, one famous feature of this big old American football game is that brands will shell out huge amounts of money to air commercials during it. And one brand to do that in nineteen ninety seven was the Dirt Devil vacuum cleaner company. Now, those of you across the pond would call it a hoover, not a vacuum cleaner, but a Hoover is a different brand altogether, so stop confusing me. In the commercial, famous actor and dancer Fred Astaire is shown dancing with Dirt Devil vacuum cleaners. But here's the thing. Fred Astaire had died a decade earlier. The footage was taken from his films, with Dirt Devil inserting the imagery of its products into the footage to make it seem as if Astaire had actually shot commercials this way and really danced with vacuum cleaners. So in this case, the footage of Astaire was legitimate. It was the appearance of the vacuum cleaners that had been inserted into it.
But the use of footage of performers who have passed away prompted a debate about the ethics of that practice, and people began to speculate about what might happen once technology reached a point where a computer simulation of a person would be indistinguishable from the real thing.

Meanwhile, also in nineteen ninety seven, a group of computer scientists published an important work. The scientists were Christoph Bregler, Michele Covell, and Malcolm Slaney. The paper's title is Video Rewrite: Driving Visual Speech with Audio. This work built on top of a lot of other previous work. For example, facial recognition was already a discipline in computer science. It traces its history all the way back to the nineteen sixties. Ditto for technology that could generate speech from text. That too dates back to the nineteen sixties. Computer animation had been around for a while by nineteen ninety seven, so creating a three-D model of lips, one that you could subsequently animate, that was also a thing already. But what these researchers did was bring all these elements together. It was a convergence of technologies that resulted in a new application, one which would allow for computer-generated synthetic video of real people. The team created the Video Rewrite software, and they also showed what it was capable of doing in some very, very short video clips. The results are primitive by today's standards, but nonetheless impressive. In one two-second clip, President JFK appears to say, I never met Forrest Gump. It's a cheeky reference to the nineteen ninety four film, which included a segment in which the titular character Forrest Gump appears to meet JFK and then informs him that he needs to rush off to the restroom.

Video Rewrite served as a foundation for technologies that we could refer to as deepfake tech. So just a few years later, in two thousand and one, Christopher J. Taylor, Gareth J. Edwards, and Timothy F. Cootes (his middle initial is F and not J, which actually upsets Jonathan because of the lack of consistency) published a paper that was titled Active Appearance Models. The abstract for this paper reads, in part, quote, we describe a new method of matching statistical models of appearance to images, end quote. Now, in plain English, this paper describes a method in which computer vision relies on statistical models to more accurately identify elements within an image.

So let's consider facial recognition technology. As I mentioned earlier, computers do not inherently understand images. If presented with a picture of a face, a computer cannot naturally determine what the various features of that face are. Only through proper programming and machine learning can you train a computer to recognize features like a nose, a mouth, eyebrows, eyes, et cetera. And by training machines on millions of faces, you can reach a point where the machine can examine a new face, one that has never before been submitted to the machine, and attempt to identify those features. This is a necessary step with a lot of deepfake technology. See, to call all deepfakes computer generated is a little misleading. Often what is happening is a computer is replacing an existing person or face in a video with someone else's features. In order to do that, you first have to be able to map and identify the original person that was in the video, and you need to be able to match the synthesized face with the movements of the original face. To do that, the computer first has to encode the original face, essentially to break it down into lots of smaller shapes. Then it has to be able to match the synthesized face to the original one with a similar encoding approach, and then decode that into the synthesized face that replaces the original one and then follows the various motions of the original face. So you're replacing one person with another through the use of a computer, and as part of that, the computer has to break down the original person into points of data that the computer can handle.
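To make that encode-and-decode idea a little more concrete, here is a minimal sketch, assuming a PyTorch setup, of the shared-encoder, two-decoder arrangement that classic face-swap tools are built around. To be clear, this is not any particular tool's actual code. The layer sizes, the sixty-four-pixel face crops, and the bare-bones training loop are all illustrative stand-ins.

```python
# A minimal sketch of the shared-encoder, two-decoder autoencoder idea
# behind classic face swaps. All sizes here are illustrative.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses a 64x64 RGB face crop into a small latent vector."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Rebuilds a face from the latent vector; one decoder per identity."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

encoder = Encoder()
decoder_a = Decoder()  # learns to draw person A's face
decoder_b = Decoder()  # learns to draw person B's face

# Training: each decoder reconstructs its own person's faces, while the
# single shared encoder is forced to learn pose and expression features
# that are common to both people.
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder_a.parameters())
    + list(decoder_b.parameters()), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(faces_a, faces_b):
    opt.zero_grad()
    loss = (loss_fn(decoder_a(encoder(faces_a)), faces_a)
            + loss_fn(decoder_b(encoder(faces_b)), faces_b))
    loss.backward()
    opt.step()
    return loss.item()

# The swap: encode person A's frame, decode it as person B.
with torch.no_grad():
    frame_of_a = torch.rand(1, 3, 64, 64)  # stand-in for a real face crop
    swapped = decoder_b(encoder(frame_of_a))
```

The trick is in that last step. Because the one shared encoder only ever learns pose and expression, feeding person A's encoding into person B's decoder yields person B's face wearing person A's expression, frame after frame.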
So with this technology, I could stand facing a camera and deliver a speech and then, using software designed to follow the steps I just laid out, replace my image with that of someone else. If I also used a program designed to create a vocal impersonation of that someone else, well, I could create a video where some celebrity says things that they would never say. Like maybe I could create a video of Keanu Reeves saying, TechStuff is my favorite podcast. Jonathan is such a cool host. I wish I could hang out with him. For the record, mister Reeves, I would never actually do that. I'm just saying I could do it.

Of course, creating a video image of Keanu Reeves would just be one part of the equation. Another would be replicating his voice. Now, I could try and do my own impersonation, but this would so clearly be fake that I would never achieve my goal of trying to make it appear as though Keanu Reeves knows who I am and wants to hang out with me. I can't even say whoa the way he does. To achieve my dreams, I would need a voice synthesis program that I could train on Keanu's voice and then produce a computer-generated impersonation.

The history of voice synthesis is crazy long. I mean, if we really wanted to dive into it, we could go all the way back to the late seventeen hundreds. But we won't, because I can't keep you here that long. Text-to-speech technology brings us a bit closer to modern day, but then we're still talking about the nineteen sixties or thereabouts, as I mentioned earlier in this episode. To get to a point where computers are capable of producing an imitation of a specific person's voice, we're getting up to like the last decade or so. Researchers built tools that train on how a specific person produces different sounds, phonemes, if we want to think of it in terms of language and the sounds of language. Then we have applications that can take text, interpret that text as a series of sounds, pull upon the computer's knowledge of how a particular person makes those specific sounds, and then, voila, we have ourselves a copy.
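As a toy illustration of that text-to-phonemes-to-voice pipeline, here is a self-contained sketch. Real voice-cloning systems learn a speaker's phoneme sounds from hours of recordings and model inflection on top. Here the two-word lexicon is a made-up stand-in and the per-phoneme "recordings" are synthetic tones, purely so the example runs end to end.

```python
# Toy concatenative synthesis: text -> phonemes -> glued-together
# per-speaker audio units. Everything here is an illustrative stand-in.
import numpy as np
import wave

SAMPLE_RATE = 16000

# Hypothetical lexicon: a few words mapped to rough phoneme sequences.
LEXICON = {
    "tech": ["T", "EH", "K"],
    "stuff": ["S", "T", "AH", "F"],
}

# Stand-ins for the target speaker's phoneme units. In a real system,
# these would be learned from recordings of that person's speech.
def fake_unit(freq_hz, dur_s=0.12):
    t = np.linspace(0, dur_s, int(SAMPLE_RATE * dur_s), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)

UNITS = {"T": fake_unit(200), "EH": fake_unit(300), "K": fake_unit(250),
         "S": fake_unit(400), "AH": fake_unit(180), "F": fake_unit(350)}

def synthesize(text):
    """Interpret text as a series of sounds, then concatenate the
    speaker-specific audio for each sound."""
    phonemes = [p for word in text.lower().split() for p in LEXICON[word]]
    return np.concatenate([UNITS[p] for p in phonemes])

audio = synthesize("tech stuff")
with wave.open("toy_speech.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 16-bit samples
    f.setframerate(SAMPLE_RATE)
    f.writeframes((audio * 32767).astype(np.int16).tobytes())
```

Stitching fixed units together like this is exactly why early systems sounded flat, which brings us to the next point.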
Now, early versions of this technology were understandably a bit limited. You would end up with speech that on a surface level sounded like the person in question, the synthesized person, but it would typically come across as flat, or it would use incorrect inflection to emphasize a point. So think of that kind of robotic sound you would get with early personal assistants. Like, if you were using a GPS system, which I realize I just used a repetition there, like ATM machine. But let's say you're using a GPS and it has a voice associated with it. Older ones were very robotic, and they could also say things that were hilariously wrong. I'll never forget the time I was riding in a car and the GPS told us to turn right on Oak Doctor instead of Oak Drive. But over time the models improved and things started to sound a bit more natural. So those early ones, not so good. You would not mistake them for being a real person. It would sound like a robot making an impersonation of that person. But models would grow in sophistication, and training sessions would include examples where the target's expressions would be associated with specific emotions like anger, or happiness, or sadness. You can actually use a voice synthesizer yourself and train it, and as part of that, you're typically told to read out sentences with different emotional weight to them. So using a bit of appropriate text, then maybe some metadata to indicate what emotion should be used to read out that text, it then becomes possible to craft vocal performances that were, and are, difficult to distinguish from the real thing.

We're going to take a quick break to thank our sponsor, and then I'll be back to talk more about the history and impact of deepfake technology.

Back to our history of video deepfakes. We left off at two thousand and one, and for nearly two decades computer scientists continued to work on systems that would push forward the capabilities of synthesized video content. By the time we get up to twenty seventeen, a pair of papers explained that the advancements in consumer computers had reached a point where it was actually possible to achieve synthesized video using off-the-shelf computer systems, and that would be a huge game changer. No longer would you need access to incredibly powerful systems with specialized software. Now you could potentially create or access an application on an off-the-shelf computer to do the same thing. So the tools to generate computer-synthesized video now were within the grasp of the average computer user. With cloud-based services that could augment these efforts, it became possible for a creative person to make videos that appear to show people doing and saying things that they never actually did.

And again, there are multiple uses for such technology. Not all of them are sinister, but it doesn't take much imagination to come up with scenarios where things get grim. And indeed, many early uses of this tech, once it became accessible, were bad. One big one was using face-swapping technology to make it appear as though someone, famous or otherwise, was appearing in an adult video. And I think it goes without saying that this is a total violation of the victim. It robs them of agency, and they may end up suffering consequences despite not being remotely responsible for the content. So imagine facing judgment for something that not only you did not do, but you had no way of preventing. Honestly, it's impossible for me to communicate how devastating this can be. There are several accounts online written by people who have been the victims of this sort of activity, and they are worth your time. They are harrowing to read, but it is important. Their words will far more effectively explain how traumatizing this experience can be. And just as a reminder, the rise of social networks means that we've all been sharing a lot of images of ourselves, videos of ourselves. There's a lot of content out there that could be used to train various machine models. So it's something to keep in mind that even if you aren't concerned right now, there's nothing to say that you couldn't become a victim tomorrow. Deepfakes also pose a risk to organizations, not just individuals.
So imagine for a moment that you see you have a voicemail at work, and you pull it up and you listen to it, and it sounds like your boss, and your boss is telling you that you need to transfer company funds from the company account into a different one. And perhaps they say that it's in order for you to pay off some third-party vendor for a project that you're not really familiar with. But then maybe it turns out that voicemail wasn't from your boss after all. Maybe it was the result of spear phishing. Maybe a nefarious thief has identified you as a possible key to stealing money from your organization and has used tech to impersonate your boss and direct you toward facilitating a crime. You unknowingly have become an accomplice. There's actually been a case where this sort of thing was alleged to have happened. Now, I have to say alleged, because there were questions about whether or not it really was a case of a synthesized voice, or if maybe this was more of a straightforward embezzlement issue and the deepfake defense, aka the liar's dividend, came into play.

Deepfakes have come a long way in a few short years. However, they are not perfect. There can be telltale signs that a video is fake, though they can sometimes be too subtle for the human eye to detect. Sometimes there's a dead giveaway. You're watching a video and you think, this person is blinking too frequently, or not frequently enough, or maybe their eyes don't look quite right, or the movements you're seeing don't line up. Like, a person is turning their head one way, but their eyes are shifting another in a way that just doesn't seem natural. There are those sorts of things that people can pick up on, and there are some that are far more subtle, and deepfake detection tools are growing in importance as a result. There are tools that are trained to spot signs of fakery, sometimes ones that are far too subtle for us to notice. So it may be things like inconsistencies in lighting and the quality of reflections within the frame. Things like that may end up being an indication that a video was manufactured artificially rather than being an actual recording, and these tools are becoming more and more important for people and for organizations.
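To make one of those telltale signs concrete, here is a minimal sketch of a blink-rate check built on the eye aspect ratio, a standard measure that drops toward zero when the eyelid closes. It assumes a face tracker has already produced six landmark points around one eye for every frame of the video, and the thresholds and the "normal" blink range here are illustrative assumptions, not calibrated values.

```python
# Blink-rate heuristic via the eye aspect ratio (EAR). The landmark
# points are assumed to come from a face tracker; thresholds are
# illustrative, not calibrated.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: 6 (x, y) landmark points around one eye, as a (6, 2) array.
    EAR drops toward zero when the eyelid closes."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # vertical eyelid distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal eye width
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_per_frame, closed_thresh=0.2):
    """Count open-to-closed transitions across the frame sequence."""
    blinks, closed = 0, False
    for ear in ear_per_frame:
        if ear < closed_thresh and not closed:
            blinks, closed = blinks + 1, True
        elif ear >= closed_thresh:
            closed = False
    return blinks

def looks_suspicious(ear_per_frame, fps, low=8, high=30):
    """People typically blink somewhere around 15 to 20 times a minute;
    a rate far outside a broad band is a weak red flag, not proof."""
    minutes = len(ear_per_frame) / fps / 60.0
    rate = count_blinks(ear_per_frame) / max(minutes, 1e-9)
    return rate < low or rate > high
```

A real detector would combine dozens of weak signals like this one, which is why these tools can catch things no human viewer would ever notice.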
So in addition to those tools, organization leaders should really prepare employees for the possibility of encountering deepfakes. Critical thinking is a big part of uncovering deception, as is preparation. Heck, depending on the organization, you might go so far as to set up a phrase or question as an authentication process at the top of an official phone call or video meeting, so that the person on the other end of the line can verify that things are legit. I know it sounds like you're going a bit far, but as this technology gets more sophisticated, and as people deploy it in ways that are potentially harmful, you have to start to think about these things. What we do not want to do is enter into an era where we can no longer reliably determine the real from the fake. But there is no putting the cat back in the bag, or the genie in the bottle, or Baby in the corner. The technology isn't going away. It will not disappear. It will continue to evolve and to improve, and so it falls upon us to educate ourselves as best we can in preparation for encountering it, and to think about how we can address the flagrant misuses of the technology to attempt to dissuade people from using it in that way. Because, again, the victimization element of this can be really severe, really traumatizing, and incredibly disruptive to a person's life. We should not forget that either.

So in conclusion, I will say that this technology is truly impressive, and again, it can have some really incredible uses. I don't want to paint it as just being a bad thing. It is not good or bad. It is how we use it that determines whether or not the end result is a positive one or a negative one. But only by learning about it can we prepare for what is to come. So I hope that you found this episode informative, that you have a deeper appreciation for what this technology does and what it is capable of, and I will speak to you again really soon.

TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.
