
Exploring Deepfakes

By Bryan Lyon, Matt Tora
About this book
Applying Deepfakes will allow you to tackle a wide range of scenarios creatively. Learning from experienced authors will help you to intuitively understand what is going on inside the model. You’ll learn what deepfakes are and what makes them different from other machine learning techniques, and understand the entire process from beginning to end, from finding faces to preparing them, training the model, and performing the final swap. We’ll discuss various uses for face replacement before we begin building our own pipeline. Spending some extra time thinking about how you collect your input data can make a huge difference to the quality of the final video. We look at the importance of this data and guide you with simple concepts to understand what your data needs to really be successful. No discussion of deepfakes can avoid discussing the controversial, unethical uses for which the technology initially became known. We’ll go over some potential issues, and talk about the value that deepfakes can bring to a variety of educational and artistic use cases, from video game avatars to filmmaking. By the end of the book, you’ll understand what deepfakes are, how they work at a fundamental level, and how to apply those techniques to your own needs.
Publication date: March 2023
Publisher: Packt
Pages: 192
ISBN: 9781801810692

 

Chapter 1: Surveying Deepfakes

Understanding deepfakes begins with understanding where they came from and what they can do. In this chapter, we’ll begin to explore deepfakes and their operation. We will go through the basics of what makes a deepfake work, talking about the differences between a generative auto-encoder and a generative adversarial network (GAN). We will examine their uses in media, education, and advertising. We’ll investigate their limitations and consider how to plan and design your deepfakes to avoid the common pitfalls. Finally, we’ll examine existing deepfake software and discuss what each kind can do.

We’ll cover this in the following sections:

  • Introducing deepfakes
  • Exploring the uses of deepfakes
  • Discovering how deepfakes work
  • Assessing the limitations of generative AI
  • Looking at existing deepfake software
 

Introducing deepfakes

The name deepfake is a portmanteau of “deep,” referring to deep learning, and “fake,” referring to the fact that the generated images are not genuine. The term first came into use on the popular website Reddit, where the original author released several deepfakes of adult actresses with other women’s faces artificially applied to them.

Note

The ethics of deepfakes are controversial, and we will cover this in more depth in Chapter 2, Examining Deepfake Ethics and Dangers.

This unethical beginning is still what the technology is best known for, but it’s not all it can be used for. Since then, deepfakes have moved into movies, memes, and more. Tom Cruise signed up for Instagram only after “Deep Tom Cruise” beat him to it, Steve Buscemi remarked to Stephen Colbert that he “never looked better” when his face was placed on top of Jennifer Lawrence’s, and a younger version of Bill Nighy was deepfaked onto his own older self for a news clip from the “past” in the movie Detective Pikachu.

In this book, we will be taking a fairly narrow view of what deepfaking is, so let’s define it now. A deepfake is the use of a neural network trained on two faces to replace one face with another. There are other technologies that swap faces but aren’t deepfakes, and there are generative AIs that do other things besides swapping faces, but including all of those in the term just muddies the water and confuses the issue.

 

Exploring the uses of deepfakes

The original use of deepfakes might be the one that required the least imagination. Putting one person’s face onto another’s has many different uses in various fields. Please don’t consider the ideas here to be the full extent of the capabilities of deepfakes – someone is bound to imagine something new!

Entertainment

Entertainment is the first area that comes to mind for most people when they consider the usage of deepfakes. There are two main areas of entertainment in which I see deepfakes playing a significant role: narrative and parody.

Narrative

The utility of deepfakes in movies is obvious. Imagine an actor’s face being superimposed onto their stunt double or an actor who becomes unavailable being replaced by another performer without any changes to the faces in the final movie.

While deepfakes may not yet seem good enough for professional work, they are already being used in Hollywood and other media today – from Detective Pikachu, which used deepfakes to de-age Bill Nighy, to For All Mankind, which used them to put actors face to face with Ronald Reagan. Agencies and VFX shops are all examining how to use deepfakes in their work.

These techniques are not unique to deepfakes. CGI (in this book, referring to 3D graphics) face replacements have been used in many movies. However, CGI face replacement is expensive and complicated, requiring filming to be done in particular ways, with lots of extra data captured for the artists to use to get the CGI face looking right in the final scene. This is an art more than a science and requires extensive skill and knowledge to accomplish. Deepfakes solve many of these problems, making new forms of face replacement possible.

Making a deepfake requires no special filming techniques (although some awareness will make the process smoother). Deepfakes also require very little attention or skill compared to CGI face replacements. This makes them ideal for lower-cost face replacements, but the results can also be higher quality, since the AI accounts for details that even the most dedicated artist can’t recreate.

Parody

Parody is an extremely popular form of social criticism and forms the basis for entire movies, TV shows, and other forms of media. Parody is normally done by professional impersonators. In some cases, those impersonators look (or can be made to look) similar to the person they’re impersonating. Other times, the impersonation relies on their performance to make it clear who is being impersonated.

Deepfakes provide an opportunity to change the art of parody: the impersonator can be made to look like the individual being parodied via a deepfake rather than by chance of birth. By taking basic appearance out of the equation, deepfakes allow the focus to be placed directly on the performance itself.

Deepfakes also enable a whole new form of parody in which normal situations become parodic simply due to the changed face. This particular form gets its humor from the distinct oddity of seeing a very different face where it isn’t expected, rather than from a convincing swap.

Figure 1.1 – Steve Buscemi as Jennifer Lawrence by birbfakes

Note

This image is included with the kind permission of its original creator, birbfakes. You can view the original video here: https://youtu.be/r1jng79a5xc.

Video games

Video games present an interesting opportunity when it comes to deepfakes. The idea here is that a computer-generated character could be deepfaked into a photorealistic avatar. This could be done for any character in the game, even the player’s character. For example, it would be possible to make a game in which, when the player’s character looked into a mirror, they would see their own face looking back at them. Another possibility would be to replace a non-player character with a deepfake of the original actor, allowing for a far more realistic appearance without making a complete 3D clone of the actor.

Education

Education could also benefit from deepfakes. Imagine if your history class had a video of Abraham Lincoln himself reading the Gettysburg Address, or a corporate training video hosted entirely by the company’s public mascot (who may not even be a real person) without having to resort to costumes or CGI. Deepfakes could even be used to make multiple videos or scenes filmed at significantly different times appear more cohesive by appearing to show the actor at the same point in time.

Many people are visual learners, and seeing a person “come alive” can really bring the experience home. Bringing the pre-video past to life using deepfakes enables a whole new learning experience. One example of this is the Dalí Museum, which created a series of videos of Salvador Dalí talking to guests. This was done by filming an actor and training a deepfake model to put Dalí’s face on the videos. Once the model was trained and set up, they were able to convert many videos, saving a lot of time and effort compared to a CGI solution.

Advertisements

Advertising agencies are always looking for the newest way to grab attention, and deepfakes could be a whole new way to catch viewers’ eyes. Imagine walking past a clothing store: you stop to look at an item of clothing in the window, and suddenly the screen beside the item shows a video of an actor wearing it but with your face, letting you see how the item would look on you. Alternatively, a mascot figure could be brought to life in a commercial. Deepfakes offer a whole new creative tool, one that can grab attention and provide whole new experiences in advertising.

Now that we’ve got some idea of a few potential uses for deepfakes, let’s take a quick look under the hood and see how they work.

 

Discovering how deepfakes work

Deepfakes use a unique variation of a generative auto-encoder to perform the face swap. This requires a special structure, which we will explain in this section.

Generative auto-encoders

The particular type of neural network that regular deepfakes use is called a generative auto-encoder. Unlike a Generative Adversarial Network (GAN), an auto-encoder does not use a discriminator or any “adversarial” techniques.

All auto-encoders work by training a collection of neural network models to solve a problem. With a normal auto-encoder, the problem is usually something such as classification (deciding what an image is), object identification (finding something inside an image), or segmentation (identifying different parts of an image). With a generative auto-encoder, the AI is instead used to generate a new image with details that weren’t in the original image. To do this, the auto-encoder uses two types of models – the encoder and the decoder. Let’s see how this works.
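
To make the two halves concrete, here is a minimal sketch of an encoder and a decoder written with TensorFlow’s Keras API in Python. The layer counts, kernel sizes, and the 512-value bottleneck are assumptions chosen purely for illustration; they are not taken from any particular deepfake implementation.

# A minimal sketch of the two halves of an auto-encoder.
# All sizes here are illustrative assumptions, not values from a real project.
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(64, 64, 3), latent_dim=512):
    """Squeeze a face image down into a small vector."""
    image = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(image)   # 32x32
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)      # 16x16
    x = layers.Conv2D(256, 5, strides=2, padding="same", activation="relu")(x)      # 8x8
    x = layers.Flatten()(x)
    return Model(image, layers.Dense(latent_dim)(x), name="encoder")

def build_decoder(latent_dim=512):
    """Expand the small vector back out into a face image."""
    latent = layers.Input(shape=(latent_dim,))
    x = layers.Dense(8 * 8 * 256, activation="relu")(latent)
    x = layers.Reshape((8, 8, 256))(x)
    x = layers.Conv2DTranspose(128, 5, strides=2, padding="same", activation="relu")(x)  # 16x16
    x = layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)   # 32x32
    x = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x) # 64x64
    return Model(latent, x, name="decoder")

Notice the shrinking and growing shapes in the comments; this is the “hourglass” structure we will return to later in this section.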

The deepfake training cycle

Training is a cyclical process in which the model is continuously trained on images until stopped. Each cycle can be broken down into four steps:

  • Encode faces into smaller intermediate representations.
  • Decode the intermediate representations back into faces.
  • Calculate the loss (that is, the difference) between the original face and the output of the model.
  • Modify (backpropagate) the models toward the correct answer.
Figure 1.2 – Diagram of the training cycle

In more detail, the process unfolds as follows:

  • The encoder’s job is to encode each of the two different faces into an array, which we call the intermediate representation. The intermediate representation is much smaller than the original image, but has enough space to describe the lighting, pose, and expression of the face. This process is similar to compression, where unnecessary data is thrown out to fit the data into a smaller space.
  • The decoder is actually a matched pair of models, which turn the intermediate representation back into faces. There is one decoder for each of the input faces, which is trained only on images of that one person’s face. This process tries to create a new face that matches the original face that was given to the encoder and encoded into the intermediate representation.
Figure 1.3 – Encoder and decoder

  • Loss is a score given to the auto-encoder based on how well it recreates the original faces. It is calculated by comparing the original image to the output of the encoder-decoder process. This comparison can be done in many ways, from a strict difference between the two images to something significantly more complicated that includes human perception as part of the calculation. No matter how it’s done, the result is the same: a number from 0 to 1, with 0 being the score for returning the exact same image and 1 being the score for its exact opposite. In practice, scores fall somewhere in between, as a perfect reconstruction (or its exact opposite) is impossible.

Note

The loss is where an auto-encoder differs from a GAN. In a GAN, the comparison loss is either replaced or supplemented with an additional network (usually an auto-encoder itself), which then produces a loss score of its own. The theory behind this structure is that the loss model (called a discriminator) can learn to get better at detecting the output of the generating model (called a generator) while the generator can learn to get better at fooling the discriminator.

  • Finally, there is backpropagation, a process in which the models are adjusted by following the path back through both the decoder and encoder that generated the face and nudging those paths toward the correct answer.
Figure 1.4 – Loss and backpropagation

Once complete, the whole process starts back over at the encoder. This repeats until the neural network has finished training. The decision of when to end training can be made in several ways: after a certain number of repetitions (called iterations), after all the data has been passed through (called an epoch), or when the results meet a certain loss score.
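
Putting the four steps together, a single training iteration might look like the following sketch, which reuses the build_encoder and build_decoder helpers from earlier in this section. The optimizer settings and the simple pixel-difference loss are illustrative assumptions; real projects often use more sophisticated losses.

import tensorflow as tf

encoder   = build_encoder()   # shared by both identities
decoder_a = build_decoder()   # trained only on person A's faces
decoder_b = build_decoder()   # trained only on person B's faces
optimizer = tf.keras.optimizers.Adam(5e-5)

@tf.function
def train_step(faces_a, faces_b):
    """One cycle: encode -> decode -> loss -> backpropagate.
    faces_a and faces_b are batches of images scaled to the range [0, 1]."""
    with tf.GradientTape() as tape:
        # 1. Encode both batches of faces into intermediate representations.
        latent_a = encoder(faces_a, training=True)
        latent_b = encoder(faces_b, training=True)
        # 2. Decode each representation with that person's own decoder.
        recon_a = decoder_a(latent_a, training=True)
        recon_b = decoder_b(latent_b, training=True)
        # 3. Loss: the average pixel difference between input and output.
        loss = (tf.reduce_mean(tf.abs(faces_a - recon_a)) +
                tf.reduce_mean(tf.abs(faces_b - recon_b)))
    # 4. Backpropagate: nudge the encoder and both decoders toward the answer.
    variables = (encoder.trainable_variables +
                 decoder_a.trainable_variables +
                 decoder_b.trainable_variables)
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

# At swap time the decoders are crossed over: person A's face is encoded,
# then decoded with person B's decoder, producing B's face in A's pose:
# swapped = decoder_b(encoder(faces_a), training=False)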

Why not GANs?

GANs are one of the current darlings of generative networks. They are extremely popular and widely used, particularly for super-resolution (intelligent upscaling), music generation, and sometimes even deepfakes. However, there are reasons that they’re not used in all deepfake solutions.

GANs are popular due to their “imaginative” nature. They learn through the interaction of their generator and discriminator to fill in gaps in the data. Because they can fill in missing pieces, they are great at reconstruction tasks or at tasks where new data is required.

The ability of a GAN to create new data where it is missing is great for numerous tasks, but it has a critical flaw when used for deepfakes. In deepfakes, the goal is to replace one face with another face. An imaginative GAN would likely learn to fill the gaps in the data from one face with the data from the other. This leads to a problem that we call “identity bleed” where the two faces aren’t swapped properly; instead, they’re blended into a face that doesn’t look like either person, but a mix of the two.

This flaw in a GAN-created deepfake can be corrected or prevented, but doing so requires much more careful data collection and processing. In general, it’s easier to get a full swap rather than a blend by using a generative auto-encoder instead of a GAN.
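
For completeness, here is a rough sketch of what the adversarial piece described in the earlier note adds: a small discriminator that scores how real a face looks, whose score is mixed into the generating model’s loss. The architecture and the 0.1 weighting are assumptions for illustration only and are not taken from any specific GAN-based deepfake project.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_discriminator(input_shape=(64, 64, 3)):
    """A small classifier that scores how 'real' a face looks (1 = real)."""
    image = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(image)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    return Model(image, layers.Dense(1, activation="sigmoid")(x), name="discriminator")

bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(real_faces, fake_faces, discriminator, adv_weight=0.1):
    """Reconstruction loss plus an adversarial term that rewards fooling the
    discriminator -- the extra piece a plain generative auto-encoder leaves out."""
    reconstruction = tf.reduce_mean(tf.abs(real_faces - fake_faces))
    scores = discriminator(fake_faces, training=False)
    adversarial = bce(tf.ones_like(scores), scores)  # want the fakes scored as real
    return reconstruction + adv_weight * adversarial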

The auto-encoder structure

Another name for an auto-encoder is an “hourglass” model. The reason is that each layer of an encoder is smaller than the layer before it, while each layer of a decoder is larger than the one before. Because of this, a diagram of the auto-encoder starts out wide, narrows toward the middle, and then widens back out again as it reaches the end:

Figure 1.5 – Hourglass structure of an autoencoder

While these methods are flexible and have many potential uses, there are limitations. Let’s examine those limitations now.

 

Assessing the limitations of generative AI

Generative AIs like those used in deepfakes are not a panacea and have some significant limitations. However, once you know about these limitations, you can generally work around or sidestep them with careful design.

Resolution

Deepfakes are limited in the resolution that they can swap. This is a hardware and time limitation: more powerful hardware and more time can provide higher-resolution swaps. However, this is not a 1:1 linear relationship. Doubling the resolution (from, say, 64x64 to 128x128) actually quadruples the amount of required VRAM – that is, the memory that a GPU has direct access to – and the time necessary to train grows by a roughly equivalent amount. Because of this, resolution is often a balancing act, where you’ll want to use the lowest resolution you can without sacrificing the results.
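
A quick back-of-the-envelope check shows where the quadratic growth comes from: the number of pixel values grows with the square of the side length. Real VRAM usage also depends on model size, batch size, and framework overhead, so treat the numbers below as an illustration only.

# Why doubling the resolution roughly quadruples the memory needed:
def pixel_values(side, channels=3):
    """Number of values in one side x side RGB image."""
    return side * side * channels

for side in (64, 128, 256):
    ratio = pixel_values(side) / pixel_values(64)
    print(f"{side}x{side}: {pixel_values(side):,} values ({ratio:.0f}x the 64x64 case)")

# 64x64: 12,288 values (1x the 64x64 case)
# 128x128: 49,152 values (4x the 64x64 case)
# 256x256: 196,608 values (16x the 64x64 case)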

Training required for each face pair

To provide the best results, traditional deepfakes require that you train on every face pair that you wish to swap. This means that if you wanted to swap your own face with two of your friends, you’d have to train two separate models. This is because each model has one encoder and two decoders, which are trained only to swap the faces they were given.

There is a workaround for some multi-face swaps: you could write your own version with more than two decoders, allowing you to swap additional faces, as sketched below. This is an imperfect solution, however, as each decoder takes up a significant amount of VRAM, requiring you to balance the number of faces carefully.
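
One way to picture that workaround, reusing the build_encoder and build_decoder sketches from earlier in the chapter, is a dictionary of decoders, one per identity, all sharing a single encoder. This is only a sketch of the idea under assumed helpers, not a feature of any existing deepfake package, and the identity names are hypothetical.

# One shared encoder, one decoder per identity. Every extra decoder costs
# additional VRAM, which is why the number of faces has to be balanced.
identities = ["alice", "bob", "carol"]                 # hypothetical people
encoder = build_encoder()                              # shared across everyone
decoders = {name: build_decoder() for name in identities}

def swap(face_batch, target_identity):
    """Encode any face, then decode it with the target person's decoder."""
    latent = encoder(face_batch, training=False)
    return decoders[target_identity](latent, training=False)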

It may be better to simply train multiple pairs. By splitting the task across multiple computers, you could train multiple models simultaneously, allowing you to create many face pairs at once.

Another option is to use a different type of AI face replacement. First Order Model (covered in the Looking at existing deepfake software section of this chapter) uses a different technique: instead of a paired approach, it uses AI to animate a single image to match the movements of another video. This removes the need to retrain on each face pair, but comes at the cost of a greatly reduced quality of swap.

Training data

Generative AIs require a significant amount of training data to accomplish their tasks. Sometimes, finding sufficient data, or data of a high enough quality, is simply not possible. For example, how would someone create a deepfake of William Shakespeare when there are no videos or photographs of him? This is a tricky problem but can be worked around in several ways. While it is unfortunately impossible to create a proper deepfake of England’s greatest playwright, it would be possible to use an actor who looks like his portraits and then deepfake that actor as Shakespeare.

Tip

We will cover more on how to deal with poor or insufficient data in Chapter 3, Mastering Data.

Finding sufficient data (or clever workarounds) is the most difficult challenge that any data scientist faces. Occasionally, there simply is no way to get enough data. This is when you might need to re-examine the video to see whether there is another way to shoot it that avoids the gap, or you might try using other sources of similar data to patch it. Sometimes, just knowing the limitations in advance can prevent a problem – other times, a last-minute workaround may be enough to save a project from failure.

While everyone should know the data limitations, knowing the limitations of the process itself is mostly a concern for experts. If you are only looking to use deepfakes, you’ll probably use existing software. Let’s explore the options next.

 

Looking at existing deepfake software

Many programs have risen to fill the niche of deepfaking; however, few of them are still under development or supported. The rapid development of GPU hardware and AI software has created unique challenges for software development, and many deepfake programs are no longer usable. Still, several remain, and in this section, we’ll go over the major options.

Important Note

The authors have made every effort to be unbiased in this section, but are among the developers of Faceswap. Faceswap will be covered in more detail in Chapter 4, The Deepfake Workflow, with a walkthrough of the deepfake workflow in the Faceswap software.

Faceswap

Faceswap is a Free and Open Source Software (FOSS) program for creating deepfakes. It’s released under the GPLv3 and can be used by anyone, anywhere. It’s written in Python and runs its AI on the TensorFlow backend. It supports NVIDIA, AMD, and Apple GPUs for accelerating the machine learning models, or it can be run on a CPU at reduced speed. There are installers for Windows and Linux that install all the needed libraries and tools inside a self-contained environment.

It’s available at https://Faceswap.dev/.

DeepFaceLab

Originally a fork of Faceswap, DeepFaceLab is now developed mostly by Ivan Perov. DeepFaceLab is another FOSS program for deepfakes, known for its more experimental models and features. There is no GUI, but there are Jupyter Notebooks that can be run in any Jupyter environment. There is also a DirectML version, which provides another option for people using Windows. Self-contained builds, packaged into a single compressed file, provide a fully working setup for many operating systems.

It’s available at https://github.com/iperov/DeepFaceLab.

First Order Model

First Order Model works in a fundamentally different way from Faceswap and DeepFaceLab. Instead of swapping a face onto a new video, First Order Model “puppets” the face, making it match the movements of a video while leaving the face the same. Furthermore, it doesn’t require training on each face pair, making it easy to create quick deepfakes in which you can “animate” a person from just a single photo.

It is important to note that while the First Order Model software is available freely, it is licensed only for non-commercial use: if you want to use it in a commercial context, you’ll need to contact the author for a license. It’s available at https://github.com/AliaksandrSiarohin/first-order-model.

Reface

Reface is yet another method of creating deepfakes. Reface is closed source and proprietary, so we can’t analyze exactly how it works, but it uses a zero-shot learning method, like First Order Model, to swap faces without requiring training on each face pair. Reface offers apps for Apple iOS and Android and performs the swap in the cloud, making it easy to get a quick result, but this means that you might not be able to swap the exact clip you want, and licensing may be an issue.

It’s available at https://reface.ai/.

 

Summary

The technology of deepfakes is not itself anything new or unique. These techniques existed in various forms long before they were applied to face-swapping, but deepfakes have caught public attention in a way that other AI techniques have never really been able to. There is something very visceral about seeing a face where it doesn’t belong, seeing an actor in a role you know that they didn’t play, or seeing your own face doing something you’ve never done.

While the techniques that make up deepfakes have all existed previously on their own, together, they provide completely new possibilities. There are numerous use cases that deepfakes can be applied to, from stunt-double replacement to advertising. The technology is here, and its use will only grow as more and more industries find ways to use it.

There are still limits to the capabilities of generative AI. Knowing what a deepfake cannot do is as important as knowing what it can do. Especially regarding data, knowing how to work around those limitations is key to a quality result.

We’ve given an overview of deepfakes, covering what they are, what they can be used for, how they work, their limitations, and the existing software you can use to make them. In the next chapter, we’ll cover the potential dangers of deepfakes and talk about the ethical questions that the technology brings with it.
