You're reading from  Generative AI with Python and TensorFlow 2

Product type: Book
Published in: Apr 2021
Publisher: Packt
ISBN-13: 9781800200883
Edition: 1st Edition

Deepfakes with GANs

Manipulating videos and photographs to edit artifacts has been practiced for a long time. If you have seen movies like Forrest Gump or Fast and Furious 7, chances are you did not even notice that the scenes with John F. Kennedy and Paul Walker, respectively, were fabricated and edited into the movies as required.

You may recall one particular scene from the movie Forrest Gump, where Gump meets John F. Kennedy. The scene was created using complex visual effects and archival footage to ensure high-quality results. Hollywood studios, spy agencies from across the world, and media outlets have been using editing tools such as Photoshop, After Effects, and complex custom visual effects/CGI (computer-generated imagery) pipelines to produce such compelling results. While the results have been more or less believable in most instances, it takes a huge amount of manual effort and time to edit each and every detail, such...

Deepfakes overview

Deepfakes is an all-encompassing term representing content generated using artificial intelligence (in particular, deep learning) that seems realistic and authentic to a human being. The generation of fake content or manipulation of existing content to suit the needs and agenda of the entities involved is not new. In the introduction, we discussed a few movies where CGI and painstaking manual effort helped in generating realistic results. With advancements in deep learning and, more specifically, generative models, it is becoming increasingly difficult to differentiate between what is real and what is fake.

Generative Adversarial Networks (GANs) have played a very important role in this space by enabling the generation of sharp, high-quality images and videos. Works such as https://thispersondoesnotexist.com, based on StyleGAN, have really pushed the boundaries in terms of the generation of high-quality realistic content. A number of other key architectures...

Modes of operation

Generating believable fake content requires taking care of multiple aspects to ensure that the results are as authentic as possible. A typical deepfake setup requires a source, a target, and the generated content.

  • The source, denoted with subscript s, is the driver identity that controls the required output
  • The target, denoted with subscript t, is the identity being faked
  • The generated content, denoted with subscript g, is the result following the transformation of the source to the target

Now that we have some basic terminology in place, let's dive deeper and understand different ways of generating fake content.

Replacement

This is the most widely used form of generating fake content. The aim is to replace specific content of the target (x_t) with that from the source (x_s). Face replacement has been an active area of research for quite some time now. Figure 8.1 shows Donald Trump's face being replaced with Nicolas...

Key feature set

The human face and body are key entities in this task of fake content generation. While deep learning architectures usually do not require hand-crafted features, a little nudge goes a long way when complex entities are involved. Particularly when dealing with the human face, apart from detecting the overall face in a given image or video, a deepfake solution also needs to focus on the eyes, mouth, and other features. We discussed different modes of operation in the previous section, where we highlighted the importance of different sections of a face and their impact on improving the believability of the fake content generated.

In this section, we will briefly cover a few important features leveraged by different deepfake solutions. These are:

  • Facial Action Coding System (FACS)
  • 3D Morphable Model (3DMM)
  • Facial landmarks

We will also undertake a couple of hands-on exercises to better understand these feature sets.
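
To make the idea of facial landmarks as a feature set more concrete, the following is a minimal sketch of how landmark points could be grouped into face regions and turned into crop boxes (for example, around the mouth). It assumes the widely used 68-point landmark layout; this excerpt does not state which layout the chapter relies on. The names REGIONS_68, region_box, and demo_landmarks are hypothetical, and the landmark coordinates are synthetic rather than the output of a real detector.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Index ranges of the common 68-point landmark layout (an assumption here;
# the chapter excerpt does not say which layout it uses).
REGIONS_68: Dict[str, range] = {
    "jaw": range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}


def region_box(landmarks: List[Point], region: str, pad: int = 0) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) box around one facial region."""
    if len(landmarks) != 68:
        raise ValueError("expected 68 (x, y) landmark points")
    points = [landmarks[i] for i in REGIONS_68[region]]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Clamp at zero so padding never produces negative pixel coordinates.
    return (max(min(xs) - pad, 0), max(min(ys) - pad, 0), max(xs) + pad, max(ys) + pad)


# Synthetic landmarks purely for illustration: point i sits at (i, 2 * i).
demo_landmarks = [(i, 2 * i) for i in range(68)]
mouth_box = region_box(demo_landmarks, "mouth", pad=2)
jaw_box = region_box(demo_landmarks, "jaw", pad=5)
```

In practice, the landmark list would come from a detector, and a box like this could be used to crop or mask the region a deepfake model should focus on.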

Facial Action...

High-level workflow

Fake content generation is a complex task consisting of a number of components and steps that help in generating believable content. While this space is seeing quite a lot of research and hacks that improve the overall results, the setup can largely be explained using a few common building blocks. In this section, we will discuss a common high-level flow that describes how a deepfake setup uses data to train and generate fake content. We will also touch upon a few common architectures used in a number of works as basic building blocks.

As discussed earlier, a deepfake setup requires a source identity (x_s) which drives the target identity (x_t) to generate fake content (x_g). To understand the high-level flow, we will continue with this notation, along with the concepts related to the key feature set discussed in the previous section. The steps are as follows:

  • Input processing
    • The input image (x_s or x_t) is processed using a face...

Replacement using autoencoders

Deepfakes are an interesting and powerful use of technology that is both useful and dangerous. In previous sections, we discussed different modes of operation and key features that can be leveraged, as well as common architectures. We also briefly touched upon the high-level flow of different tasks required to achieve the end results. In this section, we will focus on developing a face swapping setup using an autoencoder as our backbone architecture. Let's get started.

Task definition

The aim of this exercise is to develop a face swapping setup. As discussed earlier, face swapping is a type of replacement mode operation in the context of deepfake terminology. In this setup, we will focus on transforming Nicolas Cage (a Hollywood actor) into Donald J. Trump (former US president). In the upcoming sections, we will present each sub-task necessary for the preparation of data, training our models, and finally, the generation of swapped fake...
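
As a rough illustration of how an autoencoder can serve as the backbone for face swapping, the sketch below shows the data flow of one commonly described arrangement: a single encoder shared by both identities and one decoder per identity. This arrangement is an assumption, since the excerpt does not show the chapter's actual architecture. The layers are untrained linear maps in NumPy, so the output is not a meaningful face; the sizes and the names init, encode, decode, reconstruct, and swap are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened 64x64 RGB crops and a 128-dimensional latent code.
IMG_DIM, LATENT_DIM = 64 * 64 * 3, 128


def init(shape):
    """Small random weights; a real setup would learn these by training."""
    return rng.normal(0.0, 0.01, size=shape)


# One encoder shared by both identities, one decoder per identity.
W_ENC = init((IMG_DIM, LATENT_DIM))
W_DEC = {
    "cage": init((LATENT_DIM, IMG_DIM)),
    "trump": init((LATENT_DIM, IMG_DIM)),
}


def encode(faces):
    """Map flattened face crops to a shared latent representation."""
    return np.tanh(faces @ W_ENC)


def decode(latent, identity):
    """Render a latent code with the decoder of the given identity."""
    logits = latent @ W_DEC[identity]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid keeps pixels in [0, 1]


def reconstruct(faces, identity):
    """Training path: each decoder learns to rebuild its own identity."""
    return decode(encode(faces), identity)


def swap(faces, target_identity):
    """Inference path: encode one identity, decode with the other's decoder."""
    return decode(encode(faces), target_identity)


# Stand-in for a batch of two Nicolas Cage crops (random pixels here).
cage_batch = rng.random((2, IMG_DIM))
fake_trump = swap(cage_batch, "trump")
```

The design intuition is that the shared encoder captures pose and expression, while each decoder specializes in rendering one face, so routing a source face through the other decoder yields the swap.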

Re-enactment using pix2pix

Re-enactment is another mode of operation for the deepfakes setup. It is supposedly better at generating believable fake content compared to the replacement mode. In earlier sections, we discussed different techniques used to perform re-enactment, namely by focusing on gaze, expressions, the mouth, and so on.

We also discussed image-to-image translation architectures in Chapter 7, Style Transfer with GANs. Particularly, we discussed in detail how the pix2pix GAN is a powerful architecture which enables paired translation tasks. In this section, we will leverage the pix2pix GAN to develop a face re-enactment setup from scratch. We will work toward building a network where we can use our own face, mouth, and expressions to control Barack Obama's (former US president) face. We will go through each and every step, starting right from preparing the dataset, to defining the pix2pix architecture, to finally generating the output re-enactment. Let's...
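
For readers who want a reminder of what training a paired translation model involves, here is a minimal NumPy sketch of the losses commonly associated with pix2pix: the generator combines an adversarial term with a weighted L1 reconstruction term, and the discriminator separates real pairs from generated ones. The L1 weight of 100 follows the default reported in the original pix2pix paper; whether this chapter uses the same value is not visible in this excerpt. The function names bce, generator_loss, and discriminator_loss are hypothetical, and the inputs are toy arrays rather than network outputs.

```python
import numpy as np


def bce(probs, label, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and a constant label."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(probs) + (1.0 - label) * np.log(1.0 - probs)))


def generator_loss(disc_on_fake, fake, real, l1_weight=100.0):
    """Adversarial term (fool the discriminator) plus weighted L1 to the paired target."""
    adversarial = bce(disc_on_fake, 1.0)
    l1 = float(np.mean(np.abs(real - fake)))
    return adversarial + l1_weight * l1


def discriminator_loss(disc_on_real, disc_on_fake):
    """Real pairs should score 1, generated pairs should score 0."""
    return bce(disc_on_real, 1.0) + bce(disc_on_fake, 0.0)


# Toy values: a 4x4 RGB target frame and a generated frame that is off by 0.1.
real_frame = np.full((4, 4, 3), 0.5)
fake_frame = np.full((4, 4, 3), 0.6)
undecided = np.full((2, 2), 0.5)  # patch scores from an undecided discriminator

g_loss = generator_loss(undecided, fake_frame, real_frame)
d_loss = discriminator_loss(undecided, undecided)
```

In a re-enactment setting, the paired input would typically be a conditioning image (such as a landmark drawing) and the target would be the matching real frame; the L1 term is what ties the generated frame to that specific target.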

Challenges

In this section, we will discuss some of the common challenges associated with deepfake architectures, beginning with a brief discussion on the ethical issues associated with this technology.

Ethical issues

Even though generating fake content is not a new concept, the word "deepfake" came into the limelight in 2017, when a Reddit user by the name u/deepfakes posted fake pornographic videos with celebrity faces superimposed on them using deep learning. The quality of the content and the ease with which the user was able to generate it created a huge uproar on news channels across the globe. Soon, u/deepfakes released an easy-to-set-up application called FakeApp that enabled users to generate such content with very little knowledge of how deep learning works. This led to a number of fake videos and objectionable content. This, in turn, drew public attention to issues associated with identity theft, impersonation, fake news, and so on.

Soon, interest...

Off-the-shelf implementations

In this chapter, we covered a step-by-step approach to developing two different deepfake architectures for replacement and re-enactment. Although the implementations are easy to understand and execute, they require quite a bit of understanding and resources to generate high-quality results.

Since the release of u/deepfakes' content in 2017, a number of open source implementations have come out to simplify the use of this technology. Although the technology is dangerous, most of these projects highlight the ethical implications and caution developers and users in general against malicious use. While covering them in detail is beyond the scope of this chapter, we list a few well-designed and popular implementations in this section. Readers are encouraged to go through specific projects for more details.

  • FaceSwap [19]: The developers of this project claim this implementation is close to the original implementation by u/deepfakes, with enhancements over the...

Summary

Deepfakes are a complicated subject both ethically and technically. In this chapter, we began with a general discussion of deepfake technology. We presented an overview of what deepfakes are and briefly touched upon a number of productive as well as malicious use cases. We presented a detailed discussion of the different modes of operation of deepfake setups and how each of these impacts the overall believability of generated content. While deepfakes is an all-encompassing term associated with videos, images, audio, text, and so on, we focused on visual use cases only in this chapter.

Given our scope, we discussed various feature sets leveraged by different works in this space. In particular, we discussed the Facial Action Coding System (FACS), 3D Morphable Models (3DMM), and facial landmarks. We also discussed how we can perform facial landmark detection using libraries such as dlib and MTCNN. We then presented a high-level flow of tasks to be performed...

References

  1. BuzzFeedVideo. (2018, April 17). You Won't Believe What Obama Says In This Video! ;) [Video]. YouTube. https://www.youtube.com/watch?v=cQ54GDm1eL0&ab_channel=BuzzFeedVideo
  2. Lee, D. (2019, May 10). Deepfake Salvador Dalí takes selfies with museum visitors. The Verge. https://www.theverge.com/2019/5/10/18540953/salvador-dali-lives-deepfake-museum
  3. Malaria Must Die. (2020). A World Without Malaria. Malaria Must Die. https://malariamustdie.com/
  4. Lyons, K. (2020, February 18). An Indian politician used AI to translate his speech into other languages to reach more voters. The Verge. https://www.theverge.com/2020/2/18/21142782/india-politician-deepfakes-ai-elections
  5. Dietmar, J. (2019, May 21). GANs And Deepfakes Could Revolutionize The Fashion Industry. Forbes. https://www.forbes.com/sites/forbestechcouncil/2019/05/21/gans-and-deepfakes-could-revolutionize-the-fashion-industry/?sh=2502d4163d17
  6. Statt, N. (2020, August 27). Ronald...