You're reading from Exploring Deepfakes

Product type: Book
Published in: Mar 2023
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801810692
Edition: 1st

Authors (2):

Bryan Lyon
Bryan Lyon is a developer for Faceswap.

Matt Tora
Matt Tora is a developer for Faceswap.
The Future of Generative AI

While it sometimes might feel like we’re already living in the future with deepfakes and AI-generated images, the technology behind them is really just beginning to take off. As we move forward, the capabilities of these generative AIs will only become more powerful.

This chapter is not unbounded futurism; instead, it looks at specific generative AIs and where they are improving. We'll discuss the future of the following areas of AI:

  • Generating text
  • Improving image quality
  • Text-guided image generation
  • Generating sound
  • Deepfakes

Generating text

Text generation models recently made a major impact on the public consciousness with the success of OpenAI's ChatGPT in 2022. However, text generation was among the first uses of AI: ELIZA, the first chatbot ever developed, appeared back in 1966, before all but the most technically inclined people had even seen a computer. The personal computer wouldn't be invented for another 5 years, in 1971. Still, it's only recently that truly impressive chatbots have been developed.

Recent developments

A type of model called the transformer is responsible for the recent burst of progress in language models. Transformers are neural networks built around a component called the attention layer. Attention layers work somewhat like a spotlight, focusing on the parts of the data that are most likely to be important. This lets transformers (and other models that use attention layers) be much deeper without losing "focus"...
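To make the "spotlight" intuition concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of an attention layer. This is a toy illustration, not any production implementation; the input sizes are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: shift by the max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores every key, and the
    softmax weights act as the 'spotlight' over the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row sums to 1: the attention focus
    return weights @ V                   # weighted mix of the values

# Toy example: 3 tokens, 4-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = attention(X, X, X)  # self-attention: Q, K, and V all come from the same input
print(out.shape)  # (3, 4)
```

Because the weights are a softmax, every output token is a convex combination of the input values, with most of the weight going to whichever tokens scored as most relevant.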

Improving image quality

The earliest known image taken through mechanical means is the Niépce heliograph, made by Joseph Nicéphore Niépce in 1827. It was taken through the window of his workshop by exposing a pewter plate covered in a thin layer of a concoction made from lavender oil and bitumen. The plate was exposed to sunlight for several days, producing a blurry, monochrome image.

Figure 9.1 – The Niépce Heliograph taken by Joseph Nicéphore Niépce in 1827

Since then, images have gotten better at capturing reality, but the process has never been perfected. There are always limitations that mean the images aren't quite accurate in color, can't capture every detail, or introduce distortions. A truly perfect image is mind-boggling to even consider: from a single perfect image, you'd be able to zoom in on any atom, even on the other side of the universe, and our...

Text-guided image generation

Text-guided image generation is an interesting category of generative AI. Researchers at OpenAI released a paper called Learning Transferable Visual Models From Natural Language Supervision (https://arxiv.org/abs/2103.00020), though I prefer the summary title they posted on their blog: CLIP: Connecting Text and Images. CLIP was mentioned in Chapter 8, Applying the Lessons of Deepfakes, but we'll talk about it some more here.

CLIP

CLIP is actually a pair of neural network encoders: one is trained on images, while the other is trained on text. So far, this isn't very unusual. The real trick comes from how the two are linked. Both encoders are passed data from the same image: the image encoder gets the image, the text encoder gets the image's description, and the encodings they generate are compared to each other. This training methodology effectively trains two separate models to create the same output given...
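The linking step can be sketched as a contrastive loss over a batch of image-text pairs. The encoders below are random projections standing in for the real networks (CLIP actually uses a vision model and a text transformer), and all dimensions are made up for illustration; only the loss structure is the point:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products are cosine similarities
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

rng = np.random.default_rng(42)
W_img = rng.normal(size=(2048, 512))  # stand-in image encoder -> shared space
W_txt = rng.normal(size=(768, 512))   # stand-in text encoder  -> shared space

def encode_image(img_feats):
    return l2_normalize(img_feats @ W_img)

def encode_text(txt_feats):
    return l2_normalize(txt_feats @ W_txt)

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matching image/text pairs (the diagonal of
    the similarity matrix) should out-score every mismatched pair in the
    batch, in both directions."""
    logits = img_emb @ txt_emb.T / temperature
    idx = np.arange(len(logits))
    log_p_img = logits - logsumexp(logits, axis=1)  # image -> text direction
    log_p_txt = logits - logsumexp(logits, axis=0)  # text -> image direction
    return -(log_p_img[idx, idx].mean() + log_p_txt[idx, idx].mean()) / 2

# A batch of 4 (image, caption) pairs with made-up feature vectors
img_emb = encode_image(rng.normal(size=(4, 2048)))
txt_emb = encode_text(rng.normal(size=(4, 768)))
loss = clip_loss(img_emb, txt_emb)
```

Minimizing this loss pulls each image's embedding toward its own caption's embedding and pushes it away from every other caption in the batch, which is what forces the two encoders into a shared space.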

Generating sound

Sound generation is another field we could keep subdividing down and down until all the room we have in this book was taken up by headings listing different methods of sound generation. For the sake of brevity, we'll group them all here and cover a few big subfields instead.

Voice swapping

The first thing most people ask when they learn they can swap faces is whether they can swap voices too. The answer is quite unsatisfying: yes, but you probably don't want to. There are AIs out there that can swap voices, but they all suffer from various problems: sounding like a robot, lacking inflection, not matching the person involved, or being very expensive and exclusive. If you're doing anything with even moderate production value, you'll get much better results out of natural intelligence: finding an impressionist who can do an impersonation of the voice. AI technology is just not there ...

Deepfakes

Of course, this whole book has been about the past and present of deepfakes, so it makes sense to circle back to their future at the end. There is a lot we can learn about the future of deepfakes from the other AIs mentioned in this chapter. This is because, sneakily, all the parts of this chapter have been building up to this section. Deepfakes are, after all, image generation AIs that work on domain-specific images with a shared embedding.

Every area that we’ve explored in this chapter can be used to improve deepfakes, so let’s approach them one at a time.
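To make the "shared embedding" claim concrete, here is a minimal sketch of the architecture underlying deepfake models: one shared encoder feeding two per-identity decoders. The plain linear layers and dimensions are illustrative stand-ins (real models such as Faceswap's are trained convolutional autoencoders); the wiring is the key point:

```python
import numpy as np

rng = np.random.default_rng(7)

def layer(in_dim, out_dim):
    # A random linear map standing in for a trained network layer
    return rng.normal(scale=0.01, size=(in_dim, out_dim))

# One shared encoder maps any face into a common embedding space...
W_enc = layer(64 * 64, 256)
# ...while each identity gets its own decoder back to pixel space.
W_dec_a = layer(256, 64 * 64)  # trained only on faces of person A
W_dec_b = layer(256, 64 * 64)  # trained only on faces of person B

def encode(face):
    return np.tanh(face @ W_enc)  # the shared embedding

def decode_b(embedding):
    return embedding @ W_dec_b

def swap_a_to_b(face_a):
    """The swap itself: encode a face of A with the shared encoder,
    then reconstruct it through B's decoder."""
    return decode_b(encode(face_a))

face_a = rng.normal(size=(1, 64 * 64))  # a flattened 64x64 face crop
swapped = swap_a_to_b(face_a)
print(swapped.shape)  # (1, 4096)
```

Because both decoders read from the same embedding space, pose and expression encoded from person A survive the trip through person B's decoder, which is what produces the swap.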

Sound generation

This one is quite simple and obvious. The next step after swapping a face is to swap the voice too. A solid voice swap would take deepfakes to a whole new level of capability. Generating music or other sound effects could also be useful if you were making a movie without any other people helping, but their utility would otherwise be limited (in deepfakes; other industries...

Summary

Generative AI has a long history and a tremendous future. We're standing before a vast plain where anything is possible, and we just have to walk toward it. That said, not everything is visible today, and we must temper our expectations. The main challenges are the limitations of our computers, time, and research. If we dedicate our time and effort to solving some of AI's limitations, we'll inevitably come up with brand-new leaps that help us move forward. Even without huge revolutionary improvements, though, there are many smaller evolutionary improvements we can make to expand the capabilities of these models.

The biggest driver of innovation is need. Having more and more people using generative AI and putting it toward novel uses will create the economic and social pressure that generative AI needs to keep improving into the future.

This book has all been about getting us to this point where we, the authors, can invite you...
