Autoencoders and Image Manipulation

In the previous chapters, we learned about classifying images, detecting objects in an image, and segmenting the pixels corresponding to objects in images. In this chapter, we will learn about representing an image in a lower dimension using autoencoders, and about leveraging that lower-dimensional representation to generate new images using variational autoencoders. Learning to represent images in a lower number of dimensions lets us manipulate (modify) them to a considerable degree. We will learn about leveraging lower-dimensional representations to generate new images, as well as novel images based on the content and style of two different images. Next, we will learn about modifying images in such a way that the image is visually unaltered, however, the class corresponding to the image is changed from one...

Understanding autoencoders

So far, in the previous chapters, we have learned about classifying images by training a model on input images and their corresponding labels. Now let's imagine a scenario where we need to cluster images based on their similarity, with the constraint that we do not have their corresponding labels. Autoencoders come in handy for identifying and grouping similar images.

An autoencoder takes an image as input, stores it in a lower dimension, and tries to reproduce the same image as output; hence the term auto (which stands for being able to reproduce the input). However, if we were merely to reproduce the input at the output, we would not need a network; simply multiplying the input by 1 would do. The differentiating aspect of an autoencoder is that it encodes the information present in an image in a lower dimension and then reproduces the image from that encoding; hence the term encoder (which stands for representing the information of an image in a lower dimension). This way...
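To make this concrete, the following is a minimal sketch of a vanilla autoencoder in PyTorch. The layer sizes (including the 64-dimensional bottleneck) are illustrative assumptions, not the exact architecture built later in the chapter:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, bottleneck_dim=64):
        super().__init__()
        # Encoder: compress the 784 input pixels into a small latent vector
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim)
        )
        # Decoder: reconstruct the 784 pixels from the latent vector
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid()
        )

    def forward(self, x):
        z = self.encoder(x.view(x.size(0), -1))  # lower-dimensional code
        return self.decoder(z).view_as(x)        # reconstructed image

# The training objective is simply to make the output match the input
model = AutoEncoder()
x = torch.rand(16, 1, 28, 28)                    # a dummy batch of images
loss = nn.MSELoss()(model(x), x)
```

Note that the network is trained against its own input, so no labels are needed.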

Understanding convolutional autoencoders

In the previous section, we learned about autoencoders and implemented them in PyTorch. While doing so, one convenience the dataset afforded us was that each image had only 1 channel (each image was represented in grayscale) and the images were relatively small (28 x 28). Hence, the network could flatten the input and train on 784 (28*28) input values to predict 784 output values. In reality, however, we will encounter images that have 3 channels and are much bigger than 28 x 28.

In this section, we will learn about implementing a convolutional autoencoder that can work on multi-dimensional input images. For the purpose of comparison with the vanilla autoencoder, we will work on the same MNIST dataset that we worked on in the previous section, but modify the network so that we now build a convolutional autoencoder rather than a vanilla one.

A convolutional autoencoder...
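As a rough sketch of what such a network can look like (the channel counts and kernel sizes here are illustrative assumptions, not the chapter's exact architecture):

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions shrink the spatial resolution step by step
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),  # 28x28 -> 14x14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU()  # 14x14 -> 7x7
        )
        # Decoder: transposed convolutions grow the resolution back
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),   # 7x7 -> 14x14
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid()  # 14x14 -> 28x28
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(16, 1, 28, 28)
recon = ConvAutoEncoder()(x)   # same shape as the input: (16, 1, 28, 28)
```

Because the encoder operates on feature maps rather than a flattened vector, the same design extends naturally to 3-channel images of any size.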

Understanding variational autoencoders

So far, we have seen a scenario where we can group similar images into clusters. Furthermore, we have learned that when we take embeddings of images that fall in a given cluster, we can reconstruct (decode) them. However, what if an embedding (a latent vector) falls in between two clusters? There is no guarantee that we would generate realistic images. Variational autoencoders come in handy in such a scenario.

Before we dive into building a variational autoencoder, let's explore the limitations of generating images from embeddings that do not fall into a cluster (or fall in the middle of different clusters). First, we generate images by sampling vectors (a minimal VAE sketch follows the steps below):

The following code is a continuation of the code built in the previous section, Understanding convolutional autoencoders, and is available as conv_auto_encoder.ipynb in the chapter11 folder of this book's GitHub repository at https://tinyurl.com/mcvp-packt
  1. Calculate the latent vectors (embeddings...
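As a preview of what we are building toward, here is a minimal sketch of the two ingredients that distinguish a variational autoencoder: the reparameterization trick and the KL divergence term added to the reconstruction loss. The layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(28 * 28, 400)
        self.mu = nn.Linear(400, latent_dim)      # mean of the latent Gaussian
        self.logvar = nn.Linear(400, latent_dim)  # log-variance of the latent Gaussian
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                                 nn.Linear(400, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x.view(x.size(0), -1)))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps gradients flowing
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction loss plus KL divergence to a standard normal prior
    recon_loss = F.binary_cross_entropy(recon, x.view(x.size(0), -1),
                                        reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kld

x = torch.rand(16, 1, 28, 28)
recon, mu, logvar = VAE()(x)
loss = vae_loss(recon, x, mu, logvar)
```

The KL term is what pulls the latent space toward a single well-behaved distribution, so that vectors sampled between clusters still decode to plausible images.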

Performing an adversarial attack on images

In the previous section, we learned about generating an image from random noise using a VAE. However, that was an unsupervised exercise. What if we want to modify an image in such a way that the change is so minimal that a human cannot distinguish the modified image from the original, and yet the neural network model perceives the object as belonging to a different class? Adversarial attacks on images come in handy in such a scenario.

Adversarial attacks refer to the changes that we make to input image values (pixels) so that we meet a certain objective.

In this section, we will learn about modifying an image slightly in such a way that a pre-trained model now predicts it as belonging to a different class (specified by the user) rather than its original class. The strategy we will adopt is as follows (a minimal sketch of the resulting optimization loop is shown after the list):

  1. Provide an image of an elephant.
  2. Specify the target class corresponding to the image.
  3. Import a pre-trained model where the parameters of the model are...
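The following is a minimal sketch of this strategy in PyTorch. The choice of model (resnet50), the target class index, the step size, and the number of iterations are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)           # freeze the model; only the image changes

x = torch.rand(1, 3, 224, 224)        # stand-in for the elephant image
x.requires_grad_(True)
target = torch.tensor([850])          # illustrative target class index

for _ in range(20):
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    with torch.no_grad():
        # Nudge the pixels in the direction that lowers the loss for the target
        x -= 0.01 * x.grad.sign()
        x.clamp_(0, 1)                # keep pixel values valid
    x.grad.zero_()

print(model(x).argmax(dim=1))         # should now be the target class
```

The key point is that the gradient flows back to the input pixels rather than the weights, so a tiny, structured perturbation of the image is enough to flip the prediction.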

Performing neural style transfer

In neural style transfer, we have a content image and a style image, and we combine these two images in such a way that the combined image preserves the content of the content image while adopting the style of the style image.

An example style image and content image are as follows:

In the preceding picture, we want to retain the content of the picture on the right (the content image), but overlay it with the color and texture of the picture on the left (the style image).

The process of performing neural style transfer is as follows: we modify the generated image so as to minimize a total loss that splits into a content loss and a style loss. Content loss measures how different the generated image is from the content image. Style loss measures how correlated the generated image is with the style image.

While we mentioned that the loss is calculated based on the difference in images, in practice, we modify it slightly by ensuring that the loss is calculated...
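To make the style loss concrete, here is a minimal sketch of the gram matrix computation on a feature map; the activation tensors below are hypothetical stand-ins for activations extracted from a network layer:

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from some layer
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Channel-by-channel correlations, normalized by the number of elements
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

# Style loss compares gram matrices rather than raw pixels
style_feat = torch.rand(1, 64, 128, 128)   # hypothetical style activations
gen_feat = torch.rand(1, 64, 128, 128)     # hypothetical generated activations
style_loss = F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))
```

Because the gram matrix measures which channels activate together, it captures texture and color statistics while discarding the spatial arrangement of the content.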

Generating deep fakes

We have learned about two different image-to-image tasks so far: semantic segmentation with UNet and image reconstruction with autoencoders. Deep fakery is an image-to-image task that has a very similar underlying theory.

Imagine a scenario where you want to create an application that takes a given image of a face and changes the facial expression in a way that you specify. Deep fakes come in handy in this scenario. While we will not discuss the very latest in deep fakes in this book, techniques such as few-shot adversarial learning have been developed to generate realistic images with the facial expression of interest. Knowing how deep fakes and GANs (which you will learn about in the next chapters) work will also help you identify fake videos.

In the task of deep fakery, we would have a few hundred pictures of person A and a few hundred pictures of person B. The objective is to reconstruct person B's face with the facial expression of person A and vice...
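The architectural idea can be sketched as a single shared encoder paired with one decoder per person. This is a hypothetical minimal sketch; the layer details are assumptions:

```python
import torch
import torch.nn as nn

class FaceSwapper(nn.Module):
    def __init__(self):
        super().__init__()
        # One shared encoder learns features common to both faces
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()
        )
        # One decoder per person learns to render that person's face
        self.decoder_a = self._make_decoder()
        self.decoder_b = self._make_decoder()

    def _make_decoder(self):
        return nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 2, stride=2), nn.Sigmoid()
        )

    def forward(self, x, person='A'):
        z = self.encoder(x)
        return self.decoder_a(z) if person == 'A' else self.decoder_b(z)

x_a = torch.rand(8, 3, 64, 64)            # a batch of person A's pictures
swapped = FaceSwapper()(x_a, person='B')  # render A's expression as B's face
```

The shared encoder is forced to capture what the two faces have in common (pose and expression), while each decoder specializes in one identity; swapping decoders at inference time is what swaps the faces.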

Summary

In this chapter, we have learned about different variants of autoencoders: vanilla, convolutional, and variational. We also learned how the number of units in the bottleneck layer influences the quality of the reconstructed image. Next, we learned about identifying images that are similar to a given image using the t-SNE technique. We learned that when we randomly sample vectors, we cannot get realistic images, and that variational autoencoders address this by generating new images using a combination of reconstruction loss and KL divergence loss. Next, we learned how to perform an adversarial attack on images to change the class predicted for an image without perceptibly changing its content. Then, we learned about leveraging the combination of content loss and gram matrix-based style loss to come up with an image that combines the content of one image with the style of another. Finally, we learned about tweaking an autoencoder to swap two faces without...

Questions

  1. What is an encoder in an autoencoder?
  2. What loss function does an autoencoder optimize for?
  3. How do autoencoders help in grouping similar images?
  4. When is a convolutional autoencoder useful?
  5. Why do we get non-intuitive images if we randomly sample from vector space of embeddings obtained from vanilla/convolutional autoencoders?
  6. What are the loss functions that VAEs optimize for?
  7. How do VAEs overcome the limitation of vanilla/convolutional autoencoders to generate new images?
  8. During an adversarial attack, why do we modify the input image pixels and not the weight values?
  9. In neural style transfer, what are the losses that we optimize for?
  10. Why do we consider the activation of different layers and not the original image when calculating style and content loss?
  11. Why do we consider gram matrix loss and not the difference between images when calculating style loss?
  12. Why do we warp images while building a model to generate deep fakes?