Reader small image

You're reading from  Modern Computer Vision with PyTorch

Product typeBook
Published inNov 2020
Reading LevelBeginner
PublisherPackt
ISBN-139781839213472
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
V Kishore Ayyadevara
V Kishore Ayyadevara
author image
V Kishore Ayyadevara

V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience in leading technology companies. In his current role, he is responsible for developing a variety of cutting edge analytical solutions that have an impact at scale while building strong technical teams. Prior to this, Kishore authored three books — Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with keen interest in identifying problems that can be solved using data, simplifying the complexity and in transferring techniques across domains to achieve quantifiable results.
Read more about V Kishore Ayyadevara

Yeshwanth Reddy
Yeshwanth Reddy
author image
Yeshwanth Reddy

Yeshwanth is a highly accomplished data scientist manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization, resulting in substantial cost savings. Yeshwanth's expertise extends to developing modules in OCR, word detection, and synthetic document generation. His groundbreaking work has been recognized through multiple patents. He also created a few Python libraries. With a passion for disrupting unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in the field of data science.
Read more about Yeshwanth Reddy

View More author details
Right arrow
Preface

Artificial Intelligence (AI) is here, and has become a powerful force and is fuelling some of the modern applications that are used on a daily basis. Much like the discovery/invention of fire, wheel, oil, electricity, and electronics - Artificial Intelligence is reshaping our world in ways that we could only fantasize about. AI has been historically a niche computer science subject, offered by a handful of labs. But because of the explosion of excellent theory, increase in computing power, and availability of data, the field started growing exponentially since the 2000s and has shown no sign of slowing down anytime soon.
AI has proven again and again that given the right algorithm and enough amount of data, it can learn the task by itself with limited human intervention and produce results that rival human judgement and sometimes even surpass them. Whether you are a rookie learning the ropes or a veteran driving large organizations, there is every reason to understand how AI works. Neural networks are some of the most flexible classes of Artificial Intelligence algorithms that have been adapted to a vast range of applications including structured data, text, and vision domains.

This book starts with the basics of neural networks and covers over 50 applications of computer vision. First, you will build a neural network (NN) from scratch using both NumPy, PyTorch, and then learn the best practices of tweaking a NN's hyper-parameters. As we progress, you will learn about CNNs, transfer-learning with a focus on classifying images. You will also learn about the practical aspects to take care of while building a NN model.

Next, you will learn about multi-object detection, segmentation, and implement them using R-CNN family, SSD, YOLO, U-Net, Mask-RCNN architectures. You will then learn to use the Detectron2 framework to simplify the process of building a NN for object detection and human-pose-estimation. Finally, you will implement 3-D object detection.

Subsequently, you will learn about auto-encoders and GANs with a strong focus on image manipulation and generation. Here, you will implement VAE, DCGAN, CGAN, Pix2Pix, CycleGan, StyleGAN2, SRGAN, Style-Transfer to manipulate images on a variety of tasks.

You will then learn to combine NLP and CV techniques while performing OCR, Image Captioning, object detection with transformers. Next, you will learn to combine RL with CV techniques to implement a self-driving car agent. Finally, you'll wrap up with moving a NN model to production and learn conventional CV techniques using the OpenCV library.

Who this book is for

This book is for newcomers to PyTorch and intermediate-level machine learning practitioners who are looking to become well versed in CV techniques using deep learning and PyTorch. Those who are just getting started with NNs will also find this book useful. Basic knowledge of the Python programming language and machine learning is all you need to get started with this book.

What this book covers

Chapter 1, Artificial Neural Network Fundamentals, gives you the complete details of how a neural network works. You will start by learning the key terminology associated with neural networks. Next, you will understand the working details of the building blocks and build a neural network from scratch on a toy dataset. By the end of this chapter, you will be confident about how a neural network works.

Chapter 2, PyTorch Fundamentals, introduces you to working with PyTorch. You will learn about the ways of creating and manipulating tensor objects before learning about the different ways of building a neural network model using PyTorch. You will still work with a toy dataset so that you understand the specifics of working with PyTorch.

Chapter 3, Building a Deep Neural Network with PyTorch, combines all that has been covered in the previous chapters to understand the impact of various neural network hyperparameters on model accuracy. By the end of this chapter, you will be confident about working with neural networks on a realistic dataset.

Chapter 4, Introducing Convolutional Neural Networks, details the challenges of using a vanilla neural network and you will be exposed to the reason why convolutional neural networks overcome the various limitations of traditional neural networks. You will dive deep into the working details of CNN and understand the various components in it. Next, you will learn the best practices of working with images. In this chapter, you will start working with real-world images and learn the intricacies of how CNNs help in image classification.

Chapter 5, Transfer Learning for Image Classification, exposes you to solving image classification problems in real-world. You will learn about multiple transfer learning architectures and also understand how it helps in significantly improving the image classification accuracy. Next, you will leverage transfer learning to implement the use cases of facial keypoint detection and age, gender estimation.

Chapter 6, Practical Aspects of Image Classification, provides insight into the practical aspects to take care of while building and deploying image classification models. You will practically see the advantages of leveraging data augmentation and batch normalization on real-world data. Further, you will learn about how class activation maps help in explaining the reason why CNN model predicted a certain outcome. By the end of this chapter, you can confidently tackle a majority of image classification problems and leverage the models discussed in the previous 3 chapters on your custom dataset.

Chapter 7, Basics of Object Detection, lays the foundation for object detection where you will learn about the various techniques that are used to build an object detection model. Next, you will learn about region proposal-based object-detection techniques through a use case where you will implement a model to locate trucks and buses in an image.

Chapter 8, Advanced Object Detection, exposes you to the limitations of the region-proposal based architectures. You will then learn about the working details of more advanced architectures that address the issues of region proposal-based architectures. You will implement all the architectures on the same dataset (trucks vs buses detection) so that you can contrast how each architecture works.

Chapter 9, Image Segmentation, builds upon the learnings in previous chapters and will help you build models that pin-point the location of the objects of various classes as well as instances of objects in an image. You will implement the use cases on images of a road and also on images of common household. By the end of this chapter, you will confidently tackle any image classification, object detection/ segmentation problem and solve it by building a model using PyTorch.

Chapter 10, Applications of Object Detection and Segmentation, sums up the learnings of all the previous chapters where you will implement object detection, segmentation in a few lines of code, implement models to perform human crowd counting and image colorization. Finally, you will also learn about how 3D object detection on a real-world dataset.

Chapter 11, Autoencoders and Image Manipulation, , lays the foundation for modifying an image. You will start by learning about various autoencoders that help in compressing an image and also generating novel images. Next, you will learn about adversarial attack that fools a model before implementing neural style transfer. Finally, you will implement an autoencoder to generate deep fake images.

Chapter 12, Image Generation Using GANs, starts by giving you a deep dive into how GANs work. Next, you will implement fake facial image generation as well as generating images of interest using GANs.

Chapter 13, Advanced GANs to Manipulate Images, takes image manipulation to the next level. You will implement GANs to convert objects from one class to another, generate images from sketches, and manipulate custom images so that we can generate an image in a specific style. By the end of this chapter, you can confidently perform image manipulation using a combination of autoencoders and GANs.

Chapter 14, Training with Minimal Data Points, lays the foundation where you will learn about leveraging other techniques in combination with computer vision techniques. You will also learn about classifying images from minimal and also zero training data points.

Chapter 15, Combining Computer Vision and NLP Techniques, gives you the working details of various NLP techniques like word embedding, LSTM, transformer, using which you will implement applications like image captioning, OCR, and object detection with transformers.

Chapter 16, Combining Computer Vision and Reinforcement Learning, starts by exposing you to the terminology of RL and also the way to assign value to a state. You will appreciate how RL and neural networks can be combined as you learn about Deep Q-Learning. With this learning, you will implement an agent to play the game of Pong and also an agent to implement a self-driving car.

Chapter 17, Moving a Model to Production, describes the best practices of moving a model to production. You will first learn about deploying a model on a local server before moving it to the AWS public cloud.

Chapter 18, Using OpenCV Utilities for Image Analysis, details the various OpenCV utilities to create 5 interesting applications. Through this chapter, you will learn about utilities that aid deep learning as well as utilities that can substitute deep learning in scenarios where there are considerable constraints on memory or speed of inference.

To get the most out of this book

Software/hardware covered in the book

OS requirements

Minimum 128 GB storage
Minimum 8 GB RAM
Intel i5 processor or better
NVIDIA 8+ GB graphics card – GTX1070 or better
Minimum 50 Mbps internet speed

Windows, Linux, and macOS

Python 3.6 and above

Windows, Linux, and macOS

PyTorch 1.7

Windows, Linux, and macOS

Google Colab (can run in any browser)

Windows, Linux, and macOS

Do note that almost all the code in the book can be run using Google Colab by clicking the Open Colab button in each of the notebooks for the chapters on GitHub.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Modern-Computer-Vision-with-PyTorch. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781839213472_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We are creating an object of the FMNISTDataset class named val, in addition to the train object that we saw earlier."

A block of code is set as follows:

# Crop image
img = img[50:250,40:240]
# Convert image to grayscale
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Show image
plt.imshow(img_gray, cmap='gray')

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

def accuracy(x, y, model):
model.eval() # <- let's wait till we get to dropout section
# get the prediction matrix for a tensor of `x` images
prediction = model(x)
# compute if the location of maximum in each row coincides
# with ground truth
max_values, argmaxes = prediction.max(-1)
is_correct = argmaxes == y
return is_correct.cpu().numpy().tolist()

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "We will apply gradient descent (after a feedforward pass) using one batch at a
time until we exhaust all data points within one epoch of training.
"

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Modern Computer Vision with PyTorch
Published in: Nov 2020Publisher: PacktISBN-13: 9781839213472
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
V Kishore Ayyadevara

V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience in leading technology companies. In his current role, he is responsible for developing a variety of cutting edge analytical solutions that have an impact at scale while building strong technical teams. Prior to this, Kishore authored three books &mdash; Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with keen interest in identifying problems that can be solved using data, simplifying the complexity and in transferring techniques across domains to achieve quantifiable results.
Read more about V Kishore Ayyadevara

author image
Yeshwanth Reddy

Yeshwanth is a highly accomplished data scientist manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization, resulting in substantial cost savings. Yeshwanth's expertise extends to developing modules in OCR, word detection, and synthetic document generation. His groundbreaking work has been recognized through multiple patents. He also created a few Python libraries. With a passion for disrupting unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in the field of data science.
Read more about Yeshwanth Reddy