You're reading from Modern Computer Vision with PyTorch

Product typeBook

Published inNov 2020

Reading LevelBeginner

PublisherPackt

ISBN-139781839213472

Edition1st Edition

Languages

Python

Tools

PyTorch

Concepts

Computer Vision

Authors (2):

V Kishore Ayyadevara

Yeshwanth Reddy

View More author details

Preface

Artificial Intelligence (AI) is here, and has become a powerful force and is fuelling some of the modern applications that are used on a daily basis. Much like the discovery/invention of fire, wheel, oil, electricity, and electronics - Artificial Intelligence is reshaping our world in ways that we could only fantasize about. AI has been historically a niche computer science subject, offered by a handful of labs. But because of the explosion of excellent theory, increase in computing power, and availability of data, the field started growing exponentially since the 2000s and has shown no sign of slowing down anytime soon.
AI has proven again and again that given the right algorithm and enough amount of data, it can learn the task by itself with limited human intervention and produce results that rival human judgement and sometimes even surpass them. Whether you are a rookie learning the ropes or a veteran driving large organizations, there is every reason to understand how AI works. Neural networks are some of the most flexible classes of Artificial Intelligence algorithms that have been adapted to a vast range of applications including structured data, text, and vision domains.

This book starts with the basics of neural networks and covers over 50 applications of computer vision. First, you will build a neural network (NN) from scratch using both NumPy, PyTorch, and then learn the best practices of tweaking a NN's hyper-parameters. As we progress, you will learn about CNNs, transfer-learning with a focus on classifying images. You will also learn about the practical aspects to take care of while building a NN model.

Next, you will learn about multi-object detection, segmentation, and implement them using R-CNN family, SSD, YOLO, U-Net, Mask-RCNN architectures. You will then learn to use the Detectron2 framework to simplify the process of building a NN for object detection and human-pose-estimation. Finally, you will implement 3-D object detection.

Subsequently, you will learn about auto-encoders and GANs with a strong focus on image manipulation and generation. Here, you will implement VAE, DCGAN, CGAN, Pix2Pix, CycleGan, StyleGAN2, SRGAN, Style-Transfer to manipulate images on a variety of tasks.

You will then learn to combine NLP and CV techniques while performing OCR, Image Captioning, object detection with transformers. Next, you will learn to combine RL with CV techniques to implement a self-driving car agent. Finally, you'll wrap up with moving a NN model to production and learn conventional CV techniques using the OpenCV library.

Who this book is for

This book is for newcomers to PyTorch and intermediate-level machine learning practitioners who are looking to become well versed in CV techniques using deep learning and PyTorch. Those who are just getting started with NNs will also find this book useful. Basic knowledge of the Python programming language and machine learning is all you need to get started with this book.

What this book covers

Chapter 1, Artificial Neural Network Fundamentals, gives you the complete details of how a neural network works. You will start by learning the key terminology associated with neural networks. Next, you will understand the working details of the building blocks and build a neural network from scratch on a toy dataset. By the end of this chapter, you will be confident about how a neural network works.

Chapter 2, PyTorch Fundamentals, introduces you to working with PyTorch. You will learn about the ways of creating and manipulating tensor objects before learning about the different ways of building a neural network model using PyTorch. You will still work with a toy dataset so that you understand the specifics of working with PyTorch.

Chapter 3, Building a Deep Neural Network with PyTorch, combines all that has been covered in the previous chapters to understand the impact of various neural network hyperparameters on model accuracy. By the end of this chapter, you will be confident about working with neural networks on a realistic dataset.

Chapter 4, Introducing Convolutional Neural Networks, details the challenges of using a vanilla neural network and you will be exposed to the reason why convolutional neural networks overcome the various limitations of traditional neural networks. You will dive deep into the working details of CNN and understand the various components in it. Next, you will learn the best practices of working with images. In this chapter, you will start working with real-world images and learn the intricacies of how CNNs help in image classification.

Chapter 5, Transfer Learning for Image Classification, exposes you to solving image classification problems in real-world. You will learn about multiple transfer learning architectures and also understand how it helps in significantly improving the image classification accuracy. Next, you will leverage transfer learning to implement the use cases of facial keypoint detection and age, gender estimation.

Chapter 6, Practical Aspects of Image Classification, provides insight into the practical aspects to take care of while building and deploying image classification models. You will practically see the advantages of leveraging data augmentation and batch normalization on real-world data. Further, you will learn about how class activation maps help in explaining the reason why CNN model predicted a certain outcome. By the end of this chapter, you can confidently tackle a majority of image classification problems and leverage the models discussed in the previous 3 chapters on your custom dataset.

Chapter 7, Basics of Object Detection, lays the foundation for object detection where you will learn about the various techniques that are used to build an object detection model. Next, you will learn about region proposal-based object-detection techniques through a use case where you will implement a model to locate trucks and buses in an image.

Chapter 8, Advanced Object Detection, exposes you to the limitations of the region-proposal based architectures. You will then learn about the working details of more advanced architectures that address the issues of region proposal-based architectures. You will implement all the architectures on the same dataset (trucks vs buses detection) so that you can contrast how each architecture works.

Chapter 9, Image Segmentation, builds upon the learnings in previous chapters and will help you build models that pin-point the location of the objects of various classes as well as instances of objects in an image. You will implement the use cases on images of a road and also on images of common household. By the end of this chapter, you will confidently tackle any image classification, object detection/ segmentation problem and solve it by building a model using PyTorch.

Chapter 10, Applications of Object Detection and Segmentation, sums up the learnings of all the previous chapters where you will implement object detection, segmentation in a few lines of code, implement models to perform human crowd counting and image colorization. Finally, you will also learn about how 3D object detection on a real-world dataset.

Chapter 11, Autoencoders and Image Manipulation, , lays the foundation for modifying an image. You will start by learning about various autoencoders that help in compressing an image and also generating novel images. Next, you will learn about adversarial attack that fools a model before implementing neural style transfer. Finally, you will implement an autoencoder to generate deep fake images.

Chapter 12, Image Generation Using GANs, starts by giving you a deep dive into how GANs work. Next, you will implement fake facial image generation as well as generating images of interest using GANs.

Chapter 13, Advanced GANs to Manipulate Images, takes image manipulation to the next level. You will implement GANs to convert objects from one class to another, generate images from sketches, and manipulate custom images so that we can generate an image in a specific style. By the end of this chapter, you can confidently perform image manipulation using a combination of autoencoders and GANs.

Chapter 14, Training with Minimal Data Points, lays the foundation where you will learn about leveraging other techniques in combination with computer vision techniques. You will also learn about classifying images from minimal and also zero training data points.

Chapter 15, Combining Computer Vision and NLP Techniques, gives you the working details of various NLP techniques like word embedding, LSTM, transformer, using which you will implement applications like image captioning, OCR, and object detection with transformers.

Chapter 16, Combining Computer Vision and Reinforcement Learning, starts by exposing you to the terminology of RL and also the way to assign value to a state. You will appreciate how RL and neural networks can be combined as you learn about Deep Q-Learning. With this learning, you will implement an agent to play the game of Pong and also an agent to implement a self-driving car.

Chapter 17, Moving a Model to Production, describes the best practices of moving a model to production. You will first learn about deploying a model on a local server before moving it to the AWS public cloud.

Chapter 18, Using OpenCV Utilities for Image Analysis, details the various OpenCV utilities to create 5 interesting applications. Through this chapter, you will learn about utilities that aid deep learning as well as utilities that can substitute deep learning in scenarios where there are considerable constraints on memory or speed of inference.

To get the most out of this book

Software/hardware covered in the book	OS requirements
Minimum 128 GB storage Minimum 8 GB RAM Intel i5 processor or better NVIDIA 8+ GB graphics card – GTX1070 or better Minimum 50 Mbps internet speed	Windows, Linux, and macOS
Python 3.6 and above	Windows, Linux, and macOS
PyTorch 1.7	Windows, Linux, and macOS
Google Colab (can run in any browser)	Windows, Linux, and macOS

Do note that almost all the code in the book can be run using Google Colab by clicking the Open Colab button in each of the notebooks for the chapters on GitHub.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Modern-Computer-Vision-with-PyTorch. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781839213472_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We are creating an object of the FMNISTDataset class named val, in addition to the train object that we saw earlier."

A block of code is set as follows:

# Crop image
img = img[50:250,40:240]
# Convert image to grayscale
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Show image
plt.imshow(img_gray, cmap='gray')

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

def accuracy(x, y, model):
    model.eval() # <- let's wait till we get to dropout section
    # get the prediction matrix for a tensor of `x` images
    prediction = model(x)
    # compute if the location of maximum in each row coincides
    # with ground truth
    max_values, argmaxes = prediction.max(-1)
    is_correct = argmaxes == y
    return is_correct.cpu().numpy().tolist()

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "We will apply gradient descent (after a feedforward pass) using one batch at a
time until we exhaust all data points within one epoch of training."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

The rest of the chapter is locked

You have been reading a chapter from

Modern Computer Vision with PyTorch

Published in: Nov 2020Publisher: PacktISBN-13: 9781839213472

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (2)

V Kishore Ayyadevara

V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience in leading technology companies. In his current role, he is responsible for developing a variety of cutting edge analytical solutions that have an impact at scale while building strong technical teams. Prior to this, Kishore authored three books — Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with keen interest in identifying problems that can be solved using data, simplifying the complexity and in transferring techniques across domains to achieve quantifiable results.
Read more about V Kishore Ayyadevara

Yeshwanth Reddy

Yeshwanth is a highly accomplished data scientist manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization, resulting in substantial cost savings. Yeshwanth's expertise extends to developing modules in OCR, word detection, and synthetic document generation. His groundbreaking work has been recognized through multiple patents. He also created a few Python libraries. With a passion for disrupting unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in the field of data science.
Read more about Yeshwanth Reddy

Other recommended products

Related to this chapter

Neural Networks with Keras Cookbook

This book presents solutions to the majority of the challenges you will face while training neural networks to solve deep learning problems. It covers the trending deep learning architectures used in industry and tackles a variety of use cases in computer vision, text processing, audio analysis, recommender systems, and game bots

BookFeb 2019568 pages

PyTorch Computer Vision Cookbook

This book enables you to solve the trickiest of problems in computer vision using deep learning algorithms and techniques. You will learn to use several different algorithms for different CV problems such as classification, detection, segmentation, and more using Pytorch. Packed with best practices in training and deployment of CV applications.

BookMar 2020364 pages

PyTorch Artificial Intelligence Fundamentals

In this book, you will start from the basics of tensor manipulation to all the way releasing your deep learning model to production. Using hands-on recipes you will learn to build deep learning applications and visualize the model performance. It teaches you about CNNs, RNNs, GANs and deep reinforcement learning with Pytorch.

BookFeb 2020200 pages

Generative Adversarial Networks Projects

In this book, we will use different complexities of datasets in order to build end-to-end projects. With every chapter, the level of complexity and operations will become advanced. It consists of 8 full-fledged projects covering approaches such as 3D-GAN, Age-cGAN, DCGAN, SRGAN, StackGAN, and CycleGAN with real-world use cases.

BookJan 2019316 pages

Generative Adversarial Networks Cookbook

Generative Adversarial Networks have opened up many new possibilities in the machine learning domain. This book is all you need to implement different types of GANs using TensorFlow and Keras, in order to provide optimized and efficient deep learning solutions.

BookDec 2018268 pages

Python Image Processing Cookbook

Advancements in wireless devices and mobile technology have enabled the acquisition of a tremendous amount of graphics, pictures, and videos. Through cutting edge recipes, this book provides coverage on tools, algorithms, and analysis for image processing. This book provides solutions addressing the challenges and complex tasks of image processing.

BookApr 2020438 pages

Mastering Computer Vision with TensorFlow 2.x

You will learn the principles of computer vision and deep learning, and understand various models and architectures with their pros and cons. You will learn how to use TensorFlow 2.x to build your own neural network model and apply it to various computer vision tasks such as image acquiring, processing, and analyzing.

BookMay 2020430 pages

Hands-On Generative Adversarial Networks with PyTorch 1.x

This book will help you understand how GANs architecture works using PyTorch. You will get familiar with the most flexible deep learning toolkit and use it to transform ideas into actual working codes. You will apply GAN models to areas like computer vision, multimedia and natural language processing using a sample-generation perspective.

BookDec 2019312 pages

Applied Deep Learning with PyTorch

Starting with the basics of deep learning and their various applications, Applied Deep Learning with PyTorch shows you how to solve trending tasks, such as image classification and natural language processing by understanding the different architectures of the neural networks.

BookApr 2019254 pages

Hands-On Deep Learning Algorithms with Python

This book introduces basic-to-advanced deep learning algorithms used in a production environment by AI researchers and principal data scientists; it explains algorithms intuitively, including the underlying math, and shows how to implement them using popular Python-based deep learning libraries such as TensorFlow.

BookJul 2019512 pages

Hands-On Image Generation with TensorFlow

This book is a step-by-step guide to show you how to implement generative models in TensorFlow 2.x from scratch. You’ll get to grips with the image generative technology by covering autoencoders, style transfer, and GANs as well as fundamental and state-of-the-art models.

BookDec 2020306 pages

Deep Learning with PyTorch

This book provides the intuition behind the state of the art Deep Learning architectures such as ResNet, DenseNet, Inception, and encoder-decoder without diving deep into the math of it. It shows how you can implement and use various architectures to solve problems in the area of image classification, language translation and NLP using PyTorch.

BookFeb 2018262 pages

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages