Modern Computer Vision with PyTorch
Published in Nov 2020 by Packt (ISBN-13: 9781839213472)
Chapter 1 - Artificial Neural Network Fundamentals

  1. What are the various layers in a neural network?
    Input, Hidden, and Output Layers
  2. What is the output of feed-forward propagation?
    Predictions that help in calculating the loss value.
  3. How is the loss function of a continuous dependent variable different from that of a binary dependent variable and a categorical dependent variable?
    MSE is the loss function generally used for a continuous dependent variable, binary cross-entropy for a binary dependent variable, and categorical cross-entropy for a categorical dependent variable.
  4. What is stochastic gradient descent?
    It is the process of reducing loss by adjusting weights in the direction of the decreasing gradient, where each update is computed on a randomly sampled batch of data rather than on the entire dataset.
  5. What does a backpropagation exercise do?
    It computes gradients of all weights with respect to loss using the chain rule
  6. How does the weight update of all the weights across layers happen during back-propagation?
    Every weight is updated using the formula W = W – alpha*(dL/dW), that is, by subtracting the learning rate times the gradient of the loss with respect to that weight (see the sketch after this list).
  7. What...
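
A minimal sketch of the update rule from question 6, using a single hypothetical weight (all values are illustrative):

```python
import torch

# One weight, one sample, and a squared-error loss (values illustrative).
w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(9.0)
alpha = 0.01  # learning rate

loss = (w * x - y) ** 2   # feed-forward propagation produces the loss
loss.backward()           # backpropagation computes dL/dW via the chain rule

with torch.no_grad():
    w -= alpha * w.grad   # W = W - alpha * (dL/dW)
    w.grad.zero_()        # flush the gradient before the next iteration
```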

Chapter 2 - PyTorch Fundamentals

  1. Why should we convert integer inputs into float values during training?
    nn.Linear (and almost all torch layers) accepts only floating-point inputs.
  2. What are the various methods to reshape a tensor object?
    reshape, view
  3. Why is computation faster with tensor objects over NumPy arrays?
    Capability to run on GPUs in parallel is only available on tensor objects
  4. What constitutes the __init__ magic function in a neural network class?
    Calling super().__init__() and specifying the neural network layers.
  5. Why do we perform zero gradients before performing back-propagation?
    To ensure gradients from previous calculations are flushed out
  6. What magic functions constitute the dataset class?
    __len__ and __getitem__ (see the sketch after this list).
  7. How do we make predictions on new data points?
    By calling the model on the tensor as if it is a function – model(x)
  8. How do we fetch the intermediate layer values of a neural network?
    By creating a custom method
  9. How does the Sequential method help in simplifying defining...
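
A minimal sketch tying several of these answers together: a Dataset built from the two magic functions, float inputs, and a prediction made by calling the model like a function (class and variable names are illustrative):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, x, y):
        # floats, since nn.Linear (like most torch layers) expects them
        self.x = torch.tensor(x).float()
        self.y = torch.tensor(y).float()
    def __len__(self):               # number of samples
        return len(self.x)
    def __getitem__(self, ix):       # one (input, target) pair
        return self.x[ix], self.y[ix]

ds = MyDataset([[1, 2], [3, 4]], [[3], [7]])
dl = DataLoader(ds, batch_size=2)

model = torch.nn.Linear(2, 1)
x, y = next(iter(dl))
print(model(x))  # predictions: call the model as if it were a function
```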

Chapter 3 - Building a Deep Neural Network with PyTorch

  1. What is the issue if the input values are not scaled in the input dataset?
    Unscaled input values vary widely, so it takes longer for the weights to be adjusted to their optimal values (see the sketch after this list).
  2. What could be the issue if the background has a white pixel color while the content has a black pixel color when training a neural network?
    The neural network has to learn to ignore the majority of the not-so-useful content, which is white.
  3. What is the impact of batch size on the model's training time and accuracy over a given number of epochs?
    The larger the batch size, the longer the model takes to converge and the more epochs are required to attain high accuracy.
  4. What is the impact of the input value range on the weight distribution at the end of training?
    If the input values are not scaled to a fixed range, certain weights can take on extreme values and aid over-fitting.
  5. How does batch normalization help in improving accuracy?
    Just like how it is important that we scale...
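
A minimal sketch of input scaling and batch normalization from the answers above (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randint(0, 256, (32, 784)).float()
x = x / 255.0  # scale pixel values to [0, 1] so weights converge faster

# Batch normalization scales hidden activations, much like input scaling.
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.BatchNorm1d(100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
print(model(x).shape)  # torch.Size([32, 10])
```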

Chapter 4 - Introducing Convolutional Neural Networks

  1. Why is the prediction on a translated image low when using traditional neural networks?
    All images were centered in the original dataset, so the ANN learned the task for only centered images.
  2. How is convolution done?
    A filter (a small matrix of weights) slides over the input, and at each location the overlapping values are multiplied element-wise and summed (see the sketch after this list).
  3. How are optimal weight values in a filter identified?
    Through backpropagation.
  4. How does the combination of convolution and pooling help in addressing the issue of image translation?
    While convolution gives important image features, pooling takes the most prominent features in a patch of the image. This makes pooling a robust operation over the vicinity, i.e., even if something is translated by a few pixels, pooling will still return the expected output.
  5. What do the filters in layers closer to the input layer learn?
    Low-level features like edges.
  6. What functionality does pooling do that helps in building a model?
    It reduces input size by reducing feature map size and...
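
A minimal sketch of convolution followed by pooling, as described in questions 2 and 4 (channel counts and sizes are illustrative):

```python
import torch
import torch.nn as nn

img = torch.randn(1, 1, 28, 28)          # a dummy single-channel image

conv = nn.Conv2d(1, 16, kernel_size=3)   # filter weights learned via backprop
pool = nn.MaxPool2d(2)                   # keeps the most prominent value per 2x2 patch

features = pool(conv(img))
print(features.shape)                    # torch.Size([1, 16, 13, 13])
```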

Chapter 5 - Transfer Learning for Image Classification

  1. What are VGG and ResNet pre-trained architectures trained on?
    The images in the ImageNet dataset.
  2. Why does VGG11 have an inferior accuracy to VGG16?
    VGG11 has fewer layers when compared to VGG16.
  3. What does the number 11 in VGG11 represent?
    11 layers.
  4. What is the residual in a residual network?
    The layer returns its input in addition to the layer's transformation, that is, output = x + F(x).
  5. What is the advantage of a residual network?
    It helps in avoiding vanishing gradients and also helps in increasing model depth.
  6. What are the various popular pre-trained models?
    VGG, ResNet, Inception, AlexNet.
  7. During transfer learning, why should images be normalized with the same mean and standard deviation as those used during the training of the pre-trained model?
    The model was trained such that it expects input images to be normalized with that specific mean and standard deviation (see the sketch after this list).
  8. Why do we freeze certain parameters in a model?
    We freeze so that the parameters will not...
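
A minimal sketch of questions 7 and 8: freezing a pre-trained model's parameters and normalizing inputs with the ImageNet statistics the model was trained with (the 2-class head is a hypothetical task):

```python
import torch
from torchvision import models, transforms

model = models.vgg16(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze: pre-trained weights are not updated

# replace only the classifier head for a hypothetical 2-class task
model.classifier[6] = torch.nn.Linear(4096, 2)

# normalize with the same mean/std used when the model was pre-trained
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
```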

Chapter 6 - Practical Aspects of Image Classification

  1. How are class activation maps obtained?
    Refer to the eight steps provided in the Generating CAMs section.
  2. How do batch normalization and data augmentation help when training a model?
    They help reduce over-fitting (see the augmentation sketch after this list).
  3. What are the common reasons why a CNN model overfits?
    A lack of batch normalization, data augmentation, and dropout.
  4. What are the various scenarios where the CNN model works with training and validation data at the data scientists' end but not in the real world?
    Real-world data can have a different distribution from the data used to train and validate the model. Additionally, the model might have over-fitted on the training data.
  5. What are the various scenarios where we leverage OpenCV packages?
    When working in constrained environments, and when inference speed is more important.
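
As a rough sketch of the data augmentation mentioned in question 2 (the specific transforms and parameter values are illustrative):

```python
from torchvision import transforms

# random rotations and color jitter create new variants of each image,
# which helps reduce over-fitting
train_tfms = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])
```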

Chapter 7 - Basics of Object Detection

  1. How does the region proposal technique generate proposals?
    It identifies regions that are similar in color, texture, size, and shape.
  2. How is IoU calculated if there are multiple objects in an image?
    IoU is calculated separately for each object: the area of intersection between the predicted box and the ground-truth box, divided by the area of their union.
  3. Why does R-CNN take a long time to generate predictions?
    Because we create as many forward propagations as there are proposals
  4. Why is Fast R-CNN faster when compared to R-CNN?
    Extracting the feature map from the VGG backbone is done once and shared across all proposals. This eliminates almost 90% of the computation compared to R-CNN.
  5. How does RoI Pooling work?
    All the selective search crops are passed through an adaptive pooling kernel so that the final output is of the same size (see the sketch after this list).
  6. What is the impact of not having multiple layers, post obtaining feature map, when predicting the bounding box corrections?
    You might not notice that the model did not learn to predict the bounding...
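
A minimal sketch of the adaptive pooling idea behind RoI Pooling from question 5: crops of different sizes all come out at the same fixed size (shapes are illustrative):

```python
import torch
import torch.nn as nn

roi_pool = nn.AdaptiveMaxPool2d((7, 7))

crop_a = torch.randn(1, 512, 14, 21)  # two differently sized feature-map crops
crop_b = torch.randn(1, 512, 30, 9)
print(roi_pool(crop_a).shape)  # torch.Size([1, 512, 7, 7])
print(roi_pool(crop_b).shape)  # torch.Size([1, 512, 7, 7])
```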

Chapter 8 - Advanced Object Detection

  1. Why is Faster R-CNN faster when compared to Fast R-CNN?
    We do not need to feed a large number of unnecessary proposals generated by the selective search technique every time; instead, Faster R-CNN finds proposals automatically using the region proposal network.
  2. How are YOLO and SSD faster when compared to Faster R-CNN?
    We don't need to rely on a new proposal network. The network directly finds the proposals in a single go.
  3. What makes YOLO and SSD single shot detector algorithms?
    Networks predict all the proposals and predictions in one shot
  4. What is the difference between the objectness score and class score?
    The objectness score identifies whether an object exists at all, while the class score predicts which class an anchor box with non-zero objectness belongs to.

Chapter 9 - Image Segmentation

  1. How does up-scaling help in the U-Net architecture?
    Upscaling increases the size of the feature map so that the final output is the same size as the input (see the sketch after this list).
  2. Why do we need a fully convolutional network in U-Net?
    Because the outputs are also images, and it is difficult to predict an image-shaped tensor using a Linear layer.
  3. How does RoI Align improve over RoI pooling in Mask R-CNN?
    RoI Align takes offsets of predicted proposals to fine-align the feature map.
  4. What is the major difference between U-Net and Mask R-CNN for segmentation?
    U-Net is fully convolutional and a single end-to-end network, whereas Mask R-CNN uses sub-networks such as the backbone and RPN for different tasks. Mask R-CNN can identify and separate several objects of the same type, but U-Net can only identify them, not separate them into individual instances.
  5. What is instance segmentation?
    If there are different objects of the same class in the same image...
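
A minimal sketch of the up-scaling from question 1, using a transposed convolution as in the U-Net decoder path (channel counts are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)  # a downsampled feature map

# kernel_size=2 with stride=2 doubles the spatial size: 16x16 -> 32x32
up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
print(up(x).shape)  # torch.Size([1, 32, 32, 32])
```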

Chapter 11 - Autoencoders and Image Manipulation

  1. What is an encoder in an autoencoder?
    A smaller neural network that converts an image into a vector representation.
  2. What loss function does an autoencoder optimize for?
    Pixel-level mean squared error, directly comparing the prediction with the input.
  3. How do autoencoders help in grouping similar images?
    Similar images will return similar encodings, which are easier to cluster.
  4. When is the Convolutional autoencoder useful?
    When the inputs are images.
  5. Why do we get non-intuitive images if we randomly sample from the vector space of embeddings obtained from a vanilla/convolutional autoencoder?
    The range of values in the encodings is unconstrained, so proper outputs depend heavily on sampling from the right range of values; random sampling, in general, assumes a mean of 0 and a standard deviation of 1.
  6. What are the loss functions that the Variational autoencoder optimizes for?
    Pixel-level MSE and the KL divergence of the encoder's distribution of mean and standard deviation from the standard normal distribution (see the sketch after this list).
  7. ...
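
A minimal sketch of the two VAE loss terms from question 6 (tensor names are illustrative; mean and log_var are the encoder's outputs):

```python
import torch

def vae_loss(recon, x, mean, log_var):
    # pixel-level reconstruction error between the prediction and the input
    mse = torch.nn.functional.mse_loss(recon, x)
    # KL divergence of N(mean, exp(log_var)) from the standard normal N(0, 1)
    kld = -0.5 * torch.mean(1 + log_var - mean.pow(2) - log_var.exp())
    return mse + kld
```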

Chapter 12 - Image Generation Using GANs

  1. What happens if the learning rate of generator and discriminator models is high?
    Empirically, it is observed that the model stability is lower.
  2. In a scenario where the generator and discriminator are very well trained, what is the probability of a given image being real?
    0.5.
  3. Why do we use ConvTranspose2d in generating images?
    We cannot upscale/generate images using a linear layer.
  4. Why do we use an embedding size that is larger than the number of classes in Conditional GANs?
    Using more parameters gives the model more degrees of freedom to learn the important features of each class.
  5. How can we generate images of men that have a beard?
    By using a conditional GAN. Just like we had male and female images, we can have bearded males and other such classes while training the model.
  6. Why do we have Tanh activation at the last layer in the generator and not ReLU or Sigmoid?
    The pixel range of normalized images is [-1, 1], hence we use Tanh (see the sketch after this list).
  7. Why did we...
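
A minimal sketch of question 6: the final generator block, where ConvTranspose2d upscales and Tanh maps pixels into [-1, 1] (channel sizes are illustrative):

```python
import torch
import torch.nn as nn

final_block = nn.Sequential(
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),  # outputs in [-1, 1], matching normalized image pixels
)
x = torch.randn(1, 64, 32, 32)
print(final_block(x).shape)  # torch.Size([1, 3, 64, 64])
```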

Chapter 13 - Advanced GANs to Manipulate Images

  1. Why do we need a Pix2Pix GAN when a supervised learning algorithm like U-Net could have worked to generate images from contours?
    U-Net uses only a pixel-level loss during training. We need Pix2Pix because there is no loss for realism when a U-Net generates images; the discriminator supplies that.
  2. Why do we need to optimize for 3 different loss functions in CycleGAN?
    Refer to the seven points provided in the CycleGAN section.
  3. How do the tricks leveraged in ProgressiveGAN help in building a StyleGAN?
    ProgressiveGAN helps the network learn a few upsampling layers at a time, so that when the image size has to be increased, the networks responsible for generating the current image size remain optimal.
  4. How do we identify latent vectors corresponding to a given custom image?
    By adjusting the randomly generated noise in such a way that the MSE loss between the generated image and the image of interest is as small as possible (see the sketch after this list).
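
A minimal sketch of question 4, with a stand-in generator and target image (both hypothetical); only the noise vector z is optimized:

```python
import torch
import torch.nn as nn

# Stand-ins for a trained generator and the image of interest.
generator = nn.Sequential(nn.Linear(100, 3 * 64 * 64), nn.Tanh())
target = torch.rand(1, 3 * 64 * 64) * 2 - 1  # pixels in [-1, 1]

z = torch.randn(1, 100, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.01)  # the optimizer only updates z

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(generator(z), target)
    loss.backward()  # gradients flow back into z
    opt.step()       # generator weights are untouched
```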

Chapter 14 - Training with Minimal Data Points

  1. How are pre-trained word vectors obtained?
    From an existing database such as GloVe or word2vec.
  2. How do we map from an image feature embedding to word embedding in Zero-shot learning?
    By creating a suitable neural network that returns a vector of the same shape as word-embedding and training with mse-loss (comparing prediction with actual word-embedding)
  3. Why is the Siamese network called so?
    Because two weight-sharing (twin) networks always produce two outputs that are compared with each other for similarity; Siamese stands for twins.
  4. How does the Siamese network come up with the similarity between the two images?
    The loss function forces the network to predict a smaller distance between the outputs if the images are similar (see the sketch after this list).
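
A minimal sketch of the Siamese idea from questions 3 and 4: one weight-sharing network embeds both images, and the distance between the two embeddings measures similarity (sizes are illustrative):

```python
import torch
import torch.nn as nn

twin = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))  # shared weights

img_a = torch.randn(1, 1, 28, 28)
img_b = torch.randn(1, 1, 28, 28)

# small distance -> similar images, large distance -> dissimilar images
distance = nn.functional.pairwise_distance(twin(img_a), twin(img_b))
print(distance)
```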

Chapter 15 - Combining Computer Vision and NLP Techniques

  1. Why are CNN and RNN combined in image captioning?
    CNN is needed for capturing image features, whereas, RNN is needed for creating the language output.
  2. Why are start and end tokens provided in image captioning but not in handwriting transcription?
    CTC loss does not need such tokens; moreover, in OCR we generate tokens for all timesteps in one shot.
  3. Why is the CTC loss function leveraged in handwriting transcription?
    We cannot delineate timesteps in the image; CTC takes care of aligning key image features with timesteps (see the sketch after this list).
  4. How do transformers help in object detection?
    By treating anchor boxes as embedding inputs to the transformer decoder, DETR learns dynamic anchor boxes, thereby helping object detection.
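
A minimal sketch of the CTC loss from question 3, using PyTorch's nn.CTCLoss (all shapes and lengths are illustrative):

```python
import torch
import torch.nn as nn

T, N, C, S = 32, 4, 20, 10  # timesteps, batch, classes (incl. blank), max target length
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, S))  # class 0 is reserved for the blank token
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S, (N,), dtype=torch.long)

# CTC aligns the per-timestep predictions with the target sequence by itself,
# which is why no start/end tokens are needed.
loss = nn.CTCLoss()(log_probs, targets, input_lengths, target_lengths)
print(loss)
```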

Chapter 16 - Combining Computer Vision and Reinforcement Learning

  1. How is the value calculated for a given state?
    By computing the expected reward at that state
  2. How is the Q-table populated?
    By computing expected reward for all states
  3. Why do we have a discount factor in the state-action value calculation?
    Because the future is uncertain, we are unsure of how it might play out, so we reduce the weightage of future rewards by way of discounting (see the sketch after this list).
  4. What is the need for an exploration-exploitation strategy?
    Exploitation alone makes the model stagnant and predictable; the model should also be able to explore and find unseen actions that can be even more rewarding than what it has already learned.
  5. What is the need for Deep Q-Learning?
    We let a neural network learn the likely reward structure without the need for costly algorithms that take too much time or demand visibility of the entire environment.
  6. How is the value of a given state action combination calculated using...
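
A minimal sketch of a tabular Q-learning update with a discount factor, as in questions 2 and 3 (the 5-state, 2-action environment is hypothetical):

```python
import numpy as np

q_table = np.zeros((5, 2))   # expected reward for every (state, action) pair
alpha, gamma = 0.1, 0.9      # learning rate and discount factor

def update(state, action, reward, next_state):
    # future rewards are down-weighted by gamma because they are uncertain
    target = reward + gamma * q_table[next_state].max()
    q_table[state, action] += alpha * (target - q_table[state, action])

update(state=0, action=1, reward=1.0, next_state=2)
```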

Authors (2)

V Kishore Ayyadevara

V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience at leading technology companies. In his current role, he is responsible for developing a variety of cutting-edge analytical solutions that have an impact at scale, while building strong technical teams. Prior to this, Kishore authored three books: Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with a keen interest in identifying problems that can be solved using data, simplifying complexity, and transferring techniques across domains to achieve quantifiable results.

Yeshwanth Reddy

Yeshwanth is a highly accomplished data science manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization that resulted in substantial cost savings. His expertise extends to developing modules for OCR, word detection, and synthetic document generation, and his work has been recognized through multiple patents. He has also created several Python libraries. With a passion for unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in data science.