Modern Computer Vision with PyTorch
Published in Nov 2020 by Packt (ISBN-13: 9781839213472)
Chapter 1 - Artificial Neural Network Fundamentals

  1. What are the various layers in a neural network?
    Input, Hidden, and Output Layers
  2. What is the output of feed-forward propagation?
    Predictions that help in calculating the loss value.
  3. How is the loss function of a continuous dependent variable different from that of a binary dependent variable and a categorical dependent variable?
    MSE is the loss function generally used for a continuous dependent variable, binary cross-entropy for a binary dependent variable, and categorical cross-entropy for a categorical dependent variable.
  4. What is stochastic gradient descent?
    It is the process of reducing loss by adjusting weights in the direction of the decreasing gradient, where each update is computed on a randomly sampled batch of data rather than on the entire dataset.
  5. What does a backpropagation exercise do?
    It computes gradients of all weights with respect to loss using the chain rule
  6. How does the weight update of all the weights across layers happen during back-propagation?
    Every weight is updated using the formula W = W – alpha*(dL/dW), that is, by subtracting the learning rate times the gradient of the loss with respect to that weight (see the sketch after this list).
  7. What...
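
A minimal sketch of the update rule from question 6, using a single hypothetical weight (all values are illustrative):

```python
import torch

# One weight, one sample, and a squared-error loss (values illustrative).
w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(9.0)
alpha = 0.01  # learning rate

loss = (w * x - y) ** 2   # feed-forward propagation produces the loss
loss.backward()           # backpropagation computes dL/dW via the chain rule

with torch.no_grad():
    w -= alpha * w.grad   # W = W - alpha * (dL/dW)
    w.grad.zero_()        # flush the gradient before the next iteration
```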

Chapter 2 - PyTorch Fundamentals

  1. Why should we convert integer inputs into float values during training?
    nn.Linear (and almost all torch layers) accepts only floating-point inputs.
  2. What are the various methods to reshape a tensor object?
    reshape, view
  3. Why is computation faster with tensor objects over NumPy arrays?
    Capability to run on GPUs in parallel is only available on tensor objects
  4. What constitutes the __init__ magic function in a neural network class?
    Calling super().__init__() and specifying the neural network layers.
  5. Why do we perform zero gradients before performing back-propagation?
    To ensure gradients from previous calculations are flushed out
  6. What magic functions constitute the dataset class?
    __len__ and __getitem__ (see the sketch after this list).
  7. How do we make predictions on new data points?
    By calling the model on the tensor as if it is a function – model(x)
  8. How do we fetch the intermediate layer values of a neural network?
    By creating a custom method
  9. How does the Sequential method help in simplifying defining...
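
A minimal sketch tying several of these answers together: a Dataset built from the two magic functions, float inputs, and a prediction made by calling the model like a function (class and variable names are illustrative):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, x, y):
        # floats, since nn.Linear (like most torch layers) expects them
        self.x = torch.tensor(x).float()
        self.y = torch.tensor(y).float()
    def __len__(self):               # number of samples
        return len(self.x)
    def __getitem__(self, ix):       # one (input, target) pair
        return self.x[ix], self.y[ix]

ds = MyDataset([[1, 2], [3, 4]], [[3], [7]])
dl = DataLoader(ds, batch_size=2)

model = torch.nn.Linear(2, 1)
x, y = next(iter(dl))
print(model(x))  # predictions: call the model as if it were a function
```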

Chapter 3 - Building a Deep Neural Network with PyTorch

  1. What is the issue if the input values are not scaled in the input dataset?
    Unscaled input values vary widely, so it takes longer for the weights to be adjusted to their optimal values (see the sketch after this list).
  2. What could be the issue if the background has a white pixel color while the content has a black pixel color when training a neural network?
    The neural network has to learn to ignore the majority of the not-so-useful content, which is white.
  3. What is the impact of batch size on the model's training time and accuracy over a given number of epochs?
    The larger the batch size, the longer the model takes to converge and the more epochs are required to attain high accuracy.
  4. What is the impact of the input value range on the weight distribution at the end of training?
    If the input values are not scaled to a fixed range, certain weights can take on extreme values and aid over-fitting.
  5. How does batch normalization help in improving accuracy?
    Just like how it is important that we scale...
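
A minimal sketch of input scaling and batch normalization from the answers above (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randint(0, 256, (32, 784)).float()
x = x / 255.0  # scale pixel values to [0, 1] so weights converge faster

# Batch normalization scales hidden activations, much like input scaling.
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.BatchNorm1d(100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
print(model(x).shape)  # torch.Size([32, 10])
```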

Chapter 4 - Introducing Convolutional Neural Networks

  1. Why is the prediction on a translated image low when using traditional neural networks?
    All images were centered in the original dataset, so the ANN learned the task for only centered images.
  2. How is convolution done?
    A filter (a small matrix of weights) slides over the input, and at each location the overlapping values are multiplied element-wise and summed (see the sketch after this list).
  3. How are optimal weight values in a filter identified?
    Through backpropagation.
  4. How does the combination of convolution and pooling help in addressing the issue of image translation?
    While convolution gives important image features, pooling takes the most prominent features in a patch of the image. This makes pooling a robust operation over the vicinity, i.e., even if something is translated by a few pixels, pooling will still return the expected output.
  5. What do the filters in layers closer to the input layer learn?
    Low-level features like edges.
  6. What functionality does pooling do that helps in building a model?
    It reduces input size by reducing feature map size and...
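
A minimal sketch of convolution followed by pooling, as described in questions 2 and 4 (channel counts and sizes are illustrative):

```python
import torch
import torch.nn as nn

img = torch.randn(1, 1, 28, 28)          # a dummy single-channel image

conv = nn.Conv2d(1, 16, kernel_size=3)   # filter weights learned via backprop
pool = nn.MaxPool2d(2)                   # keeps the most prominent value per 2x2 patch

features = pool(conv(img))
print(features.shape)                    # torch.Size([1, 16, 13, 13])
```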

Chapter 5 - Transfer Learning for Image Classification

  1. What are VGG and ResNet pre-trained architectures trained on?
    The images in the ImageNet dataset.
  2. Why does VGG11 have an inferior accuracy to VGG16?
    VGG11 has fewer layers when compared to VGG16.
  3. What does the number 11 in VGG11 represent?
    11 layers.
  4. What is the residual in a residual network?
    The layer returns its input in addition to the layer's transformation, that is, output = x + F(x).
  5. What is the advantage of a residual network?
    It helps in avoiding vanishing gradients and also helps in increasing model depth.
  6. What are the various popular pre-trained models?
    VGG, ResNet, Inception, AlexNet.
  7. During transfer learning, why should images be normalized with the same mean and standard deviation as those used during the training of the pre-trained model?
    The model was trained such that it expects input images to be normalized with that specific mean and standard deviation (see the sketch after this list).
  8. Why do we freeze certain parameters in a model?
    We freeze so that the parameters will not...
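
A minimal sketch of questions 7 and 8: freezing a pre-trained model's parameters and normalizing inputs with the ImageNet statistics the model was trained with (the 2-class head is a hypothetical task):

```python
import torch
from torchvision import models, transforms

model = models.vgg16(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze: pre-trained weights are not updated

# replace only the classifier head for a hypothetical 2-class task
model.classifier[6] = torch.nn.Linear(4096, 2)

# normalize with the same mean/std used when the model was pre-trained
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
```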

Chapter 6 - Practical Aspects of Image Classification

  1. How are class activation maps obtained?
    Refer to the eight steps provided in the Generating CAMs section.
  2. How do batch normalization and data augmentation help when training a model?
    They help reduce over-fitting (see the augmentation sketch after this list).
  3. What are the common reasons why a CNN model overfits?
    A lack of batch normalization, data augmentation, and dropout.
  4. What are the various scenarios where the CNN model works with training and validation data at the data scientists' end but not in the real world?
    Real-world data can have a different distribution from the data used to train and validate the model. Additionally, the model might have over-fitted on the training data.
  5. What are the various scenarios where we leverage OpenCV packages?
    When working in constrained environments, and when inference speed is more important.
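
As a rough sketch of the data augmentation mentioned in question 2 (the specific transforms and parameter values are illustrative):

```python
from torchvision import transforms

# random rotations and color jitter create new variants of each image,
# which helps reduce over-fitting
train_tfms = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])
```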

Chapter 7 - Basics of Object Detection

  1. How does the region proposal technique generate proposals?
    It identifies regions that are similar in color, texture, size, and shape.
  2. How is IoU calculated if there are multiple objects in an image?
    IoU is calculated separately for each object: the area of intersection between the predicted box and the ground-truth box, divided by the area of their union.
  3. Why does R-CNN take a long time to generate predictions?
    Because we create as many forward propagations as there are proposals
  4. Why is Fast R-CNN faster when compared to R-CNN?
    Extracting the feature map from the VGG backbone is done once and shared across all proposals. This eliminates almost 90% of the computation compared to R-CNN.
  5. How does RoI Pooling work?
    All the selective search crops are passed through an adaptive pooling kernel so that the final output is of the same size (see the sketch after this list).
  6. What is the impact of not having multiple layers, post obtaining feature map, when predicting the bounding box corrections?
    You might not notice that the model did not learn to predict the bounding...
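
A minimal sketch of the adaptive pooling idea behind RoI Pooling from question 5: crops of different sizes all come out at the same fixed size (shapes are illustrative):

```python
import torch
import torch.nn as nn

roi_pool = nn.AdaptiveMaxPool2d((7, 7))

crop_a = torch.randn(1, 512, 14, 21)  # two differently sized feature-map crops
crop_b = torch.randn(1, 512, 30, 9)
print(roi_pool(crop_a).shape)  # torch.Size([1, 512, 7, 7])
print(roi_pool(crop_b).shape)  # torch.Size([1, 512, 7, 7])
```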

Chapter 8 - Advanced Object Detection

  1. Why is Faster R-CNN faster when compared to Fast R-CNN?
    We do not need to feed a large number of unnecessary proposals generated by the selective search technique every time; instead, Faster R-CNN finds proposals automatically using the region proposal network.
  2. How are YOLO and SSD faster when compared to Faster R-CNN?
    We don't need to rely on a new proposal network. The network directly finds the proposals in a single go.
  3. What makes YOLO and SSD single shot detector algorithms?
    Networks predict all the proposals and predictions in one shot
  4. What is the difference between the objectness score and class score?
    The objectness score identifies whether an object exists at all, while the class score predicts which class an anchor box with non-zero objectness belongs to.

Chapter 9 - Image Segmentation

  1. How does up-scaling help in the U-Net architecture?
    Upscaling increases the size of the feature map so that the final output is the same size as the input (see the sketch after this list).
  2. Why do we need a fully convolutional network in U-Net?
    Because the outputs are also images, and it is difficult to predict an image-shaped tensor using a Linear layer.
  3. How does RoI Align improve over RoI pooling in Mask R-CNN?
    RoI Align takes offsets of predicted proposals to fine-align the feature map.
  4. What is the major difference between U-Net and Mask R-CNN for segmentation?
    U-Net is fully convolutional and a single end-to-end network, whereas Mask R-CNN uses sub-networks such as the backbone and RPN for different tasks. Mask R-CNN can identify and separate several objects of the same type, but U-Net can only identify them, not separate them into individual instances.
  5. What is instance segmentation?
    If there are different objects of the same class in the same image...
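
A minimal sketch of the up-scaling from question 1, using a transposed convolution as in the U-Net decoder path (channel counts are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)  # a downsampled feature map

# kernel_size=2 with stride=2 doubles the spatial size: 16x16 -> 32x32
up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
print(up(x).shape)  # torch.Size([1, 32, 32, 32])
```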

Chapter 11 - Autoencoders and Image Manipulation

  1. What is an encoder in an autoencoder?
    A smaller neural network that converts an image into a vector representation.
  2. What loss function does an autoencoder optimize for?
    Pixel-level mean squared error, directly comparing the prediction with the input.
  3. How do autoencoders help in grouping similar images?
    Similar images will return similar encodings, which are easier to cluster.
  4. When is the Convolutional autoencoder useful?
    When the inputs are images.
  5. Why do we get non-intuitive images if we randomly sample from the vector space of embeddings obtained from a vanilla/convolutional autoencoder?
    The range of values in the encodings is unconstrained, so proper outputs depend heavily on sampling from the right range of values; random sampling, in general, assumes a mean of 0 and a standard deviation of 1.
  6. What are the loss functions that the Variational autoencoder optimizes for?
    Pixel-level MSE and the KL divergence of the encoder's distribution of mean and standard deviation from the standard normal distribution (see the sketch after this list).
  7. ...
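
A minimal sketch of the two VAE loss terms from question 6 (tensor names are illustrative; mean and log_var are the encoder's outputs):

```python
import torch

def vae_loss(recon, x, mean, log_var):
    # pixel-level reconstruction error between the prediction and the input
    mse = torch.nn.functional.mse_loss(recon, x)
    # KL divergence of N(mean, exp(log_var)) from the standard normal N(0, 1)
    kld = -0.5 * torch.mean(1 + log_var - mean.pow(2) - log_var.exp())
    return mse + kld
```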

Chapter 12 - Image Generation Using GANs

  1. What happens if the learning rate of generator and discriminator models is high?
    Empirically, it is observed that the model stability is lower.
  2. In a scenario where the generator and discriminator are very well trained, what is the probability of a given image being real?
    0.5.
  3. Why do we use ConvTranspose2d in generating images?
    We cannot upscale/generate images using a linear layer.
  4. Why do we use an embedding size that is larger than the number of classes in Conditional GANs?
    Using more parameters gives the model more degrees of freedom to learn the important features of each class.
  5. How can we generate images of men that have a beard?
    By using a conditional GAN. Just like we had male and female images, we can have bearded males and other such classes while training the model.
  6. Why do we have Tanh activation at the last layer in the generator and not ReLU or Sigmoid?
    The pixel range of normalized images is [-1, 1], hence we use Tanh (see the sketch after this list).
  7. Why did we...
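
A minimal sketch of question 6: the final generator block, where ConvTranspose2d upscales and Tanh maps pixels into [-1, 1] (channel sizes are illustrative):

```python
import torch
import torch.nn as nn

final_block = nn.Sequential(
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),  # outputs in [-1, 1], matching normalized image pixels
)
x = torch.randn(1, 64, 32, 32)
print(final_block(x).shape)  # torch.Size([1, 3, 64, 64])
```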

Chapter 13 - Advanced GANs to Manipulate Images

  1. Why do we need a Pix2Pix GAN when a supervised learning algorithm like U-Net could have worked to generate images from contours?
    U-Net uses only a pixel-level loss during training. We need Pix2Pix because there is no loss for realism when a U-Net generates images; the discriminator supplies that.
  2. Why do we need to optimize for 3 different loss functions in CycleGAN?
    Refer to the seven points provided in the CycleGAN section.
  3. How do the tricks leveraged in ProgressiveGAN help in building a StyleGAN?
    ProgressiveGAN helps the network learn a few upsampling layers at a time, so that when the image size has to be increased, the networks responsible for generating the current image size remain optimal.
  4. How do we identify latent vectors corresponding to a given custom image?
    By adjusting the randomly generated noise in such a way that the MSE loss between the generated image and the image of interest is as small as possible (see the sketch after this list).
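
A minimal sketch of question 4, with a stand-in generator and target image (both hypothetical); only the noise vector z is optimized:

```python
import torch
import torch.nn as nn

# Stand-ins for a trained generator and the image of interest.
generator = nn.Sequential(nn.Linear(100, 3 * 64 * 64), nn.Tanh())
target = torch.rand(1, 3 * 64 * 64) * 2 - 1  # pixels in [-1, 1]

z = torch.randn(1, 100, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.01)  # the optimizer only updates z

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(generator(z), target)
    loss.backward()  # gradients flow back into z
    opt.step()       # generator weights are untouched
```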

Chapter 14 - Training with Minimal Data Points

  1. How are pre-trained word vectors obtained?
    From an existing database such as GloVe or word2vec.
  2. How do we map from an image feature embedding to word embedding in Zero-shot learning?
    By creating a suitable neural network that returns a vector of the same shape as word-embedding and training with mse-loss (comparing prediction with actual word-embedding)
  3. Why is the Siamese network called so?
    Because two weight-sharing (twin) networks always produce two outputs that are compared with each other for similarity; Siamese stands for twins.
  4. How does the Siamese network come up with the similarity between the two images?
    The loss function forces the network to predict a smaller distance between the outputs if the images are similar (see the sketch after this list).
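
A minimal sketch of the Siamese idea from questions 3 and 4: one weight-sharing network embeds both images, and the distance between the two embeddings measures similarity (sizes are illustrative):

```python
import torch
import torch.nn as nn

twin = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))  # shared weights

img_a = torch.randn(1, 1, 28, 28)
img_b = torch.randn(1, 1, 28, 28)

# small distance -> similar images, large distance -> dissimilar images
distance = nn.functional.pairwise_distance(twin(img_a), twin(img_b))
print(distance)
```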

Chapter 15 - Combining Computer Vision and NLP Techniques

  1. Why are CNN and RNN combined in image captioning?
    CNN is needed for capturing image features, whereas, RNN is needed for creating the language output.
  2. Why are start and end tokens provided in image captioning but not in handwriting transcription?
    CTC loss does not need such tokens; moreover, in OCR we generate tokens for all timesteps in one shot.
  3. Why is the CTC loss function leveraged in handwriting transcription?
    We cannot delineate timesteps in the image; CTC takes care of aligning key image features with timesteps (see the sketch after this list).
  4. How do transformers help in object detection?
    By treating anchor boxes as embedding inputs to the transformer decoder, DETR learns dynamic anchor boxes, thereby helping object detection.
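
A minimal sketch of the CTC loss from question 3, using PyTorch's nn.CTCLoss (all shapes and lengths are illustrative):

```python
import torch
import torch.nn as nn

T, N, C, S = 32, 4, 20, 10  # timesteps, batch, classes (incl. blank), max target length
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, S))  # class 0 is reserved for the blank token
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S, (N,), dtype=torch.long)

# CTC aligns the per-timestep predictions with the target sequence by itself,
# which is why no start/end tokens are needed.
loss = nn.CTCLoss()(log_probs, targets, input_lengths, target_lengths)
print(loss)
```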

Chapter 16 - Combining Computer Vision and Reinforcement Learning

  1. How is the value calculated for a given state?
    By computing the expected reward at that state
  2. How is the Q-table populated?
    By computing expected reward for all states
  3. Why do we have a discount factor in the state-action value calculation?
    Because the future is uncertain, we are unsure of how it might play out, so we reduce the weightage of future rewards by way of discounting (see the sketch after this list).
  4. What is the need for an exploration-exploitation strategy?
    Exploitation alone makes the model stagnant and predictable; the model should also be able to explore and find unseen actions that can be even more rewarding than what it has already learned.
  5. What is the need for Deep Q-Learning?
    We let a neural network learn the likely reward structure without the need for costly algorithms that take too much time or demand visibility of the entire environment.
  6. How is the value of a given state action combination calculated using...
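
A minimal sketch of a tabular Q-learning update with a discount factor, as in questions 2 and 3 (the 5-state, 2-action environment is hypothetical):

```python
import numpy as np

q_table = np.zeros((5, 2))   # expected reward for every (state, action) pair
alpha, gamma = 0.1, 0.9      # learning rate and discount factor

def update(state, action, reward, next_state):
    # future rewards are down-weighted by gamma because they are uncertain
    target = reward + gamma * q_table[next_state].max()
    q_table[state, action] += alpha * (target - q_table[state, action])

update(state=0, action=1, reward=1.0, next_state=2)
```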

Authors (2)

V Kishore Ayyadevara

V Kishore Ayyadevara leads a team focused on using AI to solve problems in the healthcare space. He has 10 years' experience in data science, solving problems to improve customer experience at leading technology companies. In his current role, he is responsible for developing a variety of cutting-edge analytical solutions that have an impact at scale, while building strong technical teams. Prior to this, Kishore authored three books: Pro Machine Learning Algorithms, Hands-on Machine Learning with Google Cloud Platform, and SciPy Recipes. Kishore is an active learner with a keen interest in identifying problems that can be solved using data, simplifying complexity, and transferring techniques across domains to achieve quantifiable results.

Yeshwanth Reddy

Yeshwanth is a highly accomplished data science manager with 9+ years of experience in deep learning and document analysis. He has made significant contributions to the field, including building software for end-to-end document digitization that resulted in substantial cost savings. His expertise extends to developing modules for OCR, word detection, and synthetic document generation, and his work has been recognized through multiple patents. He has also created several Python libraries. With a passion for unsupervised and self-supervised learning, Yeshwanth is dedicated to reducing reliance on manual annotation and driving innovative solutions in data science.