You're reading from Deep Learning with MXNet Cookbook

Product type: Book
Published in: Dec 2023
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800569607
Edition: 1st Edition
Author (1)
Andrés P. Torres

Andrés P. Torres is the Head of Perception at Oxa, a global leader in industrial autonomous vehicles, where he leads the design and development of state-of-the-art algorithms for autonomous driving. Before that, Andrés had a stint as an advisor and Head of AI at an early-stage content generation startup, Maekersuite, where he developed several AI-based algorithms for mobile phones and the web. Prior to this, Andrés was a Software Development Manager at Amazon Prime Air, developing software to optimize operations for autonomous drones.

Analyzing Images with Computer Vision

Computer vision is one of the fields in which deep learning has progressed enormously, surpassing human-level performance in several tasks such as image classification and object recognition. Furthermore, the field has moved from academia to real-world applications, and the industry is recognizing its practitioners as adding high value to businesses.

In this chapter, we will learn how to use GluonCV, an MXNet Gluon library specific to computer vision, how to build our own networks, and how to apply pretrained models from GluonCV’s model zoo to several applications.

Specifically, we will cover the following topics:

  • Understanding convolutional neural networks
  • Classifying images with AlexNet and ResNet
  • Detecting objects with Faster R-CNN and YOLO
  • Segmenting objects in images with PSPNet and DeepLab-v3

Technical requirements

Apart from the technical requirements specified in the Preface, the following technical requirements apply in this chapter:

  • Ensure that you have completed Installing MXNet, Gluon, GluonCV and GluonNLP, the first recipe from Chapter 1, Up and Running with MXNet
  • Ensure that you have completed A toy dataset for regression – load, manage, and visualize a house sales dataset, the first recipe from Chapter 2, Working with MXNet and Visualizing Datasets: Gluon and DataLoader

The code for this chapter can be found at the following GitHub URL: https://github.com/PacktPublishing/Deep-Learning-with-MXNet-Cookbook/tree/main/ch05.

Furthermore, you can access each recipe directly from Google Colab – for example, for the first recipe of this chapter: https://colab.research.google.com/github/PacktPublishing/Deep-Learning-with-MXNet-Cookbook/blob/main/ch05/5_1_Understanding_Convolutional_Neural_Networks.ipynb.

Understanding convolutional neural networks

In the previous chapters, we used fully connected Multi-Layer Perceptron (MLP) networks to solve our regression and classification problems. However, as we will see, these networks are not optimal for solving image-related problems.

Images are high-dimensional entities – for example, each pixel in a color image has three features (red, green, and blue values), and a 1,024x1,024 image has more than 1 million pixels (a 1-megapixel image) and, therefore, more than 3 million features (3 × 10^6). If we connect all of these inputs to a second layer of 100 neurons in a fully connected network, we will require more than 10^8 parameters, and that is for the first layer alone. Processing images is, therefore, a time-intensive operation.
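As a quick check of the arithmetic above, the following minimal sketch counts the parameters of that first fully connected layer. The helper name dense_layer_params and the choice to include one bias per neuron are assumptions made for illustration; they are not taken from the book's code.

```python
def dense_layer_params(n_inputs, n_units, bias=True):
    """Count the parameters of one fully connected layer.

    Each unit has one weight per input, plus (optionally) one bias.
    """
    return n_inputs * n_units + (n_units if bias else 0)


# A 1,024x1,024 color image: three features (R, G, B) per pixel.
height, width, channels = 1024, 1024, 3
n_features = height * width * channels          # 3,145,728 (about 3 * 10^6)

# First hidden layer of 100 neurons, fully connected to every input feature.
n_params = dense_layer_params(n_features, 100)  # 314,572,900 (more than 10^8)

print(f"input features: {n_features:,}")
print(f"first-layer parameters: {n_params:,}")
```

The count grows linearly with both the number of pixels and the number of neurons, which is why a fully connected first layer quickly becomes impractical for images.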

Furthermore, imagine that we are trying to detect eyes in faces; if a pixel belongs to an eye, the likelihood of nearby pixels belonging to the eye is very high (think of the...

Classifying images with MXNet – GluonCV Model Zoo, AlexNet, and ResNet

MXNet provides a variety of tools to compose custom deep learning models. In this recipe, we will see how to use MXNet to build a model from scratch, train it, and use it to classify images from a dataset. We will also see that although this approach works fine, it is time-consuming.

Another option, and one of the highest-value features that MXNet and GluonCV provide, is the Model Zoo. GluonCV Model Zoo is a set of pre-trained, ready-to-go models for use in your own applications. We will see how to use Model Zoo with two very important models for image classification – AlexNet and ResNet.
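Loading a pre-trained network from the Model Zoo might look like the sketch below. It relies on GluonCV's model_zoo.get_model function; the helper load_pretrained, the MODEL_NAMES mapping, and the choice of the resnet50_v1 variant are assumptions for illustration, since the text only names "ResNet".

```python
# Model names passed to GluonCV's get_model. Picking the 50-layer ResNet v1
# variant is an assumption; the text only says "ResNet".
MODEL_NAMES = {"AlexNet": "alexnet", "ResNet": "resnet50_v1"}


def load_pretrained(model_name):
    """Return a Model Zoo network with pre-trained weights.

    GluonCV is imported lazily so that this sketch can be imported without
    MXNet installed; the first call downloads the weights.
    """
    from gluoncv import model_zoo

    return model_zoo.get_model(model_name, pretrained=True)


# Usage (requires mxnet and gluoncv, plus network access on first run):
# networks = {name: load_pretrained(zoo_name)
#             for name, zoo_name in MODEL_NAMES.items()}
```

Because the weights are downloaded rather than learned, this route skips training entirely, in contrast to building a model from scratch.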

In this recipe, we will analyze and compare these approaches to classify images on a reduced version of the Dogs vs. Cats dataset.

Getting ready

As with previous chapters, in this recipe, we will use a few matrix operations and linear algebra, but it will not be too difficult.

Furthermore, we will...

Detecting objects with MXNet – Faster R-CNN and YOLO

In this recipe, we will see how to use MXNet and GluonCV with pre-trained models to detect objects in a dataset. We will see how to use GluonCV Model Zoo with two very important models for object detection – Faster R-CNN and YOLOv3.

In this recipe, we will compare the performance of these two pre-trained models to detect objects on the Penn-Fudan Pedestrians dataset.

Getting ready

As with previous chapters, in this recipe, we will use a few matrix operations and linear algebra, but it will not be too difficult.

As we will unpack in this recipe, object detection combines classification and regression; therefore, we recommend revisiting the chapters and recipes where we explored the foundations of these topics. Furthermore, we will be detecting objects on image datasets. This recipe will combine what we learned in the following chapters:

  • Understanding image datasets: load, manage, and visualize...
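To make the "classification plus regression" framing concrete, here is a minimal sketch in which a detection carries a class label and score (the classification part) and a bounding box (the regression part). The Detection class, the example coordinates, and the use of intersection over union (IoU) to compare a predicted box with a ground-truth box are assumptions for illustration; IoU is one common choice, not necessarily the metric used in the recipe.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # classification output
    score: float  # classification confidence
    box: tuple    # regression output: (x_min, y_min, x_max, y_max)


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth box for one pedestrian.
prediction = Detection(label="person", score=0.92, box=(10, 10, 50, 90))
ground_truth_box = (20, 10, 60, 90)

print(prediction.label, round(iou(prediction.box, ground_truth_box), 2))
```

A prediction is usually counted as correct only when the label matches and the IoU exceeds a chosen threshold, so both the classification and the regression outputs have to be good.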

Segmenting objects in images with MXNet – PSPNet and DeepLab-v3

In this recipe, we will see how to use MXNet and GluonCV with pre-trained models to segment objects in images from a dataset. This means that we will be able to split objects into different classes, such as person, cat, and dog. When framing the problem as segmentation, the expected output is an image of the same size as the input image, with each pixel value being the classified label (we will analyze how this works in the following sections). We will see how to use GluonCV Model Zoo with two very important models for semantic segmentation – PSPNet and DeepLab-v3.
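The idea of an output "image" whose pixel values are class labels can be sketched as follows. The class list, the tiny 2x3 score grid, and the helper scores_to_mask are hypothetical; the sketch assumes the network emits one score per class per pixel and that the label is simply the highest-scoring class.

```python
CLASSES = ["background", "person", "cat", "dog"]  # hypothetical label set


def scores_to_mask(scores):
    """Convert per-class score maps into a label mask.

    scores[c][y][x] is the score of class c at pixel (y, x); the returned
    mask[y][x] is the index of the highest-scoring class at that pixel.
    """
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Made-up scores for a 2x3 image, one 2x3 grid per class.
scores = [
    [[0.90, 0.10, 0.20], [0.80, 0.10, 0.10]],  # background
    [[0.05, 0.70, 0.10], [0.10, 0.80, 0.10]],  # person
    [[0.03, 0.10, 0.60], [0.05, 0.05, 0.10]],  # cat
    [[0.02, 0.10, 0.10], [0.05, 0.05, 0.70]],  # dog
]

mask = scores_to_mask(scores)
print(mask)  # [[0, 1, 2], [0, 1, 3]]
```

The mask has the same height and width as the input, which is what allows it to be compared pixel by pixel with ground-truth segmentation masks.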

In this recipe, we will compare the performance of these two pre-trained models when semantically segmenting objects in the Penn-Fudan Pedestrians dataset, introduced in the previous chapter, as its ground truth also includes segmentation masks.

Getting ready

As with previous chapters, in this recipe, we will use a few matrix operations and...
