
Deep Learning for Computer Vision

Product type: Book
Published: Jan 2018
Publisher: Packt
ISBN-13: 9781788295628
Pages: 310
Edition: 1st Edition
Languages: Python
Concepts: Deep Learning
Author: Rajalingappaa Shanmugamani

Table of Contents (17 chapters)

Title Page

Packt Upsell

Foreword

Contributors

Preface

Getting Started

Image Classification

Image Retrieval

Object Detection

Semantic Segmentation

Similarity Learning

Image Captioning

Generative Models

Video Classification

Deployment

Other Books You May Enjoy

Leave a review - let other readers know what you think

Chapter 2. Image Classification

Image classification is the task of assigning a single label to a whole image. For example, given images that each contain either a dog or a cat, an image classification model labels each image as a dog or a cat. In this chapter, we will see how to use TensorFlow to build such an image classification model and also learn techniques to improve its accuracy.

We will cover the following topics in this chapter:

Training the MNIST model in TensorFlow
Training the MNIST model in Keras
Other popular image testing datasets
The bigger deep learning models
Training a model for cats versus dogs
Developing real-world applications

Training the MNIST model in TensorFlow

In this section, we will learn about the Modified National Institute of Standards and Technology (MNIST) database and build a simple classification model. The objective of this section is to learn the general framework for deep learning and how to apply it with TensorFlow. First, we will build a perceptron or logistic regression model. Then, we will train a CNN to achieve better accuracy. We will also see how TensorBoard helps visualize the training process and understand the parameters.
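To make the logistic regression starting point concrete, here is a minimal NumPy sketch of softmax (multinomial logistic) regression trained with plain gradient descent. It is not the book's TensorFlow code: the random arrays stand in for flattened 28 x 28 MNIST images, and the helper names and the learning rate of 0.01 are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class; labels are integer ids
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def train_step(weights, bias, images, labels, learning_rate=0.01):
    """One gradient-descent step of softmax regression (hypothetical helper).

    images: (batch, 784) float array; labels: (batch,) int array in [0, 9].
    Returns the updated parameters and the loss before the update.
    """
    n = images.shape[0]
    probs = softmax(images @ weights + bias)
    loss = cross_entropy(probs, labels)
    # Gradient of the mean cross-entropy w.r.t. the logits is (probs - one_hot) / n
    grad_logits = probs.copy()
    grad_logits[np.arange(n), labels] -= 1.0
    grad_logits /= n
    weights = weights - learning_rate * images.T @ grad_logits
    bias = bias - learning_rate * grad_logits.sum(axis=0)
    return weights, bias, loss

rng = np.random.default_rng(0)
images = rng.random((64, 784))          # stand-in for 64 flattened 28x28 images
labels = rng.integers(0, 10, size=64)   # stand-in for digit labels 0-9
weights = np.zeros((784, 10))
bias = np.zeros(10)
losses = []
for _ in range(20):
    weights, bias, loss = train_step(weights, bias, images, labels)
    losses.append(loss)
```

With all-zero weights every class gets probability 0.1, so the first recorded loss is ln(10), about 2.30; the loss then decreases as the weights are updated. A CNN replaces the single `images @ weights` layer with convolutional feature extractors but keeps the same softmax and cross-entropy on top.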

The MNIST datasets

The MNIST data has handwritten digits from 0–9, with 60,000 images for training and 10,000 images for testing. This database is widely used to try out algorithms with minimal preprocessing, and it is a good, compact database for learning machine learning algorithms. It is also the best-known database for image classification problems. A few examples are shown here:

As can be seen in the preceding figure, there are 10 labels for these handwritten characters. The...
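Before a logistic regression model can consume these images, each 28 x 28 grayscale image is usually flattened to a 784-long vector scaled to [0, 1], and each digit label is one-hot encoded over the 10 classes. The following sketch assumes the images arrive as uint8 arrays with values 0 to 255 (as MNIST loaders such as `tf.keras.datasets.mnist.load_data()` return them); the `preprocess` helper is hypothetical and the arrays are synthetic.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Flatten 28x28 uint8 images and one-hot encode integer labels.

    images: (n, 28, 28) uint8 array with pixel values 0-255.
    labels: (n,) integer array of digits 0-9.
    """
    n = images.shape[0]
    # Scale pixels to [0, 1] and flatten each image to a 784-long vector
    x = images.reshape(n, -1).astype(np.float32) / 255.0
    # One-hot: row i has a single 1.0 in column labels[i]
    y = np.zeros((n, num_classes), dtype=np.float32)
    y[np.arange(n), labels] = 1.0
    return x, y

# Synthetic stand-in for a few MNIST images; real data would come from a loader
rng = np.random.default_rng(42)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([0, 3, 9, 3, 7])
x, y = preprocess(fake_images, fake_labels)
print(x.shape, y.shape)  # (5, 784) (5, 10)
```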

Training the MNIST model in Keras

In this section, we will implement the same model as defined in the previous section using the tf.keras APIs. It is worth learning both the Keras and the layers packages of TensorFlow, as both appear in many open source codebases. The objective of this book is to help you understand the various offerings of TensorFlow so that you can build products on top of it.

"Code is read more often than it is written."

With the preceding quote in mind, we show you how to implement the same model using various APIs. Open source implementations of the latest algorithms usually use a mix of these APIs. Next, we will start with the Keras implementation.

Preparing the dataset

The MNIST data is available with Keras. First, import tensorflow. Then, define a few constants such as the batch size, the number of classes, and the number of epochs. The batch size can be selected based on the RAM available on your machine: the larger the batch size, the more RAM is required. The impact of the batch size...
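A back-of-the-envelope calculation shows why batch size trades memory for the number of updates per epoch. This sketch assumes float32 inputs and the 60,000-image MNIST training set; `batch_stats` is a hypothetical helper, and it counts only the input tensor, so real memory use (activations, gradients, optimizer state) is considerably higher.

```python
import math

def batch_stats(batch_size, num_train=60000, height=28, width=28, channels=1,
                bytes_per_value=4):
    """Return (input batch size in bytes, steps per epoch).

    Only the input tensor is counted; activations, gradients, and optimizer
    state usually need far more memory than the inputs themselves.
    """
    input_bytes = batch_size * height * width * channels * bytes_per_value
    # The last, partial batch still counts as one step
    steps_per_epoch = math.ceil(num_train / batch_size)
    return input_bytes, steps_per_epoch

for bs in (32, 128, 512):
    mem, steps = batch_stats(bs)
    print(f"batch_size={bs}: input batch {mem / 1024:.0f} KiB, {steps} steps/epoch")
```

Input memory grows linearly with the batch size while the number of steps per epoch shrinks, so a larger batch means fewer but more memory-hungry updates.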

Other popular image testing datasets

The MNIST dataset is the most commonly used dataset for testing algorithms, but there are other datasets that are used to test image classification algorithms.

The CIFAR dataset

The Canadian Institute for Advanced Research (CIFAR)-10 dataset has 60,000 images: 50,000 for training and 10,000 for testing. There are 10 classes, and each color image is 32 pixels by 32 pixels. The following are randomly selected images from each class:

The images are tiny and each contains just one object. The CIFAR-100 dataset contains the same number of images but with 100 classes, so there are only 600 images per class. Each image comes with a coarse label (one of 20 superclasses) and a fine label (one of the 100 classes). This dataset is available in tf.keras.datasets if you wish to experiment.

The Fashion-MNIST dataset

Fashion-MNIST is a dataset created as an alternative to the MNIST dataset. It was created because MNIST is considered too easy, and it can be used as a direct replacement for MNIST...

The bigger deep learning models

We will go through several model definitions that have achieved state-of-the-art results in the ImageNet competitions, looking at each of them individually in the following topics.

The AlexNet model

AlexNet was the first publication to spark widespread interest in deep learning for computer vision. Krizhevsky et al. (https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) proposed AlexNet, and it has been a pioneering and influential model in this field. This model won the ImageNet 2012 challenge (ILSVRC-2012) with a top-5 error rate of 15.3%, significantly better than the 26.2% of the second-best entry. The model had a relatively simple architecture with five convolution layers. The challenge was to classify 1,000 categories of objects. The ImageNet data had 15 million annotated images in over 22,000 categories, of which only 1,000 categories were used for the competition. AlexNet used ReLU as the activation function and found that it trained several times...
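The size of a convolution layer follows from simple arithmetic. The sketch below applies the standard output-size formula, floor((W - K + 2P) / S) + 1, and the parameter count K x K x C_in x C_out + C_out to AlexNet's first layer (96 kernels of 11 x 11 x 3 with a stride of 4). It assumes the commonly used 227 x 227 input, since the paper states 224 x 224 but 227 is what makes the arithmetic give a 55 x 55 output; it also shows ReLU on a few values. The helper names are illustrative.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Standard formula: floor((W - K + 2P) / S) + 1
    return (input_size - kernel_size + 2 * padding) // stride + 1

def conv_params(kernel_size, in_channels, out_channels):
    # One K x K x C_in kernel per output channel, plus one bias per output channel
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def relu(values):
    # ReLU keeps positive values and replaces negative ones with zero
    return [max(0.0, v) for v in values]

# AlexNet's first layer: 96 kernels of 11 x 11 x 3 with stride 4,
# assuming a 227 x 227 x 3 input.
out = conv_output_size(227, kernel_size=11, stride=4)
params = conv_params(11, in_channels=3, out_channels=96)
print(out, params)  # 55 34944
print(relu([-2.0, 0.5, 3.0]))  # [0.0, 0.5, 3.0]
```

ReLU is cheap to compute and does not saturate for positive inputs, which is part of why it trains faster than saturating activations such as tanh.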

Training a model for cats versus dogs

In this section, we will prepare and train a model for predicting cats versus dogs and learn some techniques that increase accuracy. Most image classification problems fit into this paradigm. Techniques covered in this section, such as augmentation and transfer learning, are useful for many problems.
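Data augmentation creates new training examples by applying label-preserving transformations, so a flipped or slightly shifted cat is still a cat. As an illustration only (not the book's augmentation pipeline), this NumPy sketch applies a random horizontal flip and a random crop to a (height, width, channels) array; the `augment` helper and the toy 8 x 8 image are assumptions.

```python
import numpy as np

def augment(image, crop_size, rng):
    """Randomly flip an image left-right and take a random crop.

    image: (height, width, channels) array; crop_size: (crop_h, crop_w).
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip: reverse the width axis
    h, w = image.shape[:2]
    crop_h, crop_w = crop_size
    # Upper bounds are exclusive, so +1 allows crops touching the far edge
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

rng = np.random.default_rng(7)
image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)  # toy "photo"
batch = np.stack([augment(image, (6, 6), rng) for _ in range(4)])
print(batch.shape)  # (4, 6, 6, 3)
```

Because each call draws new random choices, every epoch sees slightly different versions of the same images, which helps a model trained on a small subset of the data generalize.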

Preparing the data

For the purpose of classification, we will download the data from Kaggle and store it in an appropriate format. Sign up and log in to www.kaggle.com and go to https://www.kaggle.com/c/dogs-vs-cats/data. Download the train.zip and test1.zip files from that page. The train.zip file contains 25,000 labeled images of cats and dogs. We will use only a portion of the data to train a model. Readers with more computing power, such as a Graphics Processing Unit (GPU), can use more data than suggested. Run the following script to rearrange the images and create the necessary folders:

import os
import shutil

work_dir = '' # give your correct directory...
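The rest of the book's script is not reproduced in this excerpt. As a hedged stand-in, the following sketch copies a subset of images into per-class train and validation folders. It assumes the extracted Kaggle files are named like `cat.0.jpg` and `dog.0.jpg` (class name, then a numeric id); the `split_dataset` helper and the `train`/`validation` folder layout are illustrative choices, not the author's code.

```python
import os
import shutil

def split_dataset(src_dir, dst_dir, num_train, num_val, classes=("cat", "dog")):
    """Copy the first num_train + num_val images of each class into
    dst_dir/train/<class> and dst_dir/validation/<class>.

    Assumes file names like 'cat.0.jpg' (class name before the first dot,
    numeric id after it).
    """
    for cls in classes:
        # Sort numerically by id so the split is deterministic
        files = sorted(
            (f for f in os.listdir(src_dir) if f.split(".")[0] == cls),
            key=lambda f: int(f.split(".")[1]),
        )
        splits = {"train": files[:num_train],
                  "validation": files[num_train:num_train + num_val]}
        for split, names in splits.items():
            out_dir = os.path.join(dst_dir, split, cls)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copyfile(os.path.join(src_dir, name),
                                os.path.join(out_dir, name))
```

For example, `split_dataset(os.path.join(work_dir, 'train'), os.path.join(work_dir, 'data'), 1000, 500)` would copy 1,000 training and 500 validation images per class, assuming train.zip was extracted into a `train` folder under `work_dir`. One folder per class is the layout that directory-based image loaders commonly expect.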

Developing real-world applications

Recognizing cats and dogs is a fun problem, but it is unlikely to be an important one in practice. Real-world applications of image classification used in products may be different: you may have different data, targets, and so on. In this section, you will learn tips and tricks to tackle such different settings. The factors that should be considered when approaching a new problem are as follows:

The number of targets: is it a 10-class problem or a 10,000-class problem?
How large is the intra-class variance? For example, do different types of cats have to be identified under one class label?
How large is the inter-class variance? For example, do different cats have to be distinguished from each other?
How big is the data?
How balanced is the data?
Is there already a model that has been trained on a lot of images?
What are the requirements on inference time and model size at deployment? Is it 50 milliseconds on an iPhone or 10 milliseconds on Google Cloud Platform? How much RAM can be consumed...
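For the question of how balanced the data is, a quick check is to count the examples per class and derive inverse-frequency weights, n_samples / (n_classes x count), the same heuristic scikit-learn uses for `class_weight='balanced'`. This is a minimal sketch with a hypothetical 9:1 dataset; Keras's `Model.fit` accepts a `class_weight` dictionary, but it expects integer class indices as keys, so string labels would need to be mapped first.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count).

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

labels = ["cat"] * 900 + ["dog"] * 100   # a hypothetical 9:1 imbalanced dataset
weights = class_weights(labels)
print(weights)
```

Here every mistake on the rare "dog" class costs 9 times as much as a mistake on "cat" (5.0 versus about 0.56). Resampling the data is an alternative when weighting the loss is not an option.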

Summary

We have covered basic yet useful models for classification tasks. We saw a simple model for the MNIST dataset with both the Keras and TensorFlow APIs. We also saw how to use TensorBoard to watch the training process. Then, we discussed state-of-the-art architectures with some specific applications. Several ways to increase accuracy, such as data augmentation, training on bottleneck layers, and fine-tuning a pre-trained model, were also covered. Tips and tricks for training models on new problems were also presented.

In the next chapter, we will see how to visualize deep learning models. We will also deploy the models trained in this chapter for inference. We will also see how to use the trained layers to build an image search application. Then, we will understand the concept of autoencoders and use them to reduce the dimensionality of features.