Training Networks

In Chapter 2, Composing Networks, we learned how to create Caffe2 operators and compose networks from them. In this chapter, we focus on training neural networks. We will learn how to create a network intended for training and how to train it using Caffe2. We will continue to use the MNIST dataset as an example. However, instead of the MLP network we built in the previous chapter, we will create a popular network named LeNet.

This chapter will cover the following topics:

  • Introduction to training a neural network
  • Building the training network for LeNet
  • Training and monitoring the LeNet network

Introduction to training

In this section, we provide a brief overview of how a neural network is trained. This will help us to understand the later sections where we use Caffe2 to actually train a network.

Components of a neural network

We employ neural networks to solve problems for which devising an explicit computer algorithm would be difficult. For example, in the MNIST problem (introduced in Chapter 2, Composing Networks), handcrafting an algorithm that detects the characteristic stroke patterns of each digit, and thereby identifies the digit, would be tedious. Instead, it is easier to design a neural network suited to this problem and then train it (as shown later in this chapter) using a lot...

LeNet network

In Chapter 2, Composing Networks, we built an MLP network composed of multiple pairs of fully connected layers and activation layers. In this chapter, we will build and train a convolutional neural network (CNN). This type of network is so named because it primarily uses convolution layers (introduced in the next section). For computer vision problems, CNNs have been shown to deliver better results with fewer parameters than MLPs. One of the first successful CNNs was used to solve the MNIST problem we looked at earlier. This network, named LeNet-5, was created by Yann LeCun and his colleagues:

Figure 3.4: Structure of our LeNet model

We will construct a network similar in spirit to LeNet, and we will refer to it as the LeNet model in the remainder of this book. From Figure 3.4, we can see that our LeNet network has eight layers...

Training data

We use brew in this chapter to simplify the process of building our LeNet network. We begin by first initializing the model using ModelHelper, which was introduced in the previous chapter:

# Create the model helper for the train model
train_model = model_helper.ModelHelper(name="mnist_lenet_train_model")
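
The snippets in this chapter assume that the relevant Caffe2 modules have been imported. A plausible set of imports for the code that follows might look like this (the book's full script may organize them differently):

# Assumed imports for the snippets in this chapter; the book's
# full script may differ
import os
from caffe2.python import brew, core, model_helper, optimizer, workspace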

We then add inputs to the training network using our add_model_inputs method:

# Specify the input is from the train lmdb
data, label = add_model_inputs(
    train_model,
    batch_size=64,
    db=os.path.join(data_folder, "mnist-train-nchw-lmdb"),
    db_type="lmdb",
)

Training data is typically stored in a database (DB) so that it can be accessed efficiently; reading from a DB is usually much faster than reading thousands of individual files from the filesystem. For every training image in the MNIST dataset, the DB stores the grayscale pixel...
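
The add_model_inputs helper itself is abridged here. A minimal sketch of what such a helper might look like, following the data-loading pattern of the official Caffe2 MNIST tutorial (the blob names and pixel scaling are assumptions, not necessarily the book's exact code), is:

# Hypothetical sketch of an add_model_inputs helper, based on the
# data-loading pattern in the official Caffe2 MNIST tutorial
def add_model_inputs(model, batch_size, db, db_type):
    # Read raw uint8 image pixels and integer labels from the DB
    data_uint8, label = brew.db_input(
        model,
        blobs_out=["data_uint8", "label"],
        batch_size=batch_size,
        db=db,
        db_type=db_type,
    )
    # Cast the pixels to float and scale them from [0, 255] to [0, 1)
    data = model.Cast(data_uint8, "data", to=core.DataType.FLOAT)
    data = model.Scale(data, data, scale=float(1.0 / 256))
    # The input is data, not a parameter: no gradient flows back to it
    data = model.StopGradient(data, data)
    return data, label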

Building LeNet

We build the LeNet layers required for inference by calling the build_mnist_lenet method in our script:

# Build the LeNet network
softmax_layer = build_mnist_lenet(train_model, data)

Note how we only pass the image pixel data to this network, and not the labels. Labels are not required for inference; they are needed only during training and testing, where they serve as the ground truth that is compared against the prediction of the network's final layer.

The following subsections describe how we add pairs of convolution and pooling layers, the fully connected and ReLU layers, and the final SoftMax layer to create the LeNet network.

Layer 1 – Convolution

The first layer in LeNet is a convolution...
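
The layer-by-layer walkthrough is abridged here. For orientation, here is a hedged sketch of the whole eight-layer network built with brew; the layer dimensions follow the standard Caffe2 MNIST LeNet tutorial and are assumptions rather than the book's exact values:

# Hedged sketch of build_mnist_lenet; the layer dimensions are taken
# from the standard Caffe2 MNIST LeNet tutorial, not from the book
def build_mnist_lenet(model, data):
    # Layer 1: convolution with 1 input channel (grayscale),
    # 20 output channels, and 5x5 kernels
    conv1 = brew.conv(model, data, "conv1", dim_in=1, dim_out=20, kernel=5)
    # Layer 2: 2x2 max pooling with a stride of 2
    pool1 = brew.max_pool(model, conv1, "pool1", kernel=2, stride=2)
    # Layers 3 and 4: a second convolution-pooling pair
    conv2 = brew.conv(model, pool1, "conv2", dim_in=20, dim_out=50, kernel=5)
    pool2 = brew.max_pool(model, conv2, "pool2", kernel=2, stride=2)
    # Layers 5 and 6: fully connected layer and ReLU activation;
    # two conv+pool stages reduce a 28x28 input to 50 channels of 4x4
    fc3 = brew.fc(model, pool2, "fc3", dim_in=50 * 4 * 4, dim_out=500)
    relu3 = brew.relu(model, fc3, "relu3")
    # Layers 7 and 8: fully connected layer producing 10 digit scores,
    # followed by SoftMax to turn the scores into probabilities
    pred = brew.fc(model, relu3, "pred", dim_in=500, dim_out=10)
    softmax = brew.softmax(model, pred, "softmax")
    return softmax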

Training layers

In earlier sections, we built the layers of the LeNet network required for inference and added inputs of image pixels and the label corresponding to each image. In this section, we add a few layers at the end of the network that are needed to compute the loss function and to perform backpropagation. These layers are only required during training and can be discarded when using the trained network for inference.

Loss layer

As we noted in the Introduction to training section, we need a loss function at the end of the network to determine the error of the network. Caffe2 provides implementations of many common loss functions as operators in its operator catalog.

For this example, we compute the loss value using...
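
The exact loss computation is abridged here. A hedged sketch of the training layers, following the common Caffe2 MNIST tutorial pattern of a cross-entropy loss, gradient operators, and an SGD optimizer (the hyperparameter values below are illustrative assumptions), might look like this:

# Hedged sketch of the training layers; the hyperparameter values
# are illustrative, not necessarily the book's
xent = train_model.LabelCrossEntropy([softmax_layer, label], "xent")
loss = train_model.AveragedLoss(xent, "loss")
# An accuracy operator, useful for monitoring training progress
brew.accuracy(train_model, [softmax_layer, label], "accuracy")
# Add gradient operators to the network for backpropagation
train_model.AddGradientOperators([loss])
# Stochastic gradient descent with a stepped learning-rate policy
optimizer.build_sgd(
    train_model,
    base_learning_rate=0.1,
    policy="step",
    stepsize=1,
    gamma=0.999,
)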

Training and monitoring

We begin the training process by creating the network in the workspace and initializing all of its parameter blobs. This is done by calling the workspace RunNetOnce method on the parameter initialization network:

# The parameter initialization network only needs to be run once.
workspace.RunNetOnce(train_model.param_init_net)

Next, we ask Caffe2 to create the network in memory:

# Create the actual network as a C++ object in memory.
# We create it once and reuse it, since the network will be
# run many times during training.
workspace.CreateNet(train_model.net, overwrite=True)

We are finally ready to train. We iterate a predetermined number of times and, in each iteration, we use the workspace RunNet method to run a forward pass and a backward pass.
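
A minimal sketch of such a training loop, assuming the accuracy and loss blobs added earlier (the iteration count is an illustrative value), is:

# Minimal sketch of the training loop; total_iters is illustrative
import numpy as np

total_iters = 200
accuracy = np.zeros(total_iters)
loss = np.zeros(total_iters)
for i in range(total_iters):
    # Each RunNet call performs one forward pass and one backward pass
    workspace.RunNet(train_model.net)
    # Fetch the monitoring blobs from the workspace after each iteration
    accuracy[i] = workspace.FetchBlob("accuracy")
    loss[i] = workspace.FetchBlob("loss")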

Training a small network such as our LeNet model is fast on both the CPU and the GPU. However, many of...

Summary

In this chapter, we learned about the general process of training a neural network using a gradient-based optimization algorithm. We learned about CNNs and about the classic LeNet CNN for solving the MNIST problem. We built this network and learned how to add training and test layers to it so that it could be trained. Finally, we trained the network and learned how to monitor it during training using Caffe2. In the following chapters, we will learn how to work with models trained using other frameworks, such as Caffe, TensorFlow, and PyTorch.
