Hands-On GPU Programming with Python and CUDA

Implementation of a Deep Neural Network

We will now use our accumulated knowledge of GPU programming to implement our very own deep neural network (DNN) with PyCUDA. DNNs have attracted a lot of interest in the last decade, as they provide a robust and elegant model for machine learning (ML). DNNs were also among the first applications (outside of rendering graphics) to show the true power of GPUs by leveraging their massive parallel throughput, which ultimately helped NVIDIA rise to become a major player in the field of artificial intelligence.

Throughout this book, we have mostly covered individual topics in isolation, on a chapter-by-chapter basis; here, we will build on many of the subjects we have learned about thus far for our very own implementation of a DNN. While there are several open source frameworks for GPU-based DNNs currently available...

Technical requirements

A Linux or Windows 10 PC with a modern NVIDIA GPU (2016 onward) is required for this chapter, with all of the necessary GPU drivers and the CUDA Toolkit (9.0 onward) installed. A suitable Python 2.7 installation (such as Anaconda Python 2.7) with the PyCUDA module is also required.

This chapter's code is also available on GitHub at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA.

For more information about the prerequisites for this chapter, check out the preface of this book. For the software and hardware requirements, check out the README file at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA.

Artificial neurons and neural networks

Let's briefly go over some of the basics of machine learning (ML) and neural networks (NNs). In ML, our goal is to take a collection of data with a particular set of labeled classes or characteristics and use these examples to train our system to predict the values of future data. A program or function that predicts the classes or labels of future data, based on prior training data, is called a classifier.

There are many types of classifiers, but here we will be focusing on NNs. The idea behind NNs is that they (allegedly) work in a way that is similar to the human brain, in that they learn and classify data using a collection of artificial neurons (ANs), all connected together to form a particular structure. Let's step back for a moment, though, and look at what an individual AN is. In mathematics, this is just an affine...
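Before moving on, it may help to see an individual AN written out in plain NumPy: it computes an affine transformation of its inputs (a weighted sum plus a bias), and then passes the result through an activation function. The following is a minimal host-side sketch for illustration only; the function names, the example values, and the choice of a sigmoid activation are not taken from the chapter's CUDA code:

import numpy as np

def sigmoid(x):
    # standard logistic sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

def artificial_neuron(x, w, b):
    # an artificial neuron: an affine transformation of the inputs
    # (a weighted sum plus a bias), followed by a nonlinear activation
    return sigmoid(np.dot(w, x) + b)

# example: a single neuron with three inputs
x = np.array([0.5, -1.0, 2.0], dtype=np.float32)
w = np.array([0.1, 0.4, -0.3], dtype=np.float32)
b = np.float32(0.2)
print(artificial_neuron(x, w, b))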

Implementation of the softmax layer

We will now look at how we can implement a softmax layer. As we have already discussed, a sigmoid layer is used for assigning labels to a sample; that is, if you have multiple nonexclusive characteristics that you want to infer from an input, you should use a sigmoid layer. A softmax layer is used when you only want to assign a single class to a sample by inference; this is done by computing a probability for each possible class (with the probabilities over all classes, of course, summing to 100%). We can then select the class with the highest probability to give the final classification.

Now, let's see exactly what the softmax layer does: given a collection of N real numbers (c0, ..., cN-1), we first compute the sum of the exponentials of all of the numbers (exp(c0) + exp(c1) + ... + exp(cN-1)), and then calculate the exponential of each...
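The softmax output for the k-th element is then exp(ck) divided by that sum, so the outputs are positive and sum to 1. As a quick reference, here is a plain NumPy sketch of the computation on the host; this is an illustration rather than the chapter's CUDA kernel, and subtracting the maximum before exponentiating is a common stability trick that does not change the result:

import numpy as np

def softmax_cpu(c):
    # reference softmax on the host: exp(c_k) divided by the sum of all exp(c_j)
    c = np.array(c, dtype=np.float32)
    e = np.exp(c - c.max())   # subtract the max to avoid overflow in exp
    return e / e.sum()

# example: the outputs sum to 1, and the largest input gets the largest probability
print(softmax_cpu([1.0, 2.0, 3.0]))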

Implementation of the cross-entropy loss

Now, let's implement what is known as the cross-entropy loss function. This is used to measure how accurate an NN is on a small subset of data points during the training process; the bigger the value that is output by our loss function, the more inaccurate our NN is at properly classifying the given data. We do this by calculating a standard mean log-entropy difference between the expected output and the actual output of the NN. For numerical stability, we will limit the value of the output to 1:

import numpy as np

MAX_ENTROPY = 1

def cross_entropy(predictions=None, ground_truth=None):

    if predictions is None or ground_truth is None:
        raise Exception("Error! Both predictions and ground truth must be float32 arrays")

    p = np.array(predictions).copy()
    y = np.array(ground_truth).copy()

    if p.shape != y.shape:
        raise Exception("Error! Both predictions and ground truth must have the same shape")

Implementation of a sequential network

Now, let's implement one final class that will combine multiple dense layer and softmax layer objects into a single coherent feed-forward sequential neural network. This will be implemented as another class, which will subsume the other classes. Let's start by writing the constructor: we will be able to set the max batch size here, which will affect how much memory is allocated for the use of this network. We'll store the memory allocated for the weights and input/output of each layer in the list variable, network_mem. We will also store the DenseLayer and SoftmaxLayer objects in the list network, and information about each layer in the NN in network_summary. Notice how we can also set up some training parameters here, including the delta and how many streams to use for gradient descent (we'll see this...
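As a rough guide to how such a constructor might be laid out, here is a simplified skeleton: the list members follow the description above, while the parameter names and default values are only illustrative, not the chapter's exact code:

class SequentialNetwork:
    # simplified skeleton of a feed-forward sequential network container
    # (parameter names and defaults are illustrative)
    def __init__(self, max_batch_size=32, delta=0.0001, num_streams=10):
        self.network = []          # the DenseLayer and SoftmaxLayer objects, in order
        self.network_summary = []  # information about each layer in the NN
        self.network_mem = []      # memory allocated for each layer's weights and input/output
        self.max_batch_size = max_batch_size  # bounds how much memory is allocated
        self.delta = delta                    # training parameter used for gradient descent
        self.num_streams = num_streams        # how many streams to use for gradient descent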

The Iris dataset

We will now construct our very own DNN for a real-life problem: classifying flower types based on the measurements of their petals and sepals. We will be working with the well-known Iris dataset for this. The dataset is stored as a comma-separated values (CSV) text file, with each line containing four numerical values (the flower measurements), followed by the flower type (here, there are three classes: Iris setosa, Iris versicolor, and Iris virginica). We will now design a small DNN that will classify the type of iris based on this dataset.

Before we continue, please download the Iris dataset and put it into your working directory. This is available from the UC Irvine Machine Learning repository, which can be found here: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data.

We will start by processing this file into appropriate data arrays that...
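As an illustration of that preprocessing step (a sketch rather than the chapter's exact code, assuming the species strings appear in the file as Iris-setosa, Iris-versicolor, and Iris-virginica), the following reads iris.data from the working directory, converts the four measurements on each line into a float32 feature row, and encodes the species string as a one-hot label:

import numpy as np

# map each species string to a one-hot label
# (the ordering of the classes here is an arbitrary choice for illustration)
label_map = {'Iris-setosa':     [1, 0, 0],
             'Iris-versicolor': [0, 1, 0],
             'Iris-virginica':  [0, 0, 1]}

features = []
labels = []

with open('iris.data', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue                                       # skip blank lines at the end of the file
        values = line.split(',')
        features.append([float(v) for v in values[:4]])    # the four flower measurements
        labels.append(label_map[values[4]])                # one-hot species label

iris_x = np.float32(features)   # shape (150, 4)
iris_y = np.float32(labels)     # shape (150, 3)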

Summary

In this chapter, we started by giving the definition of an artificial neural network, and showed how individual ANs can be combined into dense layers, which in turn combine into a full deep neural network. We then implemented a dense layer in CUDA-C and made an appropriate corresponding Python wrapper class. We also included functionality to add ReLU and sigmoid layers on the outputs of a dense layer. We saw the definition and motivation of the softmax layer, which is used for classification problems, and then implemented this in CUDA-C and Python. We then implemented a Python class so that we could build a sequential feed-forward DNN from the prior classes; we implemented a cross-entropy loss function, and used this in our implementation of gradient descent to train the weights and biases in our DNN. Finally, we used our implementation...

Questions

  1. Suppose you construct a DNN and after training it, it yields only garbage. After inspection, you find that all of the weights and biases are either huge numbers or NaNs. What might the problem be?
  2. Name one possible problem with a small training_rate value.
  3. Name one possible problem with a large training_rate value.
  4. Suppose we want to train a DNN that will assign multiple labels to an image of an animal ("slimy", "furry", "red", "brown", and so on). Should we use a sigmoid or softmax layer at the end of the DNN?
  5. Suppose we want to classify an image of a single animal as either a cat or dog. Do we use sigmoid or softmax?
  6. If we decrease the batch size, will there be more or fewer updates to the weights and biases during gradient descent training?