Training Deep Prediction Models

The previous chapters covered some of the theory behind neural networks and used several neural network packages in R. Now it is time to dive in and look at training deep learning models. In this chapter, we will explore how to build and train feedforward neural networks, which are the most common type of deep learning model. We will use MXNet to build deep learning models that perform classification and regression on a retail dataset.

This chapter will cover the following topics:

Getting started with deep feedforward neural networks
Common activation functions – rectifiers, hyperbolic tangent, and maxout
Introduction to the MXNet deep learning library
Use case – using MXNet for classification and regression

Getting started with deep feedforward neural networks

A deep feedforward neural network is designed to approximate a function, f(), that maps some set of input variables, x, to an output variable, y. These networks are called feedforward because information flows from the input through each successive layer to the output, with no feedback or recurrent loops (models that include both forward and backward connections are referred to as recurrent neural networks).
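The one-directional flow described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the MXNet implementation used later in the chapter: the helper names `forward` and `make_layer`, the layer sizes, and the random weights are all hypothetical and chosen only for the example.

```r
# Minimal sketch of a forward pass through a feedforward network (base R only).
# Layer sizes, weights, and helper names are illustrative assumptions.
set.seed(42)

# Hypothetical helper: build one fully connected layer with activation f.
make_layer <- function(n_in, n_out, f) {
  list(W = matrix(rnorm(n_in * n_out, sd = 0.5), n_in, n_out),
       b = rep(0, n_out),
       f = f)
}

# Hypothetical helper: pass the input through each layer in turn.
# The output of one layer is the input of the next; nothing flows backwards.
forward <- function(x, layers) {
  a <- x
  for (layer in layers) {
    bias <- matrix(layer$b, nrow(a), length(layer$b), byrow = TRUE)
    z <- a %*% layer$W + bias
    a <- layer$f(z)
  }
  a
}

layers <- list(
  make_layer(4, 8, tanh),      # input (4 variables) -> hidden layer 1
  make_layer(8, 8, tanh),      # hidden layer 1 -> hidden layer 2
  make_layer(8, 1, identity)   # hidden layer 2 -> single output y
)

x <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4)  # 5 observations, 4 inputs
y_hat <- forward(x, layers)
dim(y_hat)  # one prediction per observation
```

The weights here are random, so the predictions are meaningless; training is the process of adjusting `W` and `b` in each layer so that the network's output approximates y.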

Deep feedforward neural networks are applicable to a wide range of problems, and are particularly useful for applications such as image classification. More generally, feedforward neural networks are useful for prediction and classification where there is a clearly defined outcome (what digit an image contains, whether someone is walking upstairs or walking on a flat surface, the presence/absence of disease...

Activation functions

The activation function determines the mapping between the input and a hidden layer. It defines the functional form for how a neuron gets activated. For example, a linear activation function could be defined as f(x) = x, in which case the value for the neuron would be the raw input, x. A linear activation function is shown in the top panel of Figure 4.2. Linear activation functions are rarely used in hidden layers because a stack of layers with linear activations collapses into a single linear transformation, so the model cannot learn non-linear functional forms. In previous chapters, we used the hyperbolic tangent as an activation function, namely f(x) = tanh(x). The hyperbolic tangent can work well in some cases, but a potential limitation is that it saturates at both very low and very high input values, where its output flattens out near -1 and 1, as shown in the middle panel of Figure 4.2.
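The two activation functions discussed so far can be compared directly in base R. The following is a small sketch, with input values and matrix sizes chosen arbitrarily for illustration. It shows the saturation of tanh at extreme inputs and checks a well-known property of linear activations: two stacked linear layers are equivalent to a single linear layer. The names `linear` and `tanh_grad` are hypothetical helpers defined only for this example.

```r
# Activation functions written as plain R functions (illustrative sketch).
linear <- function(x) x
x <- c(-10, -2, 0, 2, 10)

linear(x)   # returns the raw input unchanged
tanh(x)     # approaches -1 and 1 at the extremes: saturation

# The gradient of tanh is 1 - tanh(x)^2, which is nearly zero once the
# function has saturated, so very little signal is available for learning.
tanh_grad <- function(x) 1 - tanh(x)^2
tanh_grad(x)

# Two layers with linear activations collapse into one linear map.
# Matrix sizes are arbitrary assumptions for the demonstration.
set.seed(1)
W1 <- matrix(rnorm(3 * 4), 3, 4)
W2 <- matrix(rnorm(4 * 2), 4, 2)
X  <- matrix(rnorm(5 * 3), 5, 3)

two_layers <- linear(linear(X %*% W1) %*% W2)
one_layer  <- X %*% (W1 %*% W2)
all.equal(two_layers, one_layer)
```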

Perhaps the most popular activation function currently...

Introduction to the MXNet deep learning library

The deep learning libraries we will use in this book are MXNet, Keras, and TensorFlow. Keras is a frontend API, which means it is not a standalone library as it requires a lower-level library in the backend, usually TensorFlow. The advantage of using Keras rather than TensorFlow is that it has a simpler interface. We will use Keras in later chapters in this book.

Both MXNet and TensorFlow are multipurpose numerical computation libraries that can use GPUs for massively parallel matrix operations. As such, multi-dimensional matrices are central to both libraries. In R, we are familiar with the vector, which is a one-dimensional array of values of the same type. The R data frame is a two-dimensional structure in which each column can have a different type. The R matrix is a two-dimensional array of values of the same type. Some machine...
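The R structures described above can be inspected side by side. The following base R sketch uses made-up values; the three-dimensional `array` at the end is added here only to illustrate how base R extends the matrix to more than two dimensions, and it is not a claim about how MXNet or TensorFlow represent their data.

```r
# Base R data structures, from one dimension upwards (values are made up).
v <- c(1.5, 2.0, 3.5)                       # vector: 1-D, single type
m <- matrix(1:6, nrow = 2, ncol = 3)        # matrix: 2-D, single type
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)  # data frame: 2-D, type per column
a <- array(1:24, dim = c(2, 3, 4))          # array: 3-D, single type

is.null(dim(v))    # a plain vector has no dim attribute
length(v)
dim(m)             # rows, columns
sapply(df, class)  # each column keeps its own type
dim(a)             # sizes along each of the three dimensions
```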

Use case – using MXNet for classification and regression

In this section, we will use a new dataset to create a binary classification task. The dataset we will use here is a transactional dataset that is available at https://www.dunnhumby.com/sourcefiles. This dataset has been made available by dunnhumby, which is perhaps best known for its link to the Tesco Clubcard (Tesco is a British grocery retailer), one of the largest retail loyalty systems in the world. I recommend the following book, which describes how dunnhumby helped Tesco to become the number one retailer by applying analytics to its retail loyalty program: Humby, Clive, Terry Hunt, and Tim Phillips. Scoring Points. Kogan Page Publishers, 2008. Even though this book is relatively old, it remains one of the best case studies of how to roll out a business-transformation program based on data analytics...

Summary

We covered a lot of ground in this chapter. We looked at activation functions and built our first true deep learning models using MXNet. Then we took a real-life dataset and created two use cases for applying a machine learning model. The first use case was to predict which customers will return in the future based on their past activity. This was a binary classification task. The second use case was to predict how much a customer will spend in the future based on their past activity. This was a regression task. We first ran both models on a small dataset and compared our deep learning model against models from other machine learning libraries. Our deep learning model outperformed all of the other algorithms.

We then took this further by using a dataset that was 100 times bigger. We built a larger deep learning model and adjusted our parameters to get an increase in our...