Training a Prediction Model

This chapter shows you how to build and train basic neural networks in R through hands-on examples, and how to evaluate different hyper-parameters to find the best set for a model. Another important issue in deep learning is overfitting, which occurs when a model performs well on the data it was trained on but poorly on unseen data. We look at this topic briefly in this chapter and cover it in more depth in Chapter 3, Deep Learning Fundamentals. The chapter closes with an example use case: classifying activity data from a smartphone as walking, going up or down stairs, sitting, standing, or lying down.

This chapter covers the following topics:

Neural networks in R
Binary classification
Visualizing a neural network
Multi-classification using the nnet and RSNNS packages
The problem of overfitting data—the consequences explained...

Neural networks in R

We will build several neural networks in this section. First, we will use the neuralnet package to create a neural network model that we can visualize. We will also use the nnet and RSNNS (Bergmeir, C., and Benítez, J. M. (2012)) packages. These are standard R packages and can be installed with the install.packages command or from the Packages pane in RStudio. Although it is possible to use the nnet package directly, we are going to use it through the caret package, whose name is short for Classification and Regression Training. The caret package provides a standardized interface to many machine learning (ML) models in R, and also has some useful features for validation and performance assessment that we will use in this chapter and the next.
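As a rough illustration of the caret interface described above, the following sketch fits a small nnet model through caret's train function and compares a few hyper-parameter settings by cross-validation. It uses R's built-in iris data purely as a stand-in (it is not the data used in this chapter), and the grid values, the seed, and the five-fold cross-validation are arbitrary assumptions chosen to keep the example quick.

```r
# Assumes the caret and nnet packages are installed, for example with
# install.packages(c("caret", "nnet")).
library(caret)

set.seed(1234)  # arbitrary seed, only for reproducibility

# Candidate hyper-parameters for method = "nnet":
# size = number of hidden units, decay = weight decay.
grid <- expand.grid(size = c(2, 5), decay = c(0, 0.1))

fit <- train(
  Species ~ ., data = iris,   # stand-in dataset, not the chapter's data
  method = "nnet",
  tuneGrid = grid,
  trControl = trainControl(method = "cv", number = 5),
  trace = FALSE               # passed through to nnet to silence its output
)

# Cross-validated accuracy for each combination, then the selected one.
print(fit$results[, c("size", "decay", "Accuracy")])
print(fit$bestTune)
```

Going through train rather than calling nnet directly means the same call pattern can be reused for other model types by changing the method argument and the tuning grid.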

For our first examples of building neural networks, we will use the MNIST dataset, which is a classic classification...

The problem of overfitting data – the consequences explained

A common issue in machine learning is overfitting. Generally, overfitting refers to the phenomenon where a model performs better on the data used to train it than on data not used to train it (holdout data, future real use, and so on). Overfitting occurs when a model memorizes part of the training data and fits what is essentially noise in that data. Accuracy on the training data is high, but because the noise changes from one dataset to the next, this accuracy does not carry over to unseen data. In other words, the model does not generalize well.
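The gap between training and holdout performance can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the data is simulated (20 predictors, of which only the first carries any signal), the make_data and accuracy helpers are hypothetical names introduced here, and the network is deliberately given more weights than there are training rows.

```r
library(nnet)  # shipped with standard R distributions as a recommended package

set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical simulated data: 20 predictors, but only x1 carries any signal.
make_data <- function(n, p = 20) {
  x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
  y <- rbinom(n, size = 1, prob = plogis(x[, 1]))
  data.frame(y = factor(y, levels = c(0, 1)), x)
}

train_df   <- make_data(100)    # small training set
holdout_df <- make_data(1000)   # unseen data drawn from the same process

# A deliberately over-sized network: 221 weights for 100 training rows.
fit <- nnet(y ~ ., data = train_df, size = 10, maxit = 500, trace = FALSE)

# Proportion of rows whose predicted class matches the observed class.
accuracy <- function(model, df) {
  mean(predict(model, newdata = df, type = "class") == df$y)
}

acc_train   <- accuracy(fit, train_df)
acc_holdout <- accuracy(fit, holdout_df)
cat("training accuracy:", acc_train, "\n")
cat("holdout accuracy: ", acc_holdout, "\n")
```

Because most of the predictors are pure noise, any accuracy the network gains from them on the training rows should not be expected to reappear on the holdout rows.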

Overfitting can occur at any time, but tends to become more severe as the ratio of parameters to information increases. Usually, this can be thought of as the ratio of parameters to observations, but not always...
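To make the ratio of parameters to observations concrete, the following sketch counts the weights in a network with one hidden layer and no skip-layer connections. The 784 inputs correspond to the 28 x 28 pixels of an MNIST image; the 40 hidden units and the 6,000 training rows are illustrative assumptions, and n_weights is a hypothetical helper name.

```r
# Number of weights (including bias terms) in a single-hidden-layer network
# with n_inputs predictors, n_hidden hidden units, and n_outputs output units.
n_weights <- function(n_inputs, n_hidden, n_outputs) {
  (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
}

n_params <- n_weights(n_inputs = 784, n_hidden = 40, n_outputs = 10)
n_obs    <- 6000  # assumed number of training rows

print(n_params)          # 31810
print(n_params / n_obs)  # roughly 5.3 parameters per observation
```

With these values the model has several parameters for every training row, which is the kind of situation in which overfitting tends to become more severe.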

Use case – building and applying a neural network

To close the chapter, we will discuss a more realistic use case for neural networks. We will use a public dataset by Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J. L. (2013) in which smartphones were used to track physical activity. The data can be downloaded at https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones. The smartphones had an accelerometer and a gyroscope, from whose signals 561 features in both the time and frequency domains were derived.
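Once the archive has been downloaded and unzipped, the data can be read with base R. The following is a minimal sketch, not necessarily the approach used in this chapter: read_har is a hypothetical helper, and the folder name and file layout (a train and a test folder containing X_train.txt, y_train.txt, and so on) are assumptions about the extracted archive that should be checked against the actual download.

```r
# Hypothetical helper: read one split ("train" or "test") of the extracted
# archive. Assumes whitespace-delimited text files named X_<split>.txt and
# y_<split>.txt inside a folder named after the split.
read_har <- function(base_dir, split = c("train", "test")) {
  split  <- match.arg(split)
  x_path <- file.path(base_dir, split, paste0("X_", split, ".txt"))
  y_path <- file.path(base_dir, split, paste0("y_", split, ".txt"))
  x <- read.table(x_path)        # one row per observation, one column per feature
  y <- read.table(y_path)[[1]]   # activity codes, one per observation
  list(x = as.matrix(x), y = factor(y))
}

base_dir <- "UCI HAR Dataset"  # assumed folder name after unzipping
if (dir.exists(base_dir)) {
  har_train <- read_har(base_dir, "train")
  stopifnot(ncol(har_train$x) == 561)  # the 561 features mentioned in the text
  print(table(har_train$y))            # number of observations per activity
}
```

Returning the predictors as a matrix and the outcome as a factor keeps the result in a form that most R neural network functions accept directly.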

The smartphones were worn during walking, walking upstairs, walking downstairs, standing, sitting, and lying down. Although this data came from phones, similar measures could be derived from other devices designed to track activity, such as various fitness-tracking watches or bands. So this data can be useful if we want to sell devices and have them automatically...

Summary

This chapter showed how to get started building and training neural networks to classify data, including image recognition and physical activity data. We looked at packages that can visualize a neural network and we created a number of models to perform classification on data with 10 different categories. Although we only used some neural network packages rather than deep learning packages, our models took a long time to train and we had issues with overfitting.

Some of the basic neural network models in this chapter took a long time to train, even though we did not use all the data available. For the MNIST data, we used approximately 8,000 rows for our binary classification task and only 6,000 rows for our multi-classification task. Even so, one model took almost an hour to train. Our deep learning models will be much more complicated and should be able to process millions...