Chapter 4. Neural Networks

So far, we've looked at two of the most well-known methods used for predictive modeling. Linear regression is probably the most typical starting point for problems where the goal is to predict a numerical quantity. The model is based on a linear combination of input features. Logistic regression uses a nonlinear transformation of this linear feature combination in order to restrict the range of the output in the interval [0,1]. In so doing, it predicts the probability that the output belongs to one of two classes. Thus, it is a very well-known technique for classification.
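
As a quick reminder of how that transformation works, here is a tiny R snippet (purely illustrative, not code from the book) showing the logistic function squashing a linear combination of features into the interval (0, 1):

# The logistic (sigmoid) function maps any real number into (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

# A linear combination of two example features with some example weights
x <- c(2.5, -1.3)
w <- c(0.8, 0.4)
b <- -0.2

logistic(sum(w * x) + b)   # a probability-like value strictly between 0 and 1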

Both methods share the disadvantage that they are not robust when dealing with many input features. In addition, logistic regression is typically used for the binary classification problem. In this chapter, we will introduce the concept of neural networks, a nonlinear approach to solving both regression and classification problems. They are significantly more robust when dealing with a higher...

The biological neuron


Neural network models draw an analogy with the organization of neurons in the human brain, and for this reason they are also often referred to as artificial neural networks (ANNs) to distinguish them from their biological counterparts. The key parallel is that a single biological neuron acts as a simple computational unit, but when a large number of these are combined, the result is an extremely powerful and massively distributed processing machine capable of complex learning: the human brain. To get an idea of how neurons are connected in the brain, the following image shows a simplified picture of a human neural cell:

In a nutshell, we can think of a human neuron as a computational unit that takes in a number of parallel inputs in the form of chemical signals, known as neurotransmitters, arriving across synapses at its dendrites. In response to the received neurotransmitters, the dendrites carry electrical signals to the soma, or body, of the neuron...

The artificial neuron


Using our biological analogy, we can construct a model of a computational neuron, and this model is known as the McCulloch-Pitts model of a neuron:

Note

Warren McCulloch and Walter Pitts proposed this model of a neural network as a computing machine in a paper titled A logical calculus of the ideas immanent in nervous activity, published in the Bulletin of Mathematical Biophysics in 1943.

This computational neuron is the simplest example of a neural network. Following our diagram, we can write the output function, y, of our neural network directly as y = g(w0 + w1x1 + w2x2 + ... + wkxk), where x1 through xk are the input features and w1 through wk their corresponding weights.

The function g() in our neural network is the activation function. Here, the specific activation function that is chosen is the step function: g(z) = 1 when z > 0, and g(z) = -1 otherwise.

When the linear weighted sum of inputs exceeds zero, the step function outputs 1, and when it does not, the function outputs -1. It is customary to create a dummy input feature x0 which is always taken to be 1, in order to merge the bias or threshold w0 into the main sum as follows...
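
As a rough R illustration of such a unit (an illustrative sketch, not code from the book), the following function applies the step activation to a weighted sum of inputs, with the bias merged in through a dummy feature that is always equal to 1:

# Step activation: +1 when the weighted sum is positive, -1 otherwise
step_activation <- function(z) ifelse(z > 0, 1, -1)

# Output of a McCulloch-Pitts style neuron; w[1] plays the role of the bias w0
neuron_output <- function(x, w) {
  x_with_bias <- c(1, x)                       # dummy input feature x0 = 1
  step_activation(sum(w * x_with_bias))
}

# Example with two inputs and weights (w0, w1, w2)
neuron_output(x = c(0.7, -0.2), w = c(-0.5, 1.0, 2.0))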

Stochastic gradient descent


In the models we've seen so far, such as linear regression, we've talked about a criterion or objective function that the model must minimize while it is being trained. This criterion is also sometimes known as the cost function. For example, the least squares cost function for a model can be expressed as J = ½ Σi (yi - ŷi)², where yi is the observed output for the ith observation and ŷi is the model's prediction for it.

We've added a constant factor of ½ in front of this for reasons that will become apparent shortly. We know from basic differentiation that, when we are minimizing a function, multiplying it by a positive constant factor does not change where the minimum occurs, only the value attained there. In linear regression, just as with our perceptron model, our model's predicted output is just a linear weighted combination of the input features. If we assume that our data is fixed and that the weights are variable and must be chosen so as to minimize our criterion, we can treat the cost function as being a function of the weights: J(w) = ½ Σi (yi - w · xi)², where xi is the feature vector of the ith observation (including the dummy bias feature x0 = 1).

We have used the letter w to represent the model...
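
To make the idea concrete, here is a minimal stochastic gradient descent sketch for a linear model under the ½ squared-error cost. This is illustrative code rather than the author's implementation, and the learning rate and epoch count are arbitrary example values:

# Minimal stochastic gradient descent for linear regression
# X: matrix of inputs (one row per observation), y: numeric output vector
sgd_linear <- function(X, y, learning_rate = 0.01, epochs = 50) {
  X <- cbind(1, X)                     # dummy feature x0 = 1 to absorb the bias w0
  w <- rep(0, ncol(X))                 # initialize all weights to zero
  for (epoch in seq_len(epochs)) {
    for (i in sample(nrow(X))) {       # visit the observations in random order
      prediction <- sum(w * X[i, ])
      error <- prediction - y[i]
      # The gradient of (1/2) * (prediction - y)^2 with respect to w is error * x,
      # which is where the convenient factor of 1/2 pays off
      w <- w - learning_rate * error * X[i, ]
    }
  }
  w
}

# Example on synthetic data generated from y = 2 + 3x plus a little noise
set.seed(1)
x <- matrix(rnorm(100), ncol = 1)
y <- 2 + 3 * x[, 1] + rnorm(100, sd = 0.1)
sgd_linear(x, y)                       # the weights should approach c(2, 3)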

Multilayer perceptron networks


Multilayer neural networks are models that connect many neurons together into a layered architecture. Individually, neurons are very basic computational units, but by organizing them into connected layers we can create a model significantly more powerful than any single neuron.

As touched upon in the previous section, we build neural networks in layers and we distinguish between different kinds of neural networks primarily on the basis of the connections that exist between these layers and the types of neurons used. The following diagram shows the general structure of a multilayer perceptron (MLP) neural network, shown here for two hidden layers:

The first characteristic of the MLP network is that the information flows in a single direction from input layer to output layer. Thus, it is known as a feedforward neural network. This is in contrast to other neural network types, in which there are cycles that allow information to flow back to earlier neurons in the network as a feedback...
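
As one concrete way to build such an architecture in R, the neuralnet package (one of several packages that fit feedforward networks, used here purely for illustration) lets us specify multiple hidden layers directly; the layer sizes below are arbitrary choices for demonstration:

# install.packages("neuralnet")   # if the package is not already installed
library(neuralnet)

# Scale the variables so that training converges more easily
df <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))

# A feedforward network with two hidden layers of 4 and 2 neurons
model <- neuralnet(mpg ~ wt + hp, data = df, hidden = c(4, 2),
                   linear.output = TRUE)

# plot(model)   # draws the layered, feedforward structure of the fitted network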

Predicting the energy efficiency of buildings


In this section, we will investigate how neural networks can be used to solve a real-world regression problem. Once again, we turn to the UCI Machine Learning Repository for our data set. We've chosen to try out the energy efficiency data set available at http://archive.ics.uci.edu/ml/datasets/Energy+efficiency. The prediction task is to use various building characteristics, such as surface area and roof area, in order to predict the energy efficiency of a building, which is expressed in the form of two different metrics—heating load and cooling load.

This is a good example for us to try out as we can demonstrate how neural networks can be used to predict two different outputs with a single network. The full attribute description of the data set is given in the following table:

Column name       Type        Definition
relCompactness    Numerical   Relative compactness
surfArea          Numerical   Surface area
wallArea          Numerical   Wall area
roofArea          Numerical   Roof area
...
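
As a hedged sketch of the kind of model this section goes on to build, the neuralnet package (one possible choice, used here for illustration) allows both responses to appear on the left-hand side of the formula so that a single network predicts them jointly. The output column names heatLoad and coolLoad, the synthetic stand-in data, and the hidden-layer size are all assumptions made purely for this example:

library(neuralnet)

# Synthetic stand-in data so the example runs on its own; in the chapter, these
# columns come from the UCI energy efficiency data set
set.seed(1)
energy <- data.frame(relCompactness = runif(200), surfArea = runif(200),
                     wallArea = runif(200), roofArea = runif(200))
energy$heatLoad <- with(energy, 2 * relCompactness + surfArea + rnorm(200, sd = 0.1))
energy$coolLoad <- with(energy, relCompactness + 0.5 * roofArea + rnorm(200, sd = 0.1))

# One network with two output neurons: both responses on the left of the formula
energy_model <- neuralnet(
  heatLoad + coolLoad ~ relCompactness + surfArea + wallArea + roofArea,
  data = energy, hidden = 4, linear.output = TRUE)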

Predicting glass type revisited


In Chapter 3, Logistic Regression, we analyzed the glass identification data set, in which the task is to identify the type of glass that makes up a fragment found at a crime scene. The output of this data set is a factor with several class levels corresponding to different types of glass. Our previous approach was to build a one-versus-all model using multinomial logistic regression. The results were not very promising, and one of the main points of concern was a poor model fit on the training data.

In this section, we will revisit this data set and see whether a neural network model can do better. At the same time, we will demonstrate how neural networks can handle classification problems as well:

> glass <- read.csv("glass.data", header = FALSE)
> names(glass) <- c("id", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", 
                    "Ba", "Fe", "Type")
> glass$id <- NULL

Our output is a multiclass factor and so we will want to dummy-encode this...
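
One convenient way to perform this dummy encoding, shown here purely as an illustration (the class.ind() function from the nnet package is one of several options and not necessarily the author's choice), is to turn the factor into a matrix of 0/1 indicator columns, one per glass type:

> library(nnet)                              # provides class.ind()
> glass$Type <- factor(glass$Type)
> type_indicators <- class.ind(glass$Type)
> head(type_indicators)                      # one column per glass type, a single 1 per row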

Predicting handwritten digits


Our final application for neural networks will be the handwritten digit prediction task. In this task, the goal is to build a model that will be presented with an image of a numerical digit (0–9) and the model must predict which digit is being shown. We will use the MNIST database of handwritten digits from http://yann.lecun.com/exdb/mnist/.

From this page, we have downloaded and unzipped the two training files train-images-idx3-ubyte.gz and train-labels-idx1-ubyte.gz. The former contains the data from the images and the latter contains the corresponding digit labels. The advantage of using this website is that the data has already been preprocessed by centering each digit in the image and scaling the digits to a uniform size. To load the data, we've used information from the website about the IDX format to write two functions:

read_idx_image_data <- function(image_file_path) {
  con <- file(image_file_path, "rb")
  magic_number <- readBin(con, what ...
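
The rest of the function is not shown above, but as a rough sketch of what reading this format involves (an illustrative, hypothetical function, not the author's implementation), the standard IDX3 image file starts with four big-endian 32-bit integers, namely the magic number, the number of images, and the image height and width, followed by one unsigned byte per pixel:

# Hypothetical sketch of an IDX3 image reader; the max_images parameter is an
# added convenience and is not part of the format
read_idx_images_sketch <- function(image_file_path, max_images = 100) {
  con <- file(image_file_path, "rb")
  on.exit(close(con))
  magic_number <- readBin(con, what = integer(), n = 1, size = 4, endian = "big")
  n_images <- readBin(con, what = integer(), n = 1, size = 4, endian = "big")
  n_rows   <- readBin(con, what = integer(), n = 1, size = 4, endian = "big")
  n_cols   <- readBin(con, what = integer(), n = 1, size = 4, endian = "big")
  n_images <- min(n_images, max_images)
  pixels <- readBin(con, what = integer(), n = n_images * n_rows * n_cols,
                    size = 1, signed = FALSE)
  # One image per row, with pixel intensities in the range [0, 255]
  matrix(pixels, nrow = n_images, ncol = n_rows * n_cols, byrow = TRUE)
}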

Summary


In this chapter, we saw that neural networks are a nonlinear method capable of solving both regression and classification problems. Motivated by the biological analogy with human neurons, we first introduced the simplest neural network, the perceptron. The perceptron is able to solve binary classification problems only when the two classes are linearly separable, a condition we can rarely rely upon in practice.

By changing the function that transforms the linear weighted combination of inputs, namely the activation function, we discovered how to create different types of individual neurons. A linear activation function creates a neuron that performs linear regression, whereas the logistic activation function creates a neuron that performs logistic regression. By organizing and connecting neurons into layers, we can create multilayer neural networks that are powerful models for solving nonlinear problems.

The idea behind having hidden layers of neurons is that each hidden layer learns a new set of...

