Chapter 8. Bayesian Neural Networks

As the name suggests, artificial neural networks are statistical models built by taking inspiration from the architecture and cognitive capabilities of biological brains. Neural network models typically have a layered architecture, with a large number of neurons in each layer and connections between neurons in different layers. The first layer is called the input layer, the last layer is called the output layer, and the layers in between are called hidden layers. Each neuron has a state that is determined by a nonlinear function of the states of all the neurons connected to it. Each connection has a weight that is determined from the training data, which contains a set of input-output pairs. This kind of layered architecture of neurons and their connections is present in the neocortex region of the human brain and is considered to be responsible for higher functions such as sensory perception and language understanding.

The first computational model...

Two-layer neural networks


Let us look at the formal definition of a two-layer neural network. We follow the notation and description used by David MacKay (references 1, 2, and 3 in the References section of this chapter). The input to the NN is given by $x = (x_1, \ldots, x_L)$. The input values are first multiplied by a set of weights to produce a weighted linear combination, which is then transformed using a nonlinear function $f^{(1)}$ to produce the values of the states of the neurons in the hidden layer:

$$h_j = f^{(1)}\left(\sum_l w_{jl}^{(1)} x_l + \theta_j^{(1)}\right)$$

A similar operation is done at the second layer to produce the final output values $y_i$:

$$y_i = f^{(2)}\left(\sum_j w_{ij}^{(2)} h_j + \theta_i^{(2)}\right)$$

The function $f$ is usually taken as either the sigmoid function $1/(1 + e^{-a})$ or $\tanh(a)$. Another common function, used for multiclass classification, is the softmax, defined as follows:

$$f(a_i) = \frac{e^{a_i}}{\sum_k e^{a_k}}$$

This is a normalized exponential function.

All of these are highly nonlinear functions, exhibiting the property that the output changes sharply over a small range of input values. This nonlinear property gives neural networks more computational flexibility than standard linear or generalized linear models...
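
To make the two-layer computation concrete, here is a minimal sketch in R of the forward pass through such a network, assuming a sigmoid activation in the hidden layer and a softmax output. The names (forward, W1, b1, W2, b2) are illustrative and not part of any package:

sigmoid <- function(a) 1 / (1 + exp(-a))
softmax <- function(a) exp(a) / sum(exp(a))

# Forward pass: input x -> hidden states h -> output y
forward <- function(x, W1, b1, W2, b2) {
  h <- sigmoid(W1 %*% x + b1)   # first-layer weighted combination + nonlinearity
  y <- softmax(W2 %*% h + b2)   # second-layer combination + softmax output
  as.vector(y)
}

# Toy example: 3 inputs, 4 hidden neurons, 2 output classes
set.seed(1)
W1 <- matrix(rnorm(12), 4, 3); b1 <- rnorm(4)
W2 <- matrix(rnorm(8), 2, 4);  b2 <- rnorm(2)
forward(c(0.5, -1.0, 2.0), W1, b1, W2, b2)   # returns 2 class probabilities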

Bayesian treatment of neural networks


To set neural network learning in a Bayesian context, consider the error function for the regression case, $E_D(w) = \frac{1}{2}\sum_n \left(y_n - f(x_n; w)\right)^2$. It can be treated as arising from a Gaussian noise term for observing the given dataset conditioned on the weights $w$. This is precisely the likelihood function, which can be written as follows:

$$P(D \mid w, \beta) = \frac{1}{Z_D(\beta)} \exp\left(-\beta E_D(w)\right)$$

Here, $\sigma^2$ is the variance of the noise term, given by $\sigma^2 = 1/\beta$, and $P(D \mid w, \beta)$ represents a probabilistic model of the observed data. The regularization term $E_W(w) = \frac{1}{2}\sum_i w_i^2$ can be considered as the (negative) log of the prior probability distribution over the parameters:

$$P(w \mid \alpha) = \frac{1}{Z_W(\alpha)} \exp\left(-\alpha E_W(w)\right)$$

Here, $\sigma_w^2 = 1/\alpha$ is the variance of the prior distribution of weights. It can be easily shown using Bayes' theorem that the objective function $M(w) = \beta E_D(w) + \alpha E_W(w)$ then corresponds to the posterior distribution of the parameters $w$:

$$P(w \mid D, \alpha, \beta) = \frac{1}{Z_M(\alpha, \beta)} \exp\left(-M(w)\right)$$
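
Explicitly, multiplying the likelihood by the prior and absorbing all normalization constants into $Z_M(\alpha, \beta)$:

$$P(w \mid D, \alpha, \beta) \propto P(D \mid w, \beta)\, P(w \mid \alpha) \propto \exp\left(-\beta E_D(w) - \alpha E_W(w)\right) = \exp\left(-M(w)\right)$$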

In the neural network case, we are interested in the local maxima of $P(w \mid D, \alpha, \beta)$, or equivalently the local minima of $M(w)$. The posterior is then approximated as a Gaussian around each such maximum $w_{MP}$, as follows:

$$P(w \mid D, \alpha, \beta) \simeq \frac{1}{Z'_M} \exp\left(-M(w_{MP}) - \frac{1}{2}\,(w - w_{MP})^{T} A\, (w - w_{MP})\right)$$

Here, $A = \nabla \nabla M(w)$ is the matrix of second derivatives (the Hessian) of $M(w)$ with respect to $w$, evaluated at $w_{MP}$, and it represents the inverse of the covariance matrix...
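
As an illustration of this Gaussian (Laplace) approximation, the following R sketch fits a toy one-hidden-neuron network by minimizing M(w) numerically and inverts the Hessian at the mode to obtain the posterior covariance. The objective M, the precisions alpha and beta, and the tiny network are illustrative assumptions, not the brnn implementation:

# Toy data from a smooth nonlinear function plus Gaussian noise
set.seed(2)
x <- runif(50, -2, 2)
y <- tanh(1.5 * x) + rnorm(50, sd = 0.1)

alpha <- 0.01   # prior precision (1 / prior variance of weights)
beta  <- 100    # noise precision (1 / noise variance)

# M(w) = beta * E_D(w) + alpha * E_W(w), for f(x; w) = w3 * tanh(w1 * x + w2)
M <- function(w) {
  pred <- w[3] * tanh(w[1] * x + w[2])
  beta * 0.5 * sum((y - pred)^2) + alpha * 0.5 * sum(w^2)
}

# Find a mode w_MP and the Hessian A at that mode
fit   <- optim(c(1, 0, 1), M, method = "BFGS", hessian = TRUE)
w_mp  <- fit$par              # posterior mode w_MP
Sigma <- solve(fit$hessian)   # posterior covariance, approximately A^{-1}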

The brnn R package


The brnn package was developed by Paulino Perez Rodriguez and Daniel Gianola, and it implements the two-layer Bayesian regularized neural network described in the previous section. The main function in the package is brnn(), which can be called using the following command:

> brnn(x, y, neurons, normalize, epochs, …, Monte_Carlo, …)

Here, x is an n x p matrix, where n is the number of data points and p is the number of variables, and y is an n-dimensional vector containing the target values. The number of neurons in the hidden layer of the network is specified by the argument neurons. If the logical flag normalize is TRUE (the default), the inputs and outputs are normalized. The maximum number of iterations during model training is specified using epochs. If the logical flag Monte_Carlo is TRUE, then an MCMC method is used to estimate the trace of the inverse of the Hessian matrix A.
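
As a minimal sketch of how these arguments fit together (on simulated data rather than the book's dataset; the exact defaults and the predict() call should be checked against the package documentation):

library(brnn)

# Simulated nonlinear regression problem: n = 100 points, p = 2 variables
set.seed(3)
x <- matrix(runif(200, -3, 3), ncol = 2)
y <- sin(x[, 1]) + 0.5 * x[, 2]^2 + rnorm(100, sd = 0.1)

model <- brnn(x, y, neurons = 3, normalize = TRUE, epochs = 1000)
y_hat <- predict(model)          # fitted values on the training data
sqrt(mean((y - y_hat)^2))        # training RMSE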

Let us try an example with the Auto MPG dataset that we used in Chapter...

Deep belief networks and deep learning


Some of the pioneering advances in neural networks research in the last decade have opened up a new frontier in machine learning generally known as deep learning (references 6 and 7 in the References section of this chapter). Deep learning can be broadly defined as a class of machine learning techniques in which many layers of information processing stages, arranged in hierarchical architectures, are exploited for unsupervised feature learning and for pattern analysis/classification. The essence of deep learning is to compute hierarchical features or representations of the observational data, where the higher-level features or factors are defined from lower-level ones (reference 8 in the References section of this chapter). Although there are many similar definitions and architectures for deep learning, two elements common to all of them are multiple layers of nonlinear information processing and supervised or unsupervised learning...

Exercises


  1. For the Auto MPG dataset, compare the performance of predictive models using ordinary regression, Bayesian GLM, and Bayesian neural networks; a starting sketch is given below.
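
A possible starting point for this exercise, sketched here with R's built-in mtcars data standing in for the Auto MPG dataset (lm for ordinary regression, bayesglm from the arm package for the Bayesian GLM, and brnn for the Bayesian neural network; the chosen predictors are illustrative):

library(arm)    # provides bayesglm()
library(brnn)

data(mtcars)    # stand-in for the Auto MPG data
x <- as.matrix(mtcars[, c("hp", "wt", "disp")])
y <- mtcars$mpg

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

fit_lm   <- lm(mpg ~ hp + wt + disp, data = mtcars)        # ordinary regression
fit_bglm <- bayesglm(mpg ~ hp + wt + disp, data = mtcars)  # Bayesian GLM
fit_brnn <- brnn(x, y, neurons = 2)                        # Bayesian neural network

c(lm   = rmse(y, predict(fit_lm)),
  bglm = rmse(y, predict(fit_bglm)),
  brnn = rmse(y, predict(fit_brnn)))

For a fair comparison, the errors should of course be computed on held-out data (for example, via cross-validation) rather than on the training set.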

References


  1. MacKay D. J. C. Information Theory, Inference and Learning Algorithms. Cambridge University Press. 2003. ISBN-10: 0521642981

  2. MacKay D. J. C. "The Evidence Framework Applied to Classification Networks". Neural Computation. Volume 4(3), 698-714. 1992

  3. MacKay D. J. C. "Probable Networks and Plausible Predictions – a Review of Practical Bayesian Methods for Supervised Neural Networks". Network: Computation in Neural Systems. Volume 6(3), 469-505. 1995

  4. Rumelhart D. E., Hinton G. E., and Williams R. J. "Learning Representations by Back-propagating Errors". Nature. Volume 323, 533-536. 1986

  5. MacKay D. J. C. "Bayesian Interpolation". Neural Computation. Volume 4(3), 415-447. 1992

  6. Krizhevsky A., Sutskever I., and Hinton G. E. "ImageNet Classification with Deep Convolutional Neural Networks". Advances in Neural Information Processing Systems (NIPS). 2012

  7. Hinton G., Osindero S., and Teh Y. "A Fast Learning Algorithm for Deep Belief Nets". Neural Computation. Volume 18, 1527-1554. 2006

  8. Hinton G. and Salakhutdinov R. "Reducing the Dimensionality of Data with Neural Networks". Science. Volume 313, 504-507. 2006

Summary


In this chapter, we learned about an important class of machine learning models, namely neural networks, and their Bayesian implementation. These models are inspired by the architecture of the human brain and continue to be an area of active research and development. We also learned about deep learning, one of the latest advances in neural networks. It can be used to solve many problems, such as those in computer vision and natural language processing, that involve highly cognitive elements. Artificial intelligence systems using deep learning have achieved accuracies comparable to human performance in tasks such as speech recognition and image classification. With this chapter, we have covered the important classes of Bayesian machine learning models. In the next chapter, we will look at a different aspect: large-scale machine learning and some of its applications in Bayesian models.
