You're reading from Learning Bayesian Models with R (1st Edition), a beginner-level book published by Packt in Oct 2015. ISBN-13: 9781783987603.
Author: Hari Manassery Koduvely

Dr. Hari M. Koduvely is an experienced data scientist working at the Samsung R&D Institute in Bangalore, India. He has a PhD in statistical physics from the Tata Institute of Fundamental Research, Mumbai, India, and post-doctoral experience from the Weizmann Institute, Israel, and Georgia Tech, USA. Prior to joining Samsung, he worked for Amazon and Infosys Technologies, developing machine learning-based applications for their products and platforms. He also has several publications on Bayesian inference and its applications in areas such as recommendation systems and predictive health monitoring. His current interest is in developing large-scale machine learning methods, particularly for natural language understanding.

Chapter 4. Machine Learning Using Bayesian Inference

Now that we have learned about Bayesian inference and R, it is time to use both for machine learning. In this chapter, we will give an overview of different machine learning techniques and discuss each of them in detail in subsequent chapters. Machine learning is a field at the intersection of computer science and statistics, and a sub-branch of artificial intelligence (AI). The name essentially comes from early work in AI, where researchers were trying to develop machines that automatically learned the relationship between input and output variables from data alone. Once a machine is trained on a dataset for a given problem, it can be used as a black box to predict values of the output variables for new values of the input variables.

It is useful to set this learning process of a machine in a mathematical framework. Let X and Y be two random variables; we seek a learning machine that learns the relationship between these...

Why Bayesian inference for machine learning?


We have already discussed the advantages of Bayesian statistics over classical statistics in the last chapter. In this chapter, we will see in more detail how some of the concepts of Bayesian inference that we learned in the last chapter are useful in the context of machine learning. For this purpose, we take one simple machine learning task, namely linear regression. Let us consider a learning task where we have a dataset D containing N pairs of points (X, Y), and the goal is to build a machine learning model using linear regression so that it can be used to predict values of Y, given new values of X.

In linear regression, we first assume that Y is of the following form:

Y = F(X) + ε

Here, F(X) is a function that captures the true relationship between X and Y, and ε is an error term that captures the inherent noise in the data. It is assumed that this noise is characterized by a normal distribution with mean 0 and variance σ². What this implies is that if we have an infinite...
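As a quick illustration (the true function F(X) = 2 + 3X and the noise level here are made-up values for illustration, not from the book), we can simulate data from this model in R and recover the relationship with lm():

```r
# Simulate data from Y = F(X) + epsilon, with an assumed true
# relationship F(X) = 2 + 3X and Gaussian noise of mean 0, sd 1.
set.seed(42)
N   <- 100
x   <- runif(N, 0, 10)
eps <- rnorm(N, mean = 0, sd = 1)
y   <- 2 + 3 * x + eps

# Fit a linear model; the estimated coefficients should be close
# to the true values 2 (intercept) and 3 (slope).
fit <- lm(y ~ x)
coef(fit)
```

With enough data and well-behaved noise, the least-squares estimates concentrate around the true coefficients, which is the sense in which the machine "learns" F(X).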

Model overfitting and bias-variance tradeoff


The expected loss mentioned in the previous section can be written as a sum of three terms in the case of linear regression using the squared loss function, as follows:

E[(Y − f(X; D))²] = (F(X) − E_D[f(X; D)])² + E_D[(f(X; D) − E_D[f(X; D)])²] + E[(Y − F(X))²]

Here, f(X; D) denotes the function learned from a particular dataset D. Bias is the difference between the true model F(X) and the average value of f(X; D), E_D[f(X; D)], taken over an ensemble of datasets. Bias is a measure of how much the average prediction over all datasets in the ensemble differs from the true regression function F(X). Variance is given by E_D[(f(X; D) − E_D[f(X; D)])²]. It is a measure of the extent to which the solution for a given dataset varies around the mean over all datasets. Hence, Variance is a measure of how sensitive the function f(X; D) is to the particular choice of dataset D. The third term, Noise, is, as mentioned earlier, the expectation of the squared difference between the observation Y and the true regression function F(X), over all the values of X and Y. Putting all these together, we can write the following:

Expected loss = (Bias)² + Variance + Noise
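This decomposition can be checked numerically. In the R sketch below (the true function, noise level, and ensemble size are illustrative assumptions, not values from the book), a deliberately simple straight-line model is fitted to an ensemble of simulated datasets, and the (Bias)² and Variance terms are estimated by averaging over that ensemble:

```r
# Estimate bias and variance of a straight-line fit to a nonlinear
# truth, by averaging over an ensemble of simulated datasets.
set.seed(7)
f_true     <- function(x) sin(2 * pi * x)  # assumed true regression function
x_grid     <- seq(0, 1, length.out = 50)
n_datasets <- 200

# Each column holds the predictions of the model fitted to one dataset
preds <- replicate(n_datasets, {
  x <- runif(30)
  y <- f_true(x) + rnorm(30, sd = 0.3)
  predict(lm(y ~ x), newdata = data.frame(x = x_grid))
})

avg_pred <- rowMeans(preds)                       # E_D[f(X; D)]
bias_sq  <- mean((f_true(x_grid) - avg_pred)^2)   # (Bias)^2 term
variance <- mean(apply(preds, 1, var))            # Variance term
c(bias_sq = bias_sq, variance = variance)
```

For this underfitting straight-line model, (Bias)² dominates Variance; a very flexible model (say, a high-degree polynomial) would show the opposite pattern, which is the tradeoff this section describes.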

The objective of machine learning is to learn, from the data, the function that minimizes...

Selecting models of optimum complexity


There are different ways of selecting models with the right complexity so that the prediction error on unseen data is low. Let's discuss each of these approaches in the context of the linear regression model.

Subset selection

In the subset selection approach, one selects for the model only a subset of the whole set of variables, namely those that are significant. This not only increases the prediction accuracy of the model by decreasing the model variance, but is also useful from the interpretability point of view. There are different ways of doing subset selection, but the following two are the most commonly used approaches:

  • Forward selection: In forward selection, one starts with no variables (intercept alone) and, using a greedy algorithm, adds other variables one by one. At each step, the variable that most improves the fit is added to the model.

  • Backward selection: In backward selection, one starts with the full model and sequentially deletes the...
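Both strategies are available in base R through the step() function (the choice of the built-in mtcars data and of the candidate predictors below is purely illustrative):

```r
# Subset selection on the built-in mtcars data.
data(mtcars)
full <- lm(mpg ~ wt + hp + cyl + disp + drat + qsec, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)   # intercept-only model

# Forward selection: start from the intercept alone and greedily add
# the variable that most improves the AIC at each step.
fwd <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Backward selection: start from the full model and sequentially
# delete the variable whose removal most improves the AIC.
bwd <- step(full, direction = "backward", trace = 0)

formula(fwd)
formula(bwd)
```

Note that step() scores candidate models by AIC rather than raw fit, so it already penalizes model complexity while searching.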

Bayesian averaging


So far, we have learned that simply minimizing the loss function (or, equivalently, maximizing the log-likelihood function in the case of a normal distribution) is not enough to develop a machine learning model for a given problem. One has to worry about models overfitting the training data, which will result in larger prediction errors on new datasets. The main advantage of Bayesian methods is that, in principle, one can avoid this problem without using explicit regularization or separate datasets for training and validation. This approach is called Bayesian model averaging and will be discussed here. It is one of the answers to the main question of this chapter: why Bayesian inference for machine learning?

For this, let's do a full Bayesian treatment of the linear regression problem. Since we only want to explain how Bayesian inference avoids the overfitting problem, we will skip all the mathematical derivations and state only the important results here. For more details...
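As a minimal sketch of what such a treatment looks like (this is the standard conjugate-prior calculation, not the book's own derivation; the prior precision alpha and noise precision beta below are assumed values), the posterior over the weights of a linear model can be computed in closed form in R:

```r
# Bayesian linear regression with a zero-mean Gaussian prior on the
# weights (precision alpha) and Gaussian noise (precision beta).
set.seed(1)
N <- 30
x <- runif(N, -1, 1)
y <- 0.5 + 2 * x + rnorm(N, sd = 0.4)

Phi   <- cbind(1, x)   # design matrix: intercept column plus x
alpha <- 2.0           # assumed prior precision on the weights
beta  <- 1 / 0.4^2     # assumed (known) noise precision

# Posterior over weights is Gaussian N(m_N, S_N), with
#   S_N^{-1} = alpha * I + beta * t(Phi) %*% Phi
#   m_N      = beta * S_N %*% t(Phi) %*% y
S_N_inv <- alpha * diag(2) + beta * crossprod(Phi)
m_N     <- beta * solve(S_N_inv, crossprod(Phi, y))
m_N   # posterior mean of (intercept, slope), shrunk towards zero
```

The Gaussian prior acts like a ridge penalty on the weights, which is why the Bayesian treatment controls overfitting without an explicit regularization term or a separate validation set.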

An overview of common machine learning tasks


This section is a preview of the following chapters, where we will discuss different machine learning techniques in detail. At a high level, there are only a handful of tasks that machine learning tries to address. However, for each of these tasks, there are several approaches and algorithms in place.

The typical tasks in machine learning are the following:

  • Classification

  • Regression

  • Clustering

  • Association rules

  • Forecasting

  • Dimensionality reduction

  • Density estimation

In classification, the objective is to assign a new data point to one of a set of predetermined classes. Typically, this is either a supervised or semi-supervised learning problem. Well-known machine learning algorithms used for classification include logistic regression, support vector machines (SVM), decision trees, Naïve Bayes, neural networks, AdaBoost, and random forests. Here, Naïve Bayes is a Bayesian inference-based method. Other algorithms, such as logistic regression and neural...
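As a small taste of the classification task, the following base R sketch fits a logistic regression classifier with glm() (the built-in iris data, reduced to two classes, and the choice of petal features are arbitrary illustrations):

```r
# Two-class classification with logistic regression in base R.
data(iris)
two_class <- droplevels(subset(iris, Species != "setosa"))
two_class$is_virginica <- as.integer(two_class$Species == "virginica")

fit <- glm(is_virginica ~ Petal.Length + Petal.Width,
           family = binomial, data = two_class)

# Classify by thresholding the predicted class probability at 0.5
pred <- as.integer(predict(fit, type = "response") > 0.5)
mean(pred == two_class$is_virginica)   # training accuracy
```

Thresholding a predicted probability at 0.5 is the simplest decision rule; as the chapter notes, the accuracy on the training data alone can be an optimistic estimate of performance on unseen data.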

References


  1. Hastie T., Tibshirani R., and Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Edition. Springer Series in Statistics. 2009

  2. Bishop C.M. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer. 2006. ISBN-10: 0387310738

Summary


In this chapter, we got an overview of what machine learning is and what some of its high-level tasks are. We also discussed the importance of Bayesian inference in machine learning, particularly how it can help to avoid important issues such as model overfitting, and how to select models of optimum complexity. In the coming chapters, we will learn some of the Bayesian machine learning methods in detail.

