You're reading from Learning Bayesian Models with R

Product type: Book

Published in: Oct 2015

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781783987603

Edition: 1st Edition

Concepts: Machine Learning

Author (1)

Hari Manassery Koduvely

Chapter 5. Bayesian Regression Models

In the previous chapter, we covered the theory of Bayesian linear regression in some detail. In this chapter, we will take a sample problem and illustrate how that theory can be applied in practice. For this purpose, we will use the generalized linear model (GLM) packages in R. First, we give a brief introduction to the concept of GLM.

Generalized linear regression

Recall that in linear regression, we assume the following functional form between the dependent variable $Y$ and independent variable $X$:

$$Y = \sum_{m=0}^{M} \beta_m \phi_m(X) + \epsilon = \beta^T \phi(X) + \epsilon$$

Here, $\phi_m(X)$ is a set of basis functions and $\beta$ is the parameter vector. Usually, it is assumed that $\phi_0(X) = 1$, so $\beta_0$ represents an intercept or a bias term. Also, it is assumed that $\epsilon$ is a noise term distributed according to the normal distribution with mean zero and variance $\sigma^2$. We also showed that this results in the following equation:

$$E[Y \mid X] = \beta^T \phi(X)$$

One can generalize the preceding equation to incorporate not only the normal distribution for noise but any distribution in the exponential family (reference 1 in the References section of this chapter). This is done by defining the following equation:

$$g(E[Y \mid X]) = \beta^T \phi(X)$$

Here, $g$ is called a link function. Well-known models such as logistic regression, log-linear models, and Poisson regression are special cases of GLM. For example, in the case of ordinary linear regression, the link function would be the identity, $g(\mu) = \mu$. For logistic regression...
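As a rough illustration of how the link function selects the model, the following sketch fits three GLMs with the glm function of base R. The data are synthetic and the variable names (x, y_gauss, y_bin, y_count) are assumptions made for this example only; they do not come from the chapter.

```r
# Synthetic data (illustrative only; not from the chapter)
set.seed(1)
n <- 200
x <- runif(n, -2, 2)

# Identity link with Gaussian noise: ordinary linear regression
y_gauss <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit_lm  <- lm(y_gauss ~ x)
fit_glm <- glm(y_gauss ~ x, family = gaussian(link = "identity"))

# Logit link with a binary response: logistic regression
y_bin <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
fit_logit <- glm(y_bin ~ x, family = binomial(link = "logit"))

# Log link with a count response: Poisson regression
y_count <- rpois(n, lambda = exp(0.3 + 0.8 * x))
fit_pois <- glm(y_count ~ x, family = poisson(link = "log"))

# The Gaussian GLM with identity link reproduces the lm coefficients
print(rbind(lm = coef(fit_lm), glm = coef(fit_glm)))
```

Only the family argument changes between the three fits; the linear predictor on the right-hand side of the formula stays the same.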

The arm package

In this chapter, for the purpose of illustrating Bayesian regression models, we will use the arm package of R. This package was developed by Andrew Gelman and co-workers, and it can be downloaded from the website at http://CRAN.R-project.org/package=arm.

The arm package provides the bayesglm function, which implements the Bayesian generalized linear model with independent normal, t, or Cauchy prior distributions for the model coefficients. We will use this function to build Bayesian regression models.
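The choice of prior family can be sketched as follows. In bayesglm, the prior.df argument sets the degrees of freedom of the t prior, so prior.df = 1 gives a Cauchy prior and prior.df = Inf gives a normal prior. The data frame d and its columns are synthetic and hypothetical, and the prior scale of 2.5 is an arbitrary choice for illustration, not a value taken from the chapter.

```r
# Synthetic data (illustrative only; not from the chapter)
set.seed(42)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 + 1.5 * d$x1 - 0.7 * d$x2 + rnorm(n)

# Run only if the arm package is installed
if (requireNamespace("arm", quietly = TRUE)) {
  # Cauchy priors on the coefficients (t distribution with 1 degree of freedom)
  fit_cauchy <- arm::bayesglm(y ~ x1 + x2, family = gaussian(link = "identity"),
                              data = d, prior.scale = 2.5, prior.df = 1)

  # Normal priors on the coefficients (t distribution with infinite degrees of freedom)
  fit_normal <- arm::bayesglm(y ~ x1 + x2, family = gaussian(link = "identity"),
                              data = d, prior.scale = 2.5, prior.df = Inf)

  # Compare the coefficient estimates under the two priors
  print(round(cbind(cauchy = coef(fit_cauchy), normal = coef(fit_normal)), 3))
}
```

With this many observations and weakly informative priors, the two sets of estimates would be expected to be close to each other; the prior matters more when data are scarce.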

The Energy efficiency dataset

We will use the Energy efficiency dataset from the UCI Machine Learning repository for the illustration of Bayesian regression (reference 2 in the References section of this chapter). The dataset can be downloaded from the website at http://archive.ics.uci.edu/ml/datasets/Energy+efficiency. The dataset contains the measurements of energy efficiency of buildings with different building parameters. There are two energy efficiency parameters measured: heating load (Y1) and cooling load (Y2).

The building parameters used are: relative compactness (X1), surface area (X2), wall area (X3), roof area (X4), overall height (X5), orientation (X6), glazing area (X7), and glazing area distribution (X8). We will try to predict heating load as a function of all the building parameters using both ordinary regression, with the glm function of base R, and Bayesian regression, with the bayesglm function of the arm package. We will show that, for the same dataset, Bayesian regression gives significantly smaller prediction...

Regression of energy efficiency with building parameters

In this section, we will do a linear regression of the building's energy efficiency measure, heating load (Y1), as a function of the building parameters. It is useful to start with a preliminary descriptive analysis to find which building variables are statistically significant. For this, we will first create bivariate plots of Y1 against each of the X variables. We will also compute the Spearman correlation between Y1 and each of the X variables. The R script for performing these tasks is as follows:

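> # Load the plotting packages
> library(ggplot2)
> library(gridExtra)

> # Read the data and keep the first nine columns (X1 to X8 and Y1)
> df <- read.csv("ENB2012_data.csv", header = TRUE)
> df <- df[, c(1:9)]
> str(df)
> # Convert columns 6 and 8 (X6 and X8) to numeric
> df[, 6] <- as.numeric(df[, 6])
> df[, 8] <- as.numeric(df[, 8])

> attach(df)
> # Bivariate scatter plots of Y1 against each building parameter
> bp1 <- ggplot(data = df, aes(x = X1, y = Y1)) + geom_point()
> bp2 <- ggplot(data = df, aes(x = X2, y = Y1)) + geom_point()
> bp3 <- ggplot(data = df, aes(x = X3, y = Y1)) + geom_point()
> bp4 <- ggplot(data = df,aes...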
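The Spearman correlation step mentioned above could be sketched as follows, using the cor and cor.test functions of base R. Because the script above is truncated, this is not the author's code; the data frame toy is synthetic, with only two hypothetical predictors named after the dataset's columns.

```r
# Synthetic stand-in for the dataset (illustrative only)
set.seed(7)
n <- 150
toy <- data.frame(X1 = runif(n, 0.6, 1.0),
                  X5 = sample(c(3.5, 7), n, replace = TRUE))
toy$Y1 <- 10 + 20 * toy$X1^2 + 3 * toy$X5 + rnorm(n, sd = 2)

# Spearman rank correlation between Y1 and each predictor
predictors <- setdiff(names(toy), "Y1")
rho <- sapply(predictors,
              function(v) cor(toy[[v]], toy$Y1, method = "spearman"))
print(round(rho, 3))

# A significance test for one predictor; exact = FALSE avoids the
# warning about exact p-values in the presence of ties
ct <- cor.test(toy$X1, toy$Y1, method = "spearman", exact = FALSE)
print(ct$p.value)
```

Spearman correlation is rank-based, so it picks up monotonic but nonlinear relationships, such as the quadratic dependence on X1 in this toy example.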

Simulation of the posterior distribution

To examine the posterior distribution of the model parameters, the sim() function of the arm package comes in handy. The following R script simulates the posterior distribution of the parameters and produces a set of histograms:

> # Simulate draws of the coefficients from the posterior of the fitted model
> posterior.bayes <- as.data.frame(coef(sim(fit.bayes)))
> attach(posterior.bayes)

> # Histogram of the simulated values for each coefficient
> h1 <- ggplot(data = posterior.bayes, aes(x = X1)) + geom_histogram() + ggtitle("Histogram X1")
> h2 <- ggplot(data = posterior.bayes, aes(x = X2)) + geom_histogram() + ggtitle("Histogram X2")
> h3 <- ggplot(data = posterior.bayes, aes(x = X3)) + geom_histogram() + ggtitle("Histogram X3")
> h4 <- ggplot(data = posterior.bayes, aes(x = X4)) + geom_histogram() + ggtitle("Histogram X4")
> h5 <- ggplot(data = posterior.bayes, aes(x = X5)) + geom_histogram() + ggtitle("Histogram X5")
> h7 <- ggplot(data = posterior.bayes, aes(x = X7)) + geom_histogram() + ggtitle("Histogram X7")
> grid.arrange(h1,h2,h3,h4,h5,h7,nrow...
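Besides histograms, the simulated draws can be summarized numerically. The following is a minimal sketch, assuming a small synthetic dataset and a hypothetical model fit named fit; it computes the posterior mean and a 95% interval for each coefficient from the draws returned by sim. The number of simulations (n.sims = 1000) is an arbitrary choice for this example.

```r
# Synthetic data (illustrative only; not the Energy efficiency dataset)
set.seed(123)
n <- 120
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
d$Y1 <- 1 + 2 * d$X1 - 1 * d$X2 + rnorm(n)

# Run only if the arm package is installed
if (requireNamespace("arm", quietly = TRUE)) {
  library(arm)

  fit <- bayesglm(Y1 ~ X1 + X2, family = gaussian(link = "identity"), data = d)

  # One row per simulation, one column per coefficient
  draws <- as.data.frame(coef(sim(fit, n.sims = 1000)))

  # Posterior mean and 2.5% / 97.5% quantiles for each coefficient
  summary_tab <- t(sapply(draws, function(v) {
    c(mean = mean(v), quantile(v, c(0.025, 0.975)))
  }))
  print(round(summary_tab, 3))
}
```

Each row of the printed table corresponds to one coefficient, which gives a compact alternative to reading the interval off a histogram.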

Exercises

Use the multivariate dataset named Auto MPG from the UCI Machine Learning repository (reference 3 in the References section of this chapter). The dataset can be downloaded from the website at https://archive.ics.uci.edu/ml/datasets/Auto+MPG. The dataset describes automobile fuel consumption in miles per gallon (mpg) for cars running in American cities. From the folder containing the datasets, download two files: auto-mpg.data and auto-mpg.names. The auto-mpg.data file contains the data in space-separated format. The auto-mpg.names file has several details about the dataset, including the variable name for each column. Build a regression model for the fuel efficiency as a function of displacement (disp), horsepower (hp), weight (wt), and acceleration (accel), using both OLS and Bayesian GLM. Predict the values for mpg in the test dataset using both the OLS model and the Bayesian GLM model (using the bayesglm function). Find the Root Mean Square Error (RMSE) values for OLS and...
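As a starting point for the exercise, the train/test split and the RMSE computation might look like the following sketch. It uses synthetic data in place of the Auto MPG file, only two of the predictors, and an 80/20 split; the helper name rmse and the data frame autos are hypothetical. The same pattern should apply to a model fitted with bayesglm.

```r
# Synthetic stand-in for the Auto MPG data (illustrative only)
set.seed(2015)
n <- 300
autos <- data.frame(disp = runif(n, 70, 450), wt = runif(n, 1600, 5000))
autos$mpg <- 45 - 0.02 * autos$disp - 0.004 * autos$wt + rnorm(n, sd = 2)

# Hold out 20% of the rows as a test set
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- autos[train_idx, ]
test  <- autos[-train_idx, ]

# Root Mean Square Error between observed and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# OLS fit via a Gaussian GLM, evaluated on the held-out rows
fit_ols  <- glm(mpg ~ disp + wt, family = gaussian(link = "identity"), data = train)
rmse_ols <- rmse(test$mpg, predict(fit_ols, newdata = test))
print(rmse_ols)
```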

References

1. Friedman J., Hastie T., and Tibshirani R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. 2009.
2. Tsanas A. and Xifara A. "Accurate Quantitative Estimation of Energy Performance of Residential Buildings Using Statistical Machine Learning Tools". Energy and Buildings. Vol. 49, pp. 560-567. 2012.
3. Quinlan R. "Combining Instance-based and Model-based Learning". In: Tenth International Conference of Machine Learning. pp. 236-243. University of Massachusetts, Amherst. Morgan Kaufmann. 1993. The original dataset is from the StatLib library maintained by Carnegie Mellon University.

Summary

In this chapter, we used the Energy efficiency dataset and the bayesglm function of the arm package to illustrate how Bayesian regression can give predictions with tighter confidence intervals than ordinary regression. We also learned how to simulate the posterior distribution using the sim function in the same R package. In the next chapter, we will learn about Bayesian classification.


Author (1)

Hari Manassery Koduvely

Dr. Hari M. Koduvely is an experienced data scientist working at the Samsung R&D Institute in Bangalore, India. He has a PhD in statistical physics from the Tata Institute of Fundamental Research, Mumbai, India, and post-doctoral experience from the Weizmann Institute, Israel, and Georgia Tech, USA. Prior to joining Samsung, the author has worked for Amazon and Infosys Technologies, developing machine learning-based applications for their products and platforms. He also has several publications on Bayesian inference and its applications in areas such as recommendation systems and predictive health monitoring. His current interest is in developing large-scale machine learning methods, particularly for natural language understanding.