You're reading from Learning Bayesian Models with R

Product type: Book
Published in: Oct 2015
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781783987603
Edition: 1st Edition
Concepts: Machine Learning
Author: Hari Manassery Koduvely

Chapter 2. The R Environment

R is currently one of the most popular programming environments for statistical computing. It evolved as an open source language from the S programming language developed at Bell Labs. The main creators of R are two academics, Robert Gentleman and Ross Ihaka, from the University of Auckland in New Zealand.

The main reasons for the popularity of R, apart from it being free software under the GNU General Public License, are the following:

R is very easy to use. It is an interpreted language and at the same time can be used for procedural programming.
R supports both functional and object-oriented paradigms. It has very strong graphical and data visualization capabilities.
Through its LaTeX-like documentation support, R can be used for making high-quality documentation.
Being open source software, R has a large number of contributed packages that make almost all statistical modeling possible in this environment.

This chapter is intended to give a basic introduction to R...

Setting up the R environment and packages

R is free software released under the GNU General Public License. R comes with a base package and also has a large number of user-contributed packages for advanced analysis and modeling. A graphical user interface-based editor called RStudio is also available for it. In this section, we will learn how to download R, set up the R environment on your computer, and write a simple R program.

Installing R and RStudio

The Comprehensive R Archive Network (CRAN) hosts all releases of R and the contributed packages. R for Windows can be installed by downloading the binary of the base package from http://cran.r-project.org; a standard installation should be sufficient. For Linux and Mac OS X, the webpage gives instructions on how to download and install the software. At the time of writing this book, the latest release was version 3.1.2. Various packages need to be installed separately from the package page. One can install any package from the R command prompt using the following...

Managing data in R

Before we start any serious programming in R, we need to learn how to import data into an R environment and which data types R supports. Often, for a particular analysis, we will not use the entire dataset. Therefore, we need to also learn how to select a subset of the data for any analysis. This section will cover these aspects.
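As a rough illustration of importing and subsetting, the following sketch writes a small, made-up table to a temporary CSV file, reads it back with read.csv(), and selects rows and columns in two equivalent ways. The data frame, its column names, and the threshold of 20 are hypothetical and chosen only for this example.

```r
# Hypothetical toy data, written to a temporary CSV file so that
# the example is self-contained
cars <- data.frame(
  name = c("car_a", "car_b", "car_c", "car_d"),
  mpg = c(18.0, 31.5, 24.0, 15.5),
  cylinders = c(8L, 4L, 4L, 8L)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(cars, csv_path, row.names = FALSE)

# Import the file into R as a data frame
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Select a subset: rows with mpg above 20, keeping only two columns
efficient <- imported[imported$mpg > 20, c("name", "mpg")]

# The same selection written with subset()
efficient2 <- subset(imported, mpg > 20, select = c(name, mpg))

print(efficient)
```

Bracket indexing and subset() give the same rows here. Bracket indexing is usually the safer choice inside functions, while subset() tends to read more naturally at the command prompt.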

Data Types in R

R has five basic data types as follows:

Integer
Numeric (real)
Complex
Character
Logical (TRUE/FALSE)

The default representation of numbers in R is a double precision real number (numeric). If you want an integer representation explicitly, you need to add the suffix L. For example, simply entering 1 at the command prompt will store 1 as a numeric object. To store 1 as an integer, you need to enter 1L. The command class(x) gives the class (type) of the object x. Therefore, entering class(1) at the command prompt gives the answer numeric, whereas entering class(1L) gives the answer integer.
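The behavior of class() described above can be checked with a few lines. This is only a small sketch; the variable names are arbitrary.

```r
x <- 1        # numeric (double precision) by default
y <- 1L       # the L suffix requests an integer
z <- 1 + 2i   # complex
s <- "one"    # character
b <- TRUE     # logical

# Collect the class of each object into one character vector
types <- c(class(x), class(y), class(z), class(s), class(b))
print(types)
```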

R also has a special number Inf that represents...

Writing R programs

Although much data analysis in R can be carried out interactively at the command prompt, more complex tasks often require writing R scripts. As mentioned in the introduction, R supports both the functional and the object-oriented programming paradigms. In this section, some of the standard syntax of programming in R is described.
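To make the two paradigms a little more concrete, here is one possible illustration. The functional part passes an anonymous function to sapply(). The object-oriented part uses S3, which is only one of R's object systems and is picked here as an assumption for brevity. The generic describe() and the class name coin are hypothetical.

```r
# Functional style: a function is a value that can be passed to another function
squares <- sapply(1:5, function(v) v^2)

# Object-oriented style (S3): a generic dispatches on the class of its argument
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "a generic object"
describe.coin <- function(obj, ...) paste("a coin with bias", obj$bias)

# An object is a list tagged with a class attribute
fair_coin <- structure(list(bias = 0.5), class = "coin")

print(squares)
print(describe(fair_coin))  # uses describe.coin
print(describe(42))         # falls back to describe.default
```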

Control structures

Control structures are meant for controlling the flow of execution of a program. The standard control structures are as follows:

if and else: To test a condition
for: To loop over a set of statements for a fixed number of times
while: To loop over a set of statements while a condition is true
repeat: To execute an infinite loop
break: To break the execution of a loop
next: To skip an iteration of a loop
return: To exit a function
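The control structures listed above can be sketched together as follows. The helper first_evens() and the numbers used are hypothetical and serve only to exercise each keyword once.

```r
# Hypothetical helper: collect the first n even numbers from 1, 2, 3, ...
first_evens <- function(n) {
  if (n <= 0) {
    return(integer(0))             # return: exit the function early
  } else {
    evens <- integer(0)
  }
  i <- 0L
  repeat {                         # repeat: loop until break is reached
    i <- i + 1L
    if (i %% 2L != 0L) next        # next: skip odd values
    evens <- c(evens, i)
    if (length(evens) == n) break  # break: leave the loop
  }
  evens
}

# for: a fixed number of iterations
total <- 0
for (k in 1:4) {
  total <- total + k
}

# while: iterate as long as the condition holds
halvings <- 0
value <- 64
while (value > 1) {
  value <- value / 2
  halvings <- halvings + 1
}

print(first_evens(3))  # 2 4 6
print(total)           # 10
print(halvings)        # 6
```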

Functions

If one wants to use R for more serious programming, it is essential to know how to write functions. They make the language more powerful and elegant. R has many...

Data visualization

One of the powerful features of R is its set of functions for generating high-quality plots and visualizing data. The graphics functions in R can be divided into three groups:

High-level plotting functions to create new plots, add axes, labels, and titles.
Low-level plotting functions to add more information to an existing plot. This includes adding extra points, lines, and labels.
Interactive graphics functions to interactively add information to, or extract information from, an existing plot.

The R base package itself contains several graphics functions. For more advanced graph applications, one can use packages such as ggplot2, grid, or lattice. In particular, ggplot2 is very useful for generating visually appealing, multilayered graphs. It is based on the concept of grammar of graphics. Due to lack of space, we are not covering these packages in this book. Interested readers should consult the book by Hadley Wickham (reference 4 in the References section of this chapter).
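The difference between high-level and low-level plotting functions can be seen in a short base-graphics sketch. The data are made up, and writing to a temporary PDF file is an assumption made so that the example also runs in a non-interactive session.

```r
# Made-up data for illustration
x <- seq(0, 10, by = 0.5)
y <- x^2

# Open a PDF device on a temporary file (an assumption for non-interactive use)
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# High-level function: creates a new plot with axes, labels, and a title
plot(x, y, xlab = "x", ylab = "y", main = "High- and low-level plotting")

# Low-level functions: add information to the existing plot
lines(x, y, col = "blue")               # connect the points
points(5, 25, pch = 19, col = "red")    # highlight one point
text(5, 25, labels = "x = 5", pos = 4)  # label it
legend("topleft", legend = c("y = x^2", "x = 5"),
       col = c("blue", "red"), lty = c(1, NA), pch = c(NA, 19))

# Close the device so that the file is written
invisible(dev.off())
```

In an interactive session, the calls to pdf() and dev.off() can be dropped so that the plot appears in the graphics window instead.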

High...

Sampling

Often, we are interested in creating a representative dataset, for some analysis or design of experiments, by sampling from a population. This is particularly the case for Bayesian inference, as we will see in the later chapters, where samples are drawn from the posterior distribution for inference. Therefore, it is useful to learn in this chapter how to sample N points from some well-known distributions.

Before we use any particular sampling methods, readers should note that R, like any other computer program, uses pseudo random number generators for sampling. It is useful to supply a starting seed number to get reproducible results. This can be done using the set.seed(n) command with an integer n as the seed.
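The effect of the seed can be checked with a minimal sketch. The seed value 123 is arbitrary, and runif(), covered in the next subsection, is used only as a convenient source of random numbers.

```r
set.seed(123)              # 123 is an arbitrary seed
first_draw <- runif(3)

set.seed(123)              # resetting the same seed restarts the generator
second_draw <- runif(3)

unseeded_draw <- runif(3)  # continues the stream, so the values differ

print(identical(first_draw, second_draw))    # TRUE
print(identical(first_draw, unseeded_draw))  # FALSE
```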

Random uniform sampling from an interval

To generate n random numbers (numeric) that are uniformly distributed in the interval [a, b], one can use the runif() function:

> runif(5, 1, 10)  # generates 5 random numbers between 1 and 10
[1]  7.416    9.846    3.093   2.656...
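As a quick sanity check on runif(), one can draw a larger sample and confirm that every value falls inside the interval and that the sample mean is near the midpoint. The seed, the sample size of 1000, and the variable names below are arbitrary choices for this sketch.

```r
set.seed(1)  # arbitrary seed, for reproducibility
a <- 1
b <- 10

# Named arguments: n draws from the uniform distribution on [a, b]
draws <- runif(n = 1000, min = a, max = b)

print(range(draws))  # both values lie inside [a, b]
print(mean(draws))   # close to (a + b) / 2 = 5.5 for a large sample
```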

Exercises

For the following exercises in this chapter, we use the Auto MPG dataset from the UCI Machine Learning repository (references 5 and 6 in the References section of this chapter). The dataset can be downloaded from https://archive.ics.uci.edu/ml/datasets.html. The dataset contains the fuel consumption of cars in the US measured during 1970-1982. Along with consumption values, there are attribute variables, such as the number of cylinders, displacement, horsepower, weight, acceleration, year, origin, and the name of the car:

Load the dataset into R using the read.table() function.
Produce a box plot of mpg values for each car name.
Write a function that will compute the scaled value (subtract the mean and divide by standard deviation) of a column whose name is given as an argument of the function.
Use the lapply() function to compute scaled values for all variables.
Produce a scatter plot of mpg versus acceleration for each car name using coplot(). Use legends to annotate the graph.

References

1. Matloff N. The Art of R Programming – A Tour of Statistical Software Design. No Starch Press. 2011. ISBN-10: 1593273843
2. Teetor P. R Cookbook. O'Reilly Media. 2011. ISBN-10: 0596809158
3. Wickham H. Advanced R. Chapman & Hall/CRC The R Series. 2015. ISBN-10: 1466586966
4. Wickham H. ggplot2: Elegant Graphics for Data Analysis (Use R!). Springer. 2010. ISBN-10: 0387981403
5. Auto MPG Data Set, UCI Machine Learning repository, https://archive.ics.uci.edu/ml/datasets/Auto+MPG
6. Quinlan R. "Combining Instance-Based and Model-Based Learning". In: Tenth International Conference of Machine Learning. 236-243. University of Massachusetts, Amherst. Morgan Kaufmann. 1993

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Summary

In this chapter, you were introduced to the R environment. Having read through this chapter, you have learned how to import data into R, select subsets of data for analysis, and write simple R programs using functions and control structures. You should also now be familiar with the graphical capabilities of R and some advanced capabilities, such as loop functions. In the next chapter, we will begin the central theme of this book, Bayesian inference.


Author (1)

Hari Manassery Koduvely

Dr. Hari M. Koduvely is an experienced data scientist working at the Samsung R&D Institute in Bangalore, India. He has a PhD in statistical physics from the Tata Institute of Fundamental Research, Mumbai, India, and post-doctoral experience from the Weizmann Institute, Israel, and Georgia Tech, USA. Prior to joining Samsung, the author has worked for Amazon and Infosys Technologies, developing machine learning-based applications for their products and platforms. He also has several publications on Bayesian inference and its applications in areas such as recommendation systems and predictive health monitoring. His current interest is in developing large-scale machine learning methods, particularly for natural language understanding.