R: Predictive Analysis

Master the art of predictive modeling

Tony Fischetti, Eric Mayor, Rui Miguel Forte

Book Details

ISBN 13: 9781788290371
Paperback: 1065 pages

Book Description

Predictive analytics is a field that uses data to build models that predict a future outcome of interest. It can be applied to a range of business strategies and has been a key player in search advertising and recommendation engines.

The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. R offers a free and open source environment that is well suited to both learning and deploying predictive modeling solutions in the real world. This Learning Path will provide you with all the steps you need to master the art of predictive modeling with R.

We start with an introduction to data analysis with R, and then you’ll gradually get your feet wet with predictive modeling. You will get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. You will learn to tackle the difficulties of performing data analysis in practice and find solutions for working with “messy data” and large data, communicating results, and facilitating reproducibility. You will then perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression tasks and scoring new data sets. By the end of this Learning Path, you will have explored and tested the most popular modeling techniques in use on real-world data sets and mastered a diverse range of techniques in predictive analytics.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • Data Analysis with R, Tony Fischetti
  • Learning Predictive Analytics with R, Eric Mayor
  • Mastering Predictive Analytics with R, Rui Miguel Forte

Table of Contents

Chapter 1: RefresheR
Navigating the basics
Getting help in R
Vectors
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
Chapter 2: The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Exercises
Summary
Chapter 3: Describing Relationships
Multivariate data
Relationships between a categorical and a continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Visualization methods
Exercises
Summary
Chapter 4: Probability
Basic probability
A tale of two interpretations
Sampling from distributions
The normal distribution
Exercises
Summary
Chapter 5: Using Data to Reason About the World
Estimating means
The sampling distribution
Interval estimation
Smaller samples
Exercises
Summary
Chapter 6: Testing Hypotheses
Null Hypothesis Significance Testing
Testing the mean of one sample
Testing two means
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Exercises
Summary
Chapter 7: Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Exercises
Summary
Chapter 8: Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Linear regression diagnostics
Advanced topics
Exercises
Summary
Chapter 9: Predicting Categorical Variables
k-Nearest Neighbors
Logistic regression
Decision trees
Random forests
Choosing a classifier
Exercises
Summary
Chapter 10: Sources of Data
Relational Databases
Using JSON
XML
Other data formats
Online repositories
Exercises
Summary
Chapter 11: Dealing with Messy Data
Analysis with missing data
Analysis with unsanitized data
Other messiness
Exercises
Summary
Chapter 12: Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Using optimized packages
Using another R implementation
Use parallelization
Using Rcpp
Be smarter about your code
Exercises
Summary
Chapter 13: Reproducibility and Best Practices
R Scripting
R projects
Version control
Communicating results
Exercises
Summary
Chapter 14: Visualizing and Manipulating Data Using R
The roulette case
Histograms and bar plots
Scatterplots
Boxplots
Line plots
Application – Outlier detection
Formatting plots
Summary
Chapter 15: Data Visualization with Lattice
Loading and discovering the lattice package
Discovering multipanel conditioning with xyplot()
Discovering other lattice plots
Updating graphics
Case study – exploring cancer-related deaths in the US
Summary
Chapter 16: Cluster Analysis
Distance measures
Learning by doing – partition clustering with kmeans()
Using k-means with public datasets
Summary
Chapter 17: Agglomerative Clustering Using hclust()
The inner working of agglomerative clustering
Agglomerative clustering with hclust()
Summary
Chapter 18: Dimensionality Reduction with Principal Component Analysis
The inner working of Principal Component Analysis
Learning PCA in R
Summary
Chapter 19: Exploring Association Rules with Apriori
Apriori – basic concepts
The inner working of apriori
Analyzing data with apriori in R
Summary
Chapter 20: Probability Distributions, Covariance, and Correlation
Probability distributions
Covariance and correlation
Summary
Chapter 21: Linear Regression
Understanding simple regression
Working with multiple regression
Analyzing data in R: correlation and regression
Robust regression
Bootstrapping
Summary
Chapter 22: Classification with k-Nearest Neighbors and Naïve Bayes
Understanding k-NN
Working with k-NN in R
Understanding Naïve Bayes
Working with Naïve Bayes in R
Computing the performance of classification
Summary
Chapter 23: Classification Trees
Understanding decision trees
ID3
C4.5
C5.0
Classification and regression trees and random forest
Conditional inference trees and forests
Installing the packages containing the required functions
Performing the analyses in R
Caret – a unified framework for classification
Summary
Chapter 24: Multilevel Analyses
Nested data
Multilevel regression
Multilevel modeling in R
Predictions using multilevel models
Summary
Chapter 25: Text Analytics with R
An introduction to text analytics
Loading the corpus
Data preparation
Creating the training and testing data frames
Classification of the reviews
Mining the news with R
Summary
Chapter 26: Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML
Cross-validation and bootstrapping of predictive models using the caret package
Exporting models using PMML
Summary
Chapter 27: Gearing Up for Predictive Modeling
Models
Types of models
The process of predictive modeling
Performance metrics
Summary
Chapter 28: Linear Regression
Introduction to linear regression
Simple linear regression
Multiple linear regression
Assessing linear regression models
Problems with linear regression
Feature selection
Regularization
Summary
Chapter 29: Logistic Regression
Classifying with linear regression
Introduction to logistic regression
Predicting heart disease
Assessing logistic regression models
Regularization with the lasso
Classification metrics
Extensions of the binary logistic classifier
Summary
Chapter 30: Neural Networks
The biological neuron
The artificial neuron
Stochastic gradient descent
Multilayer perceptron networks
Predicting the energy efficiency of buildings
Predicting glass type revisited
Predicting handwritten digits
Summary
Chapter 31: Support Vector Machines
Maximal margin classification
Support vector classification
Kernels and support vector machines
Predicting chemical biodegradation
Cross-validation
Predicting credit scores
Multiclass classification with support vector machines
Summary
Chapter 32: Tree-based Methods
The intuition for tree models
Algorithms for training decision trees
Predicting class membership on synthetic 2D data
Predicting the authenticity of banknotes
Predicting complex skill learning
Summary
Chapter 33: Ensemble Methods
Bagging
Boosting
Predicting atmospheric gamma ray radiation
Predicting complex skill learning with boosting
Random forests
Summary
Chapter 34: Probabilistic Graphical Models
A little graph theory
Bayes' Theorem
Conditional independence
Bayesian networks
The Naïve Bayes classifier
Hidden Markov models
Predicting promoter gene sequences
Predicting letter patterns in English words
Summary
Chapter 35: Time Series Analysis
Fundamental concepts of time series
Some fundamental time series
Stationarity
Stationary time series models
Non-stationary time series models
Predicting intense earthquakes
Predicting lynx trappings
Predicting foreign exchange rates
Other time series models
Summary
Chapter 36: Topic Modeling
An overview of topic modeling
Latent Dirichlet Allocation
Modeling the topics of online news stories
Summary
Chapter 37: Recommendation Systems
Rating matrix
Collaborative filtering
Singular value decomposition
R and Big Data
Predicting recommendations for movies and jokes
Loading and preprocessing the data
Exploring the data
Other approaches to recommendation systems
Summary

What You Will Learn

  • Get to know the basics of R’s syntax and major data structures
  • Write functions, load data, and install packages
  • Work with different data sources in R: interface with databases, and request and load JSON and XML
  • Identify the challenges and apply your knowledge about data analysis in R to imperfect real-world data
  • Predict the future with reasonably simple algorithms
  • Develop key data visualization and predictive analytics skills using R
  • Understand the language of models and the predictive modeling process
