Practical Predictive Analytics

Make sense of your data and predict the unpredictable

Ralph Winters

Book Details

ISBN 13: 9781785886188
Paperback: 576 pages

Book Description

This is the go-to book for anyone interested in the steps needed to develop predictive analytics solutions, with examples from marketing, healthcare, and retail. We will start with a brief history of predictive analytics and look at the different roles and functions people play within a predictive analytics project. Then we will cover various ways of installing R, along with their pros and cons, followed by a step-by-step installation of RStudio and a description of best practices for organizing projects.

With the installation complete, we will begin to acquire the skills necessary to input, clean, and prepare data for modeling. We will learn the six specific steps needed to implement and deploy a predictive model, from asking the right questions, through model development, to deploying the model into production. We will also learn why collaboration is important and how agile, iterative modeling cycles can increase the chances of developing and deploying a successful model.

We will then continue the journey in the cloud, extending our skill set with Databricks and SparkR, which make it possible to develop predictive models on gigabytes of data.

Table of Contents

Chapter 1: Getting Started with Predictive Analytics
Predictive analytics are in so many industries
Skills and roles that are important in Predictive Analytics
Predictive analytics software
Other helpful tools
R
How is a predictive analytics project organized?
GUIs
Getting started with RStudio
The R console
The source window
Our first predictive model
Your second script
R packages
References
Summary
Chapter 2: The Modeling Process
Advantages of a structured approach
Analytic process methodologies
An analytics methodology outline: specific steps
Step 2: data understanding
Step 3: data preparation
Step 4: modeling
Step 5: evaluation
Step 6: deployment
References
Summary
Chapter 3: Inputting and Exploring Data
Data input
Joining data
Exploring the hospital dataset
Transposing a dataframe
Missing values
Imputing categorical variables
Outliers
Data transformations
Variable reduction/variable importance
References
Summary
Chapter 4: Introduction to Regression Algorithms
Supervised versus unsupervised learning models
Regression techniques
Generalized linear models
Logistic regression
Summary
Chapter 5: Introduction to Decision Trees, Clustering, and SVM
Decision tree algorithms
Cluster analysis
Support vector machines
References
Summary
Chapter 6: Using Survival Analysis to Predict and Analyze Customer Churn
What is survival analysis?
Our customer satisfaction dataset
Partitioning into training and test data
Setting the stage by creating survival objects
Examining survival curves
Cox regression modeling
Time-based variables
Comparing the models
Variable selection
Summary
Chapter 7: Using Market Basket Analysis as a Recommender Engine
What is market basket analysis?
Examining the groceries transaction file
The sample market basket
Association rule algorithms
Antecedents and descendants
Evaluating the accuracy of a rule
Preparing the raw data file for analysis
Analyzing the input file
Scrubbing and cleaning the data
Removing colors automatically
Filtering out single item transactions
Merging the results back into the original data
Compressing descriptions using camelcase
Creating the test and training datasets
Creating the market basket transaction file
Method two: Creating a physical transactions file
Converting to a document term matrix
K-means clustering of terms
Predicting cluster assignments
Running the apriori algorithm on the clusters
Summarizing the metrics
References
Summary
Chapter 8: Exploring Health Care Enrollment Data as a Time Series
Time series data
Health insurance coverage dataset
Housekeeping
Read the data in
Subsetting the columns
Description of the data
Target time series variable
Saving the data
Determining all of the subset groups
Merging the aggregate data back into the original data
Checking the time intervals
Picking out the top groups in terms of average population size
Plotting the data using lattice
Plotting the data using ggplot
Sending output to an external file
Examining the output
Detecting linear trends
Automating the regressions
Ranking the coefficients
Merging scores back into the original dataframe
Plotting the data with the trend lines
Plotting all the categories on one graph
Performing some automated forecasting using the ets function
Smoothing the data using moving averages
Simple moving average
Verifying the SMA calculation
Exponential moving average
Using the ets function
Forecasting using ALL AGES
Plotting the predicted and actual values
The forecast (fit) method
Plotting future values with confidence bands
Modifying the model to include a trend component
Running the ets function iteratively over all of the categories
Accuracy measures produced by onestep
Comparing the Test and Training for the "UNDER 18 YEARS" group
Accuracy measures
References
Summary
Chapter 9: Introduction to Spark Using R
About Spark
Spark environments
SparkR
Building our first Spark dataframe
Importing the sample notebook
Creating a new notebook
Becoming large by starting small
Running the code
Running the initialization code
Extracting the Pima Indians diabetes dataset
Simulating the data
Simulating the negative cases
Running summary statistics
Saving your work
Summary
Chapter 10: Exploring Large Datasets Using Spark
Performing some exploratory analysis on positives
Cleaning up and caching the table in memory
Some useful Spark functions to explore your data
Creating new columns
Constructing a cross-tab
Contrasting histograms
Plotting using ggplot
Spark SQL
Exporting data from Spark back into R
Running local R packages
Some tips for using Spark
Summary
Chapter 11: Spark Machine Learning - Regression and Cluster Models
About this chapter/what you will learn
Splitting the data into train and test datasets
Spark machine learning using logistic regression
Running predictions for the test data
Combining the training and test dataset
Exposing the three tables to SQL
Validating the regression results
Calculating goodness of fit measures
Confusion matrix for test group
Plotting outside of Spark
Creating some global views
Normalizing the data
Characterizing the clusters by their mean values
Summary
Chapter 12: Spark Models – Rule-Based Learning
Loading the stop and frisk dataset
Reading the table
Discovering the important features
Running the OneR model
Another OneR example
Constructing a decision tree using Rpart
Running an alternative model in Python
Indexing the classification features
Summary

What You Will Learn

  • Master the core predictive analytics algorithms used in business today
  • Learn to implement the six steps for a successful analytics project
  • Identify the right algorithm for your requirements
  • Use and apply predictive analytics to research problems in healthcare
  • Implement predictive analytics to retain and acquire your customers
  • Use text mining to understand unstructured data
  • Develop models on your own PC or in Spark/Hadoop environments
  • Implement predictive analytics products for customers
