Machine Learning with R Cookbook - Second Edition

Explore over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code

Ashishsingh Bhatia, Yu-Wei Chiu (David Chiu)

Book Details

ISBN-13: 9781787284395
Paperback: 572 pages

Book Description

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Extracting useful information from data is another task, and a much more challenging one.

Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions provided in each recipe, you will be able to construct a predictive model using a variety of machine learning packages.

In this book, you will first learn to set up the R environment and use simple R commands to explore data. Next, you will learn how to perform statistical analysis and machine learning analysis and how to assess the models you create; these topics are covered in detail later in the book. You'll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.

Table of Contents

Chapter 1: Practical Machine Learning with R
Introduction
Downloading and installing R
Downloading and installing RStudio
Installing and loading packages
Understanding basic data structures
Basic commands for subsetting
Reading and writing data
Manipulating data
Applying basic statistics
Visualizing data
Getting a dataset for machine learning
Chapter 2: Data Exploration with Air Quality Datasets
Introduction
Using the air quality dataset
Converting attributes to factor
Detecting missing values
Imputing missing values
Exploring and visualizing data
Predicting values from datasets
Chapter 3: Analyzing Time Series Data
Introduction
Looking at time series data
Plotting and forecasting time series data
Extracting, subsetting, merging, filling, and padding
Successive differences and moving averages
Exponential smoothing
Plotting the autocorrelation function
Chapter 4: R and Statistics
Introduction
Understanding data sampling in R
Operating a probability distribution in R
Working with univariate descriptive statistics in R
Performing correlations and multivariate analysis
Conducting an exact binomial test
Performing a Student's t-test
Performing the Kolmogorov-Smirnov test
Understanding the Wilcoxon Rank Sum and Signed Rank test
Working with Pearson's Chi-squared test
Conducting a one-way ANOVA
Performing a two-way ANOVA
Chapter 5: Understanding Regression Analysis
Introduction
Different types of regression
Fitting a linear regression model with lm
Summarizing linear model fits
Using linear regression to predict unknown values
Generating a diagnostic plot of a fitted model
Fitting multiple regression
Summarizing multiple regression
Using multiple regression to predict unknown values
Fitting a polynomial regression model with lm
Fitting a robust linear regression model with rlm
Studying a case of linear regression on SLID data
Applying the Gaussian model for generalized linear regression
Applying the Poisson model for generalized linear regression
Applying the Binomial model for generalized linear regression
Fitting a generalized additive model to data
Visualizing a generalized additive model
Diagnosing a generalized additive model
Chapter 6: Survival Analysis
Introduction
Loading and observing data
Viewing the summary of survival analysis
Visualizing the Survival Curve
Using the log-rank test
Using the Cox proportional hazards model
Nelson-Aalen Estimator of cumulative hazard
Chapter 7: Classification 1 - Tree, Lazy, and Probabilistic
Introduction
Preparing the training and testing datasets
Building a classification model with recursive partitioning trees
Visualizing a recursive partitioning tree
Measuring the prediction performance of a recursive partitioning tree
Pruning a recursive partitioning tree
Handling missing data and split and surrogate variables
Building a classification model with a conditional inference tree
Control parameters in conditional inference trees
Visualizing a conditional inference tree
Measuring the prediction performance of a conditional inference tree
Classifying data with the k-nearest neighbor classifier
Classifying data with logistic regression
Classifying data with the Naïve Bayes classifier
Chapter 8: Classification 2 - Neural Network and SVM
Introduction
Classifying data with a support vector machine
Choosing the cost of a support vector machine
Visualizing an SVM fit
Predicting labels based on a model trained by a support vector machine
Tuning a support vector machine
The basics of neural networks
Training a neural network with neuralnet
Visualizing a neural network trained by neuralnet
Predicting labels based on a model trained by neuralnet
Training a neural network with nnet
Predicting labels based on a model trained by nnet
Chapter 9: Model Evaluation
Introduction
Estimating model performance with k-fold cross-validation
Estimating model performance with Leave One Out Cross Validation
Performing cross-validation with the e1071 package
Performing cross-validation with the caret package
Ranking the variable importance with the caret package
Ranking the variable importance with the rminer package
Finding highly correlated features with the caret package
Selecting features using the caret package
Measuring the performance of the regression model
Measuring prediction performance with a confusion matrix
Measuring prediction performance using ROCR
Comparing an ROC curve using the caret package
Measuring performance differences between models with the caret package
Chapter 10: Ensemble Learning
Introduction
Using the Super Learner algorithm
Using ensemble to train and test
Classifying data with the bagging method
Performing cross-validation with the bagging method
Classifying data with the boosting method
Performing cross-validation with the boosting method
Classifying data with gradient boosting
Calculating the margins of a classifier
Calculating the error evolution of the ensemble method
Classifying data with random forest
Estimating the prediction errors of different classifiers
Chapter 11: Clustering
Introduction
Clustering data with hierarchical clustering
Cutting trees into clusters
Clustering data with the k-means method
Drawing a bivariate cluster plot
Comparing clustering methods
Extracting silhouette information from clustering
Obtaining the optimum number of clusters for k-means
Clustering data with the density-based method
Clustering data with the model-based method
Visualizing a dissimilarity matrix
Validating clusters externally
Chapter 12: Association Analysis and Sequence Mining
Introduction
Transforming data into transactions
Displaying transactions and associations
Mining associations with the Apriori rule
Pruning redundant rules
Visualizing association rules
Mining frequent itemsets with Eclat
Creating transactions with temporal information
Mining frequent sequential patterns with cSPADE
Using the TraMineR package for sequence analysis
Visualizing sequence, Chronogram, and Traversal Statistics
Chapter 13: Dimension Reduction
Introduction
Why reduce the dimension?
Performing feature selection with FSelector
Performing dimension reduction with PCA
Determining the number of principal components using the scree test
Determining the number of principal components using the Kaiser method
Visualizing multivariate data using biplot
Performing dimension reduction with MDS
Reducing dimensions with SVD
Compressing images with SVD
Performing nonlinear dimension reduction with ISOMAP
Performing nonlinear dimension reduction with Local Linear Embedding
Chapter 14: Big Data Analysis (R and Hadoop)
Introduction
Preparing the RHadoop environment
Installing rmr2
Installing rhdfs
Operating HDFS with rhdfs
Implementing a word count problem with RHadoop
Comparing the performance between an R MapReduce program and a standard R program
Testing and debugging the rmr2 program
Installing plyrmr
Manipulating data with plyrmr
Conducting machine learning with RHadoop
Configuring RHadoop clusters on Amazon EMR

What You Will Learn

  • Create and inspect transaction datasets and perform association analysis with the Apriori algorithm
  • Visualize patterns and associations using a range of graphs and find frequent item-sets using the Eclat algorithm
  • Compare differences between each regression method to discover how they solve problems
  • Detect and impute missing values in air quality data
  • Predict possible churn users with the classification approach
  • Plot the autocorrelation function with time series analysis
  • Use the Cox proportional hazards model for survival analysis
  • Implement the clustering method to segment customer data
  • Compress images with the dimension reduction method
  • Incorporate R and Hadoop to solve machine learning problems on big data
