Machine Learning with R Cookbook

Explore over 110 recipes to analyze data and build predictive models with the simple and easy-to-use R code

Yu-Wei, Chiu (David Chiu)


Book Details

ISBN 13: 9781783982042
Paperback: 442 pages

Book Description

The R language is a powerful open source functional programming language. At its core, R is a statistical programming language that provides impressive tools to analyze data and create high-level graphics.

This book covers the basics of R by setting up a user-friendly programming environment and performing data ETL in R. Data exploration examples demonstrate how powerful data visualization and machine learning are in discovering hidden relationships. You will then dive into important machine learning topics, including data classification, regression, clustering, association rule mining, and dimension reduction.

Table of Contents

Chapter 1: Practical Machine Learning with R
Introduction
Downloading and installing R
Downloading and installing RStudio
Installing and loading packages
Reading and writing data
Using R to manipulate data
Applying basic statistics
Visualizing data
Getting a dataset for machine learning
Chapter 2: Data Exploration with RMS Titanic
Introduction
Reading a Titanic dataset from a CSV file
Converting types on character variables
Detecting missing values
Imputing missing values
Exploring and visualizing data
Predicting passenger survival with a decision tree
Validating the power of prediction with a confusion matrix
Assessing performance with the ROC curve
Chapter 3: R and Statistics
Introduction
Understanding data sampling in R
Operating a probability distribution in R
Working with univariate descriptive statistics in R
Performing correlations and multivariate analysis
Operating linear regression and multivariate analysis
Conducting an exact binomial test
Performing Student's t-test
Performing the Kolmogorov-Smirnov test
Understanding the Wilcoxon Rank Sum and Signed Rank test
Working with Pearson's Chi-squared test
Conducting a one-way ANOVA
Performing a two-way ANOVA
Chapter 4: Understanding Regression Analysis
Introduction
Fitting a linear regression model with lm
Summarizing linear model fits
Using linear regression to predict unknown values
Generating a diagnostic plot of a fitted model
Fitting a polynomial regression model with lm
Fitting a robust linear regression model with rlm
Studying a case of linear regression on SLID data
Applying the Gaussian model for generalized linear regression
Applying the Poisson model for generalized linear regression
Applying the Binomial model for generalized linear regression
Fitting a generalized additive model to data
Visualizing a generalized additive model
Diagnosing a generalized additive model
Chapter 5: Classification (I) – Tree, Lazy, and Probabilistic
Introduction
Preparing the training and testing datasets
Building a classification model with recursive partitioning trees
Visualizing a recursive partitioning tree
Measuring the prediction performance of a recursive partitioning tree
Pruning a recursive partitioning tree
Building a classification model with a conditional inference tree
Visualizing a conditional inference tree
Measuring the prediction performance of a conditional inference tree
Classifying data with the k-nearest neighbor classifier
Classifying data with logistic regression
Classifying data with the Naïve Bayes classifier
Chapter 6: Classification (II) – Neural Network and SVM
Introduction
Classifying data with a support vector machine
Choosing the cost of a support vector machine
Visualizing an SVM fit
Predicting labels based on a model trained by a support vector machine
Tuning a support vector machine
Training a neural network with neuralnet
Visualizing a neural network trained by neuralnet
Predicting labels based on a model trained by neuralnet
Training a neural network with nnet
Predicting labels based on a model trained by nnet
Chapter 7: Model Evaluation
Introduction
Estimating model performance with k-fold cross-validation
Performing cross-validation with the e1071 package
Performing cross-validation with the caret package
Ranking the variable importance with the caret package
Ranking the variable importance with the rminer package
Finding highly correlated features with the caret package
Selecting features using the caret package
Measuring the performance of the regression model
Measuring prediction performance with a confusion matrix
Measuring prediction performance using ROCR
Comparing an ROC curve using the caret package
Measuring performance differences between models with the caret package
Chapter 8: Ensemble Learning
Introduction
Classifying data with the bagging method
Performing cross-validation with the bagging method
Classifying data with the boosting method
Performing cross-validation with the boosting method
Classifying data with gradient boosting
Calculating the margins of a classifier
Calculating the error evolution of the ensemble method
Classifying data with random forest
Estimating the prediction errors of different classifiers
Chapter 9: Clustering
Introduction
Clustering data with hierarchical clustering
Cutting trees into clusters
Clustering data with the k-means method
Drawing a bivariate cluster plot
Comparing clustering methods
Extracting silhouette information from clustering
Obtaining the optimum number of clusters for k-means
Clustering data with the density-based method
Clustering data with the model-based method
Visualizing a dissimilarity matrix
Validating clusters externally
Chapter 10: Association Analysis and Sequence Mining
Introduction
Transforming data into transactions
Displaying transactions and associations
Mining associations with the Apriori rule
Pruning redundant rules
Visualizing association rules
Mining frequent itemsets with Eclat
Creating transactions with temporal information
Mining frequent sequential patterns with cSPADE
Chapter 11: Dimension Reduction
Introduction
Performing feature selection with FSelector
Performing dimension reduction with PCA
Determining the number of principal components using the scree test
Determining the number of principal components using the Kaiser method
Visualizing multivariate data using biplot
Performing dimension reduction with MDS
Reducing dimensions with SVD
Compressing images with SVD
Performing nonlinear dimension reduction with ISOMAP
Performing nonlinear dimension reduction with Local Linear Embedding
Chapter 12: Big Data Analysis (R and Hadoop)
Introduction
Preparing the RHadoop environment
Installing rmr2
Installing rhdfs
Operating HDFS with rhdfs
Implementing a word count problem with RHadoop
Comparing the performance between an R MapReduce program and a standard R program
Testing and debugging the rmr2 program
Installing plyrmr
Manipulating data with plyrmr
Conducting machine learning with RHadoop
Configuring RHadoop clusters on Amazon EMR

What You Will Learn

  • Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm
  • Visualize patterns and associations using a range of graphs and find frequent itemsets using the Eclat algorithm
  • Compare differences between each regression method to discover how they solve problems
  • Predict possible churn users with the classification approach
  • Implement the clustering method to segment customer data
  • Compress images with the dimension reduction method
  • Incorporate R and Hadoop to solve machine learning problems on big data
