R for Data Science Cookbook

Over 100 hands-on recipes to effectively solve real-world data problems using the most popular R packages and techniques

Yu-Wei, Chiu (David Chiu)

Book Details

ISBN 13: 9781784390815
Paperback: 452 pages

Book Description

This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.

The book opens by showing how to create R functions to avoid unnecessary duplication of code. You will then learn how to extract, transform, and load (ETL) data from heterogeneous sources using R packages, and how to prepare it for analysis. The data manipulation recipes illustrate how to use the “dplyr” and “data.table” packages to process larger data structures efficiently. We also cover “ggplot2” and show you how to create advanced figures for data exploration.

In addition, you will learn how to build interactive reports using the “ggvis” package. Later chapters cover time series analysis of financial data and machine learning, including classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this book, you will be able to diagnose and solve the problems you encounter while performing data analysis in R.

Table of Contents

Chapter 1: Functions in R
Introduction
Creating R functions
Matching arguments
Understanding environments
Working with lexical scoping
Understanding closure
Performing lazy evaluation
Creating infix operators
Using the replacement function
Handling errors in a function
The debugging function
Chapter 2: Data Extracting, Transforming, and Loading
Introduction
Downloading open data
Reading and writing CSV files
Scanning text files
Working with Excel files
Reading data from databases
Scraping web data
Accessing Facebook data
Working with twitteR
Chapter 3: Data Preprocessing and Preparation
Introduction
Renaming the data variable
Converting data types
Working with the date format
Adding new records
Filtering data
Dropping data
Merging data
Sorting data
Reshaping data
Detecting missing data
Imputing missing data
Chapter 4: Data Manipulation
Introduction
Enhancing a data.frame with a data.table
Managing data with a data.table
Performing fast aggregation with a data.table
Merging large datasets with a data.table
Subsetting and slicing data with dplyr
Sampling data with dplyr
Selecting columns with dplyr
Chaining operations in dplyr
Arranging rows with dplyr
Eliminating duplicated rows with dplyr
Adding new columns with dplyr
Summarizing data with dplyr
Merging data with dplyr
Chapter 5: Visualizing Data with ggplot2
Introduction
Creating basic plots with ggplot2
Changing aesthetics mapping
Introducing geometric objects
Performing transformations
Adjusting scales
Faceting
Adjusting themes
Combining plots
Creating maps
Chapter 6: Making Interactive Reports
Introduction
Creating R Markdown reports
Learning the markdown syntax
Embedding R code chunks
Creating interactive graphics with ggvis
Understanding basic syntax and grammar
Controlling axes and legends
Using scales
Adding interactivity to a ggvis plot
Creating an R Shiny document
Publishing an R Shiny report
Chapter 7: Simulation from Probability Distributions
Introduction
Generating random samples
Understanding uniform distributions
Generating binomial random variates
Generating Poisson random variates
Sampling from a normal distribution
Sampling from a chi-squared distribution
Understanding Student's t-distribution
Sampling from a dataset
Simulating the stochastic process
Chapter 8: Statistical Inference in R
Introduction
Getting confidence intervals
Performing Z-tests
Performing Student's t-tests
Conducting exact binomial tests
Performing Kolmogorov-Smirnov tests
Working with the Pearson's chi-squared tests
Understanding the Wilcoxon Rank Sum and Signed Rank tests
Conducting one-way ANOVA
Performing two-way ANOVA
Chapter 9: Rule and Pattern Mining with R
Introduction
Transforming data into transactions
Displaying transactions and associations
Mining associations with the Apriori rule
Pruning redundant rules
Visualizing association rules
Mining frequent itemsets with Eclat
Creating transactions with temporal information
Mining frequent sequential patterns with cSPADE
Chapter 10: Time Series Mining with R
Introduction
Creating time series data
Plotting a time series object
Decomposing time series
Smoothing time series
Forecasting time series
Selecting an ARIMA model
Creating an ARIMA model
Forecasting with an ARIMA model
Predicting stock prices with an ARIMA model
Chapter 11: Supervised Machine Learning
Introduction
Fitting a linear regression model with lm
Summarizing linear model fits
Using linear regression to predict unknown values
Measuring the performance of the regression model
Performing a multiple regression analysis
Selecting the best-fitted regression model with stepwise regression
Applying the Gaussian model for generalized linear regression
Performing a logistic regression analysis
Building a classification model with recursive partitioning trees
Visualizing a recursive partitioning tree
Measuring model performance with a confusion matrix
Measuring prediction performance using ROCR
Chapter 12: Unsupervised Machine Learning
Introduction
Clustering data with hierarchical clustering
Cutting tree into clusters
Clustering data with the k-means method
Clustering data with the density-based method
Extracting silhouette information from clustering
Comparing clustering methods
Recognizing digits using the density-based clustering method
Grouping similar text documents with k-means clustering methods
Performing dimension reduction with Principal Component Analysis (PCA)
Determining the number of principal components using a scree plot
Determining the number of principal components using the Kaiser method
Visualizing multivariate data using a biplot

What You Will Learn

  • Get to know the functional characteristics of the R language
  • Extract, transform, and load data from heterogeneous sources
  • Understand how R can be applied to probability and statistics problems
  • Get simple R instructions to quickly organize and manipulate large datasets
  • Create professional data visualizations and interactive reports
  • Predict user purchase behavior by adopting a classification approach
  • Implement data mining techniques to discover items that are frequently purchased together
  • Group similar text documents by using various clustering methods
