R Data Analysis Cookbook

Over 80 recipes to help you breeze through your data analysis projects using R

Viswa Viswanathan, Shanthi Viswanathan

Book Details

ISBN 13: 9781783989065
Paperback: 342 pages

Book Description

Data analytics with R has become an important focus for organizations of all kinds. R lets even those with only an intuitive grasp of the underlying concepts, and no deep mathematical background, carry out powerful and detailed examinations of their data.

This book empowers you by showing you ways to use R to generate professional analysis reports. It provides examples for various important analysis and machine-learning tasks that you can try out with associated and readily available data. The book also teaches you to quickly adapt the example code for your own needs and save yourself the time needed to construct code from scratch.

Table of Contents

Chapter 1: Acquire and Prepare the Ingredients – Your Data
Introduction
Reading data from CSV files
Reading XML data
Reading JSON data
Reading data from fixed-width formatted files
Reading data from R files and R libraries
Removing cases with missing values
Replacing missing values with the mean
Removing duplicate cases
Rescaling a variable to [0,1]
Normalizing or standardizing data in a data frame
Binning numerical data
Creating dummies for categorical variables
Chapter 2: What's in There? – Exploratory Data Analysis
Introduction
Creating standard data summaries
Extracting a subset of a dataset
Splitting a dataset
Creating random data partitions
Generating standard plots such as histograms, boxplots, and scatterplots
Generating multiple plots on a grid
Selecting a graphics device
Creating plots with the lattice package
Creating plots with the ggplot2 package
Creating charts that facilitate comparisons
Creating charts that help visualize a possible causality
Creating multivariate plots
Chapter 3: Where Does It Belong? – Classification
Introduction
Generating error/classification-confusion matrices
Generating ROC charts
Building, plotting, and evaluating – classification trees
Using random forest models for classification
Classifying using Support Vector Machine
Classifying using the Naïve Bayes approach
Classifying using the KNN approach
Using neural networks for classification
Classifying using linear discriminant function analysis
Classifying using logistic regression
Using AdaBoost to combine classification tree models
Chapter 4: Give Me a Number – Regression
Introduction
Computing the root mean squared error
Building KNN models for regression
Performing linear regression
Performing variable selection in linear regression
Building regression trees
Building random forest models for regression
Using neural networks for regression
Performing k-fold cross-validation
Performing leave-one-out-cross-validation to limit overfitting
Chapter 5: Can You Simplify That? – Data Reduction Techniques
Introduction
Performing cluster analysis using K-means clustering
Performing cluster analysis using hierarchical clustering
Reducing dimensionality with principal component analysis
Chapter 6: Lessons from History – Time Series Analysis
Introduction
Creating and examining date objects
Operating on date objects
Performing preliminary analyses on time series data
Using time series objects
Decomposing time series
Filtering time series data
Smoothing and forecasting using the Holt-Winters method
Building an automated ARIMA model
Chapter 7: It's All About Your Connections – Social Network Analysis
Introduction
Downloading social network data using public APIs
Creating adjacency matrices and edge lists
Plotting social network data
Computing important network metrics
Chapter 8: Put Your Best Foot Forward – Document and Present Your Analysis
Introduction
Generating reports of your data analysis with R Markdown and knitr
Creating interactive web applications with shiny
Creating PDF presentations of your analysis with R Presentation
Chapter 9: Work Smarter, Not Harder – Efficient and Elegant R Code
Introduction
Exploiting vectorized operations
Processing entire rows or columns using the apply function
Applying a function to all elements of a collection with lapply and sapply
Applying functions to subsets of a vector
Using the split-apply-combine strategy with plyr
Slicing, dicing, and combining data with data tables
Chapter 10: Where in the World? – Geospatial Analysis
Introduction
Downloading and plotting a Google map of an area
Overlaying data on the downloaded Google map
Importing ESRI shape files into R
Using the sp package to plot geographic data
Getting maps from the maps package
Creating spatial data frames from regular data frames containing spatial and other data
Creating spatial data frames by combining regular data frames with spatial objects
Adding variables to an existing spatial data frame
Chapter 11: Playing Nice – Connecting to Other Systems
Introduction
Using Java objects in R
Using JRI to call R functions from Java
Using Rserve to call R functions from Java
Executing R scripts from Java
Using the xlsx package to connect to Excel
Reading data from relational databases – MySQL
Reading data from NoSQL databases – MongoDB

What You Will Learn

  • Get data into your R environment and prepare it for analysis
  • Perform exploratory data analyses and generate meaningful visualizations of the data
  • Apply several machine-learning techniques for classification and regression
  • Get your hands around large data sets with the help of reduction techniques
  • Extract patterns from time-series data and produce forecasts based on them
  • Learn how to extract actionable information from social network data
  • Implement geospatial analysis
  • Present your analysis convincingly through reports and build an infrastructure to enable others to play with your data
