Data Analysis with R - Second Edition

Learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.

Tony Fischetti

Book Details

ISBN 13: 9781788393720
Paperback: 570 pages

Book Description

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly.

Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.

Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility.

This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Table of Contents

Chapter 1: RefresheR
Navigating the basics
Getting help in R
Vectors
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
Chapter 2: The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Exercises
Summary
Chapter 3: Describing Relationships
Multivariate data
Relationships between a categorical and continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Visualization methods
Exercises
Summary
Chapter 4: Probability
Basic probability
A tale of two interpretations
Sampling from distributions
The normal distribution
Exercises
Summary
Chapter 5: Using Data To Reason About The World
Estimating means
The sampling distribution
Interval estimation
Smaller samples
Exercises
Summary
Chapter 6: Testing Hypotheses
The null hypothesis significance testing framework
Testing the mean of one sample
Testing two means
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Exercises
Summary
Chapter 7: Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Exercises
Summary
Chapter 8: The Bootstrap
What's... uhhh... the deal with the bootstrap?
Performing the bootstrap in R (more elegantly)
Confidence intervals
A one-sample test of means
Bootstrapping statistics other than the mean
Busting bootstrap myths
Exercises
Summary
Chapter 9: Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Linear regression diagnostics
Advanced topics
Exercises
Summary
Chapter 10: Predicting Categorical Variables
k-Nearest neighbors
Logistic regression
Decision trees
Random forests
Choosing a classifier
Exercises
Summary
Chapter 11: Predicting Changes with Time
What is a time series?
What is forecasting?
Creating and plotting time series
Components of time series
Time series decomposition
White noise
Autocorrelation
Smoothing
ETS and the state space model
Interventions for improvement
What we didn't cover
Citations for the climate change data
Exercises
Summary
Chapter 12: Sources of Data
Relational databases
Using JSON
XML
Other data formats
Online repositories
Exercises
Summary
Chapter 13: Dealing with Missing Data
Analysis with missing data
Visualizing missing data
Types of missing data
Unsophisticated methods for dealing with missing data
So how does mice come up with the imputed values?
Exercises
Summary
Chapter 14: Dealing with Messy Data
Checking unsanitized data
Regular expressions
Other tools for messy data
Exercises
Summary
Chapter 15: Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Using optimized packages
Using another R implementation
Using parallelization
Using Rcpp
Being smarter about your code
Exercises
Summary
Chapter 16: Working with Popular R Packages
The data.table package
Using dplyr and tidyr to manipulate data
Functional programming as a main tidyverse principle
Reshaping data with tidyr
Exercises
Summary
Chapter 17: Reproducibility and Best Practices
R scripting
R projects
Version control
Communicating results
Exercises
Summary

What You Will Learn

  • Gain a thorough understanding of statistical reasoning and sampling theory
  • Employ hypothesis testing to draw inferences from your data
  • Learn Bayesian methods for estimating parameters
  • Train regression, classification, and time series models
  • Handle missing data gracefully using multiple imputation
  • Identify and manage problematic data points
  • Learn how to scale your analyses to larger data with Rcpp, data.table, dplyr, and parallelization
  • Put best practices into effect to make your job easier and facilitate reproducibility
