Data Analysis with R

Load, wrangle, and analyze your data using the world's most powerful statistical programming language

Tony Fischetti

12 customer reviews
eBook: $43.99 (RRP $43.99)
Print + eBook: $54.99 (RRP $54.99)

Book Details

ISBN-13: 9781785288142
Paperback: 388 pages

Book Description

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. With over 7,000 user-contributed packages, it’s easy to find support for the latest and greatest algorithms and techniques.

Starting with the basics of R and statistical reasoning, Data Analysis with R dives into advanced predictive analytics, showing how to apply those techniques to real-world data through worked examples.

Packed with engaging problems and exercises, this book begins with a review of R and its syntax. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Finally, tackle the practical difficulties of data analysis, with solutions for working with “messy” data and large data, communicating results, and facilitating reproducibility.

This book is designed to be an invaluable resource through many stages of a data analyst’s career.

Table of Contents

Chapter 1: RefresheR
Navigating the basics
Getting help in R
Vectors
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
Chapter 2: The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Exercises
Summary
Chapter 3: Describing Relationships
Multivariate data
Relationships between a categorical and a continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Visualization methods
Exercises
Summary
Chapter 4: Probability
Basic probability
A tale of two interpretations
Sampling from distributions
The normal distribution
Exercises
Summary
Chapter 5: Using Data to Reason About the World
Estimating means
The sampling distribution
Interval estimation
Smaller samples
Exercises
Summary
Chapter 6: Testing Hypotheses
Null Hypothesis Significance Testing
Testing the mean of one sample
Testing two means
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Exercises
Summary
Chapter 7: Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Exercises
Summary
Chapter 8: Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Linear regression diagnostics
Advanced topics
Exercises
Summary
Chapter 9: Predicting Categorical Variables
k-Nearest Neighbors
Logistic regression
Decision trees
Random forests
Choosing a classifier
Exercises
Summary
Chapter 10: Sources of Data
Relational Databases
Using JSON
XML
Other data formats
Online repositories
Exercises
Summary
Chapter 11: Dealing with Messy Data
Analysis with missing data
Analysis with unsanitized data
Other messiness
Exercises
Summary
Chapter 12: Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Using optimized packages
Using another R implementation
Use parallelization
Using Rcpp
Be smarter about your code
Exercises
Summary
Chapter 13: Reproducibility and Best Practices
R Scripting
R projects
Version control
Communicating results
Exercises
Summary

What You Will Learn

  • Navigate the R environment
  • Describe and visualize the behavior of data and relationships between data
  • Gain a thorough understanding of statistical reasoning and sampling
  • Employ hypothesis tests to draw inferences from your data
  • Learn Bayesian methods for estimating parameters
  • Perform regression to predict continuous variables
  • Apply powerful classification methods to predict categorical data
  • Handle missing data gracefully using multiple imputation
  • Identify and manage problematic data points
  • Employ parallelization and Rcpp to scale your analyses to larger data
  • Put best practices into effect to make your job easier and facilitate reproducibility
