Chapter 2: The Shape of Data

Populations, samples, and estimation

Probability distributions

Chapter 3: Describing Relationships

Relationships between a categorical and a continuous variable

Relationships between two categorical variables

The relationship between two continuous variables

Chapter 4: Probability

A tale of two interpretations

Sampling from distributions

Chapter 5: Using Data to Reason About the World

The sampling distribution

Chapter 6: Testing Hypotheses

Null Hypothesis Significance Testing

Testing the mean of one sample

Testing more than two means

Testing independence of proportions

What if my assumptions are unfounded?

Chapter 7: Bayesian Methods

The big idea behind Bayesian analysis

Who cares about coin flips?

Fitting distributions the Bayesian way

The Bayesian independent samples t-test

Chapter 8: Predicting Continuous Variables

Simple linear regression with a binary predictor

Regression with a non-binary predictor

The bias-variance trade-off

Linear regression diagnostics

Chapter 9: Predicting Categorical Variables

Chapter 10: Sources of Data

Chapter 11: Dealing with Messy Data

Analysis with missing data

Analysis with unsanitized data

Chapter 12: Dealing with Large Data

Using a bigger and faster machine

Using another R implementation

Be smarter about your code

Chapter 13: Reproducibility and Best Practices

Chapter 14: Visualizing and Manipulating Data Using R

Application – Outlier detection

Chapter 15: Data Visualization with Lattice

Loading and discovering the lattice package

Discovering multipanel conditioning with xyplot()

Discovering other lattice plots

Case study – exploring cancer-related deaths in the US

Chapter 16: Cluster Analysis

Learning by doing – partition clustering with kmeans()

Using k-means with public datasets

Chapter 17: Agglomerative Clustering Using hclust()

The inner workings of agglomerative clustering

Agglomerative clustering with hclust()

Chapter 18: Dimensionality Reduction with Principal Component Analysis

The inner workings of Principal Component Analysis

Chapter 19: Exploring Association Rules with Apriori

The inner workings of apriori

Analyzing data with apriori in R

Chapter 20: Probability Distributions, Covariance, and Correlation

Probability distributions

Covariance and correlation

Chapter 21: Linear Regression

Understanding simple regression

Working with multiple regression

Analyzing data in R: correlation and regression

Chapter 22: Classification with k-Nearest Neighbors and Naïve Bayes

Understanding Naïve Bayes

Working with Naïve Bayes in R

Computing the performance of classification

Chapter 23: Classification Trees

Understanding decision trees

Classification and regression trees and random forest

Conditional inference trees and forests

Installing the packages containing the required functions

Performing the analyses in R

Caret – a unified framework for classification

Chapter 24: Multilevel Analyses

Predictions using multilevel models

Chapter 25: Text Analytics with R

An introduction to text analytics

Creating the training and testing data frames

Classification of the reviews

Chapter 26: Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML

Cross-validation and bootstrapping of predictive models using the caret package

Exporting models using PMML

Chapter 27: Gearing Up for Predictive Modeling

The process of predictive modeling

Chapter 28: Linear Regression

Introduction to linear regression

Multiple linear regression

Assessing linear regression models

Problems with linear regression

Chapter 29: Logistic Regression

Classifying with linear regression

Introduction to logistic regression

Assessing logistic regression models

Regularization with the lasso

Extensions of the binary logistic classifier

Chapter 30: Neural Networks

Stochastic gradient descent

Multilayer perceptron networks

Predicting the energy efficiency of buildings

Predicting glass type revisited

Predicting handwritten digits

Chapter 31: Support Vector Machines

Maximal margin classification

Support vector classification

Kernels and support vector machines

Predicting chemical biodegradation

Multiclass classification with support vector machines

Chapter 32: Tree-based Methods

The intuition for tree models

Algorithms for training decision trees

Predicting class membership on synthetic 2D data

Predicting the authenticity of banknotes

Predicting complex skill learning

Chapter 33: Ensemble Methods

Predicting atmospheric gamma ray radiation

Predicting complex skill learning with boosting

Chapter 34: Probabilistic Graphical Models

The Naïve Bayes classifier

Predicting promoter gene sequences

Predicting letter patterns in English words

Chapter 35: Time Series Analysis

Fundamental concepts of time series

Some fundamental time series

Stationary time series models

Non-stationary time series models

Predicting intense earthquakes

Predicting lynx trappings

Predicting foreign exchange rates

Chapter 36: Topic Modeling

An overview of topic modeling

Latent Dirichlet Allocation

Modeling the topics of online news stories

Chapter 37: Recommendation Systems

Singular value decomposition

Predicting recommendations for movies and jokes

Loading and preprocessing the data

Other approaches to recommendation systems