Chapter 2: The Shape of Data

Populations, samples, and estimation

Probability distributions

Chapter 3: Describing Relationships

Relationships between a categorical and a continuous variable

Relationships between two categorical variables

The relationship between two continuous variables

Chapter 4: Probability

A tale of two interpretations

Sampling from distributions

Chapter 5: Using Data to Reason About the World

The sampling distribution

Chapter 6: Testing Hypotheses

Null Hypothesis Significance Testing

Testing the mean of one sample

Testing more than two means

Testing independence of proportions

What if my assumptions are unfounded?

Chapter 7: Bayesian Methods

The big idea behind Bayesian analysis

Who cares about coin flips?

Fitting distributions the Bayesian way

The Bayesian independent samples t-test

Chapter 8: Predicting Continuous Variables

Simple linear regression with a binary predictor

Regression with a non-binary predictor

The bias-variance trade-off

Linear regression diagnostics

Chapter 9: Predicting Categorical Variables

Chapter 10: Sources of Data

Chapter 11: Dealing with Messy Data

Analysis with missing data

Analysis with unsanitized data

Chapter 12: Dealing with Large Data

Using a bigger and faster machine

Using another R implementation

Be smarter about your code

Chapter 13: Reproducibility and Best Practices

Chapter 14: R Graphics

Base graphics using the default package

Trellis graphs using lattice

Graphs inspired by Grammar of Graphics

Chapter 15: Basic Graph Functions

Creating basic scatter plots

Creating histograms and density plots

Adjusting x and y axes' limits

Creating multiple plot matrix layouts

Adding and formatting legends

Creating graphs with maps

Saving and exporting graphs

Chapter 16: Beyond the Basics – Adjusting Key Parameters

Setting colors of points, lines, and bars

Setting plot background colors

Setting colors for text elements – axis annotations, labels, plot titles, and legends

Choosing color combinations and palettes

Setting fonts for annotations and titles

Choosing plotting point symbol styles and sizes

Choosing line styles and width

Adjusting axis annotations and tick marks

Setting graph margins and dimensions

Chapter 17: Creating Scatter Plots

Grouping data points within a scatter plot

Highlighting grouped data points by size and symbol type

Correlation matrix using pairs plots

Using jitter to distinguish closely packed data points

Adding linear model lines

Adding nonlinear model curves

Adding nonparametric model curves with lowess

Creating three-dimensional scatter plots

Creating Quantile-Quantile plots

Displaying the data density on axes

Creating scatter plots with a smoothed density representation

Chapter 18: Creating Line Graphs and Time Series Charts

Adding customized legends for multiple-line graphs

Using margin labels instead of legends for multiple-line graphs

Adding horizontal and vertical grid lines

Adding marker lines at specific x and y values using abline

Plotting functions of a variable in a dataset

Formatting time series data for plotting

Plotting the date or time variable on the x axis

Annotating axis labels in different human-readable time formats

Adding vertical markers to indicate specific time events

Plotting data with varying time-averaging periods

Chapter 19: Creating Bar, Dot, and Pie Charts

Creating bar charts with more than one factor variable

Creating stacked bar charts

Adjusting the orientation of bars – horizontal and vertical

Adjusting bar widths, spacing, colors, and borders

Displaying values on top of or next to the bars

Placing labels inside bars

Creating bar charts with vertical error bars

Modifying dot charts by grouping variables

Making better, readable pie charts with clockwise-ordered slices

Labeling a pie chart with percentage values for each slice

Adding a legend to a pie chart

Chapter 20: Creating Histograms

Visualizing distributions as count frequencies or probability densities

Setting the bin size and the number of breaks

Adjusting histogram styles – bar colors, borders, and axes

Overlaying a density line over a histogram

Multiple histograms along the diagonal of a pairs plot

Histograms in the margins of line and scatter plots

Chapter 21: Box and Whisker Plots

Creating box plots with narrow boxes for a small number of variables

Varying box widths by the number of observations

Creating box plots with notches

Including or excluding outliers

Creating horizontal box plots

Adjusting the extent of plot whiskers outside the box

Showing the number of observations

Splitting a variable at arbitrary values into subsets

Chapter 22: Creating Heat Maps and Contour Plots

Creating heat maps of a single Z variable with a scale

Creating correlation heat maps

Summarizing multivariate data in a single heat map

Creating filled contour plots

Creating three-dimensional surface plots

Visualizing time series as calendar heat maps

Chapter 23: Creating Maps

Plotting global data by countries on a world map

Creating graphs with regional maps

Plotting data on Google Maps

Creating and reading KML data

Working with ESRI shapefiles

Chapter 24: Data Visualization Using Lattice

Creating stacked bar charts

Creating bar charts to visualize cross-tabulation

Creating a conditional histogram

Visualizing distributions through a kernel-density plot

Creating a normal Q-Q plot

Visualizing an empirical Cumulative Distribution Function

Creating a conditional scatter plot

Chapter 25: Data Visualization Using ggplot2

Creating multiple bar charts

Creating a bar chart with error bars

Visualizing the density of a numeric variable

Creating a layered plot with a scatter plot and fitted line

Graph annotation with ggplot

Chapter 26: Inspecting Large Datasets

Multivariate continuous data visualization

Multivariate categorical data visualization

Chapter 27: Three-dimensional Visualizations

Three-dimensional scatter plots

Three-dimensional scatter plots with a regression plane

Three-dimensional bar charts

Three-dimensional density plots

Chapter 28: Finalizing Graphs for Publications and Presentations

Exporting graphs in high-resolution image formats – PNG, JPEG, BMP, and TIFF

Exporting graphs in vector formats – SVG, PDF, and PS

Adding mathematical and scientific notations (typesetting)

Adding text descriptions to graphs

Choosing font families and styles under Windows, Mac OS X, and Linux

Choosing fonts for PostScripts and PDFs

Chapter 29: Warming Up

Data attributes and description

Data transformation and discretization

Chapter 30: Mining Frequent Patterns, Associations, and Correlations

An overview of associations and patterns

Hybrid association rules mining

High-performance algorithms

Chapter 31: Classification

Generic decision tree induction

High-value credit card customers classification using ID3

Web spam detection using C4.5

Web key resource page judgment using CART

Trojan traffic identification method and Bayes classification

Identify spam e-mail and Naïve Bayes classification

Rule-based classification of player types in computer games and rule-based classification

Chapter 32: Advanced Classification

Biological traits and the Bayesian belief network

Protein classification and the k-Nearest Neighbors algorithm

Document retrieval and Support Vector Machine

Classification using frequent patterns

Classification using the backpropagation algorithm

Chapter 33: Cluster Analysis

Search engines and the k-means algorithm

Automatic abstraction of document texts and the k-medoids algorithm

Unsupervised image categorization and affinity propagation clustering

News categorization and hierarchical clustering

Chapter 34: Advanced Cluster Analysis

Customer categorization analysis of e-commerce and DBSCAN

Clustering web pages and OPTICS

Visitor analysis in the browser cache and DENCLUE

Recommendation system and STING

Web sentiment analysis and CLIQUE

Opinion mining and WAVE clustering

User search intent and the EM algorithm

Customer purchase data analysis and clustering high-dimensional data

SNS and clustering graph and network data

Chapter 35: Outlier Detection

Credit card fraud detection and statistical methods

Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods

Intrusion detection and density-based methods

Intrusion detection and clustering-based methods

Monitoring the performance of the web server and classification-based methods

Detecting novelty in text, topic detection, and mining contextual outliers

Collective outliers on spatial data

Outlier detection in high-dimensional data

Chapter 36: Mining Stream, Time-series, and Sequence Data

The credit card transaction flow and STREAM algorithm

Predicting future prices and time-series analysis

Stock market data and time-series clustering and classification

Web click streams and mining symbolic sequences

Mining sequence patterns in transactional databases

Chapter 37: Graph Mining and Network Analysis

Mining frequent subgraph patterns

Chapter 38: Mining Text and Web Data

Text mining and TM packages

The question answering system

Genre categorization of web pages

Categorizing newspaper articles and newswires into topics

Web usage mining with web logs

Chapter 39: Time Series Analysis

Multivariate time series analysis

References and reading list

Chapter 40: Factor Models

Chapter 41: Forecasting Volume

The volume forecasting model

Chapter 42: Big Data – Advanced Analytics

Getting data from open sources

Introduction to big data analysis in R

K-means clustering on big data

Big data linear regression analysis

Chapter 43: FX Derivatives

Terminology and notations

Chapter 44: Interest Rate Derivatives and Models

The Cox-Ingersoll-Ross model

Parameter estimation of interest rate models

Chapter 45: Exotic Options

A general pricing approach

The role of dynamic hedging

Greeks – the link back to the vanilla world

Pricing the Double-no-touch option

Another way to price the Double-no-touch option

The life of a Double-no-touch option – a simulation

Exotic options embedded in structured products

Chapter 46: Optimal Hedging

Hedging in the presence of transaction costs

Chapter 47: Fundamental Analysis

The basics of fundamental analysis

Including multiple variables

Separating investment targets

Setting classification rules

Industry-specific investment

Chapter 48: Technical Analysis, Neural Networks, and Logoptimal Portfolios

Chapter 49: Asset and Liability Management

Interest rate risk measurement

Liquidity risk measurement

Modeling non-maturity deposits

Chapter 50: Capital Adequacy

Principles of the Basel Accords

Chapter 51: Systemic Risks

Systemic risk in a nutshell

The dataset used in our examples

Core-periphery decomposition

Possible interpretations and suggestions

Chapter 52: Introducing Machine Learning

The origins of machine learning

Uses and abuses of machine learning

Machine learning in practice

Chapter 53: Managing and Understanding Data

Exploring and understanding data

Chapter 54: Lazy Learning – Classification Using Nearest Neighbors

Understanding nearest neighbor classification

Example – diagnosing breast cancer with the k-NN algorithm

Chapter 55: Probabilistic Learning – Classification Using Naive Bayes

Understanding Naive Bayes

Example – filtering mobile phone spam with the Naive Bayes algorithm

Chapter 56: Divide and Conquer – Classification Using Decision Trees and Rules

Understanding decision trees

Example – identifying risky bank loans using C5.0 decision trees

Understanding classification rules

Example – identifying poisonous mushrooms with rule learners

Chapter 57: Forecasting Numeric Data – Regression Methods

Example – predicting medical expenses using linear regression

Understanding regression trees and model trees

Example – estimating the quality of wines with regression trees and model trees

Chapter 58: Black Box Methods – Neural Networks and Support Vector Machines

Understanding neural networks

Example – modeling the strength of concrete with ANNs

Understanding Support Vector Machines

Example – performing OCR with SVMs

Chapter 59: Finding Patterns – Market Basket Analysis Using Association Rules

Understanding association rules

Example – identifying frequently purchased groceries with association rules

Chapter 60: Finding Groups of Data – Clustering with k-means

Example – finding teen market segments using k-means clustering

Chapter 61: Evaluating Model Performance

Measuring performance for classification

Estimating future performance

Chapter 62: Improving Model Performance

Tuning stock models for better performance

Improving model performance with meta-learning

Chapter 63: Specialized Machine Learning Topics

Working with proprietary files and databases

Working with online data and services

Working with domain-specific data

Improving the performance of R