R: Data Analysis and Visualization

Master the art of building analytical models using R

Tony Fischetti et al.


Book Details

ISBN 13: 9781786463500
Paperback: 1783 pages

Book Description

The R learning path created for you has five connected modules, each of which is a mini-course in its own right. As you complete each one, you'll have gained key skills and be ready for the material in the next module!

This course begins by looking at the Data Analysis with R module. This will help you navigate the R environment. You'll gain a thorough understanding of statistical reasoning and sampling. Finally, you'll be able to put best practices into effect to make your job easier and facilitate reproducibility.

The second place to explore is R Graphs, which will help you leverage powerful default R graphics and utilize advanced graphics systems such as lattice and ggplot2, the grammar of graphics. You'll learn how to produce, customize, and publish advanced visualizations using this popular and powerful framework.

With the third module, Learning Data Mining with R, you will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs.

The Mastering R for Quantitative Finance module pragmatically introduces both the quantitative finance concepts and their modeling in R, enabling you to build a tailor-made trading system on your own. By the end of the module, you will be well versed in various financial techniques using R and will be able to place good bets while making financial decisions.

Finally, we'll look at the Machine Learning with R module. With this module, you'll discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs. You'll also learn to apply machine learning methods to deal with common tasks, including classification, prediction, forecasting, and so on.

Table of Contents

Chapter 1: RefresheR
Navigating the basics
Getting help in R
Vectors
Functions
Matrices
Loading data into R
Working with packages
Chapter 2: The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Chapter 3: Describing Relationships
Multivariate data
Relationships between a categorical and a continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Visualization methods
Chapter 4: Probability
Basic probability
A tale of two interpretations
Sampling from distributions
The normal distribution
Chapter 5: Using Data to Reason About the World
Estimating means
The sampling distribution
Interval estimation
Smaller samples
Chapter 6: Testing Hypotheses
Null Hypothesis Significance Testing
Testing the mean of one sample
Testing two means
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Chapter 7: Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Chapter 8: Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Linear regression diagnostics
Advanced topics
Chapter 9: Predicting Categorical Variables
k-Nearest Neighbors
Logistic regression
Decision trees
Random forests
Choosing a classifier
Chapter 10: Sources of Data
Relational Databases
Using JSON
XML
Other data formats
Online repositories
Chapter 11: Dealing with Messy Data
Analysis with missing data
Analysis with unsanitized data
Other messiness
Chapter 12: Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Using optimized packages
Using another R implementation
Use parallelization
Using Rcpp
Be smarter about your code
Chapter 13: Reproducibility and Best Practices
R Scripting
R projects
Version control
Communicating results
Chapter 14: R Graphics
Base graphics using the default package
Trellis graphs using lattice
Graphs inspired by Grammar of Graphics
Chapter 15: Basic Graph Functions
Introduction
Creating basic scatter plots
Creating line graphs
Creating bar charts
Creating histograms and density plots
Creating box plots
Adjusting x and y axes' limits
Creating heat maps
Creating pairs plots
Creating multiple plot matrix layouts
Adding and formatting legends
Creating graphs with maps
Saving and exporting graphs
Chapter 16: Beyond the Basics – Adjusting Key Parameters
Introduction
Setting colors of points, lines, and bars
Setting plot background colors
Setting colors for text elements – axis annotations, labels, plot titles, and legends
Choosing color combinations and palettes
Setting fonts for annotations and titles
Choosing plotting point symbol styles and sizes
Choosing line styles and width
Choosing box styles
Adjusting axis annotations and tick marks
Formatting log axes
Setting graph margins and dimensions
Chapter 17: Creating Scatter Plots
Introduction
Grouping data points within a scatter plot
Highlighting grouped data points by size and symbol type
Labeling data points
Correlation matrix using pairs plots
Adding error bars
Using jitter to distinguish closely packed data points
Adding linear model lines
Adding nonlinear model curves
Adding nonparametric model curves with lowess
Creating three-dimensional scatter plots
Creating Quantile-Quantile plots
Displaying the data density on axes
Creating scatter plots with a smoothed density representation
Chapter 18: Creating Line Graphs and Time Series Charts
Introduction
Adding customized legends for multiple-line graphs
Using margin labels instead of legends for multiple-line graphs
Adding horizontal and vertical grid lines
Adding marker lines at specific x and y values using abline
Creating sparklines
Plotting functions of a variable in a dataset
Formatting time series data for plotting
Plotting the date or time variable on the x axis
Annotating axis labels in different human-readable time formats
Adding vertical markers to indicate specific time events
Plotting data with varying time-averaging periods
Creating stock charts
Chapter 19: Creating Bar, Dot, and Pie Charts
Introduction
Creating bar charts with more than one factor variable
Creating stacked bar charts
Adjusting the orientation of bars – horizontal and vertical
Adjusting bar widths, spacing, colors, and borders
Displaying values on top of or next to the bars
Placing labels inside bars
Creating bar charts with vertical error bars
Modifying dot charts by grouping variables
Making better, readable pie charts with clockwise-ordered slices
Labeling a pie chart with percentage values for each slice
Adding a legend to a pie chart
Chapter 20: Creating Histograms
Introduction
Visualizing distributions as count frequencies or probability densities
Setting the bin size and the number of breaks
Adjusting histogram styles – bar colors, borders, and axes
Overlaying a density line over a histogram
Multiple histograms along the diagonal of a pairs plot
Histograms in the margins of line and scatter plots
Chapter 21: Box and Whisker Plots
Introduction
Creating box plots with narrow boxes for a small number of variables
Grouping over a variable
Varying box widths by the number of observations
Creating box plots with notches
Including or excluding outliers
Creating horizontal box plots
Changing the box styling
Adjusting the extent of plot whiskers outside the box
Showing the number of observations
Splitting a variable at arbitrary values into subsets
Chapter 22: Creating Heat Maps and Contour Plots
Introduction
Creating heat maps of a single Z variable with a scale
Creating correlation heat maps
Summarizing multivariate data in a single heat map
Creating contour plots
Creating filled contour plots
Creating three-dimensional surface plots
Visualizing time series as calendar heat maps
Chapter 23: Creating Maps
Introduction
Plotting global data by countries on a world map
Creating graphs with regional maps
Plotting data on Google maps
Creating and reading KML data
Working with ESRI shapefiles
Chapter 24: Data Visualization Using Lattice
Introduction
Creating bar charts
Creating stacked bar charts
Creating bar charts to visualize cross-tabulation
Creating a conditional histogram
Visualizing distributions through a kernel-density plot
Creating a normal Q-Q plot
Visualizing an empirical Cumulative Distribution Function
Creating a boxplot
Creating a conditional scatter plot
Chapter 25: Data Visualization Using ggplot2
Introduction
Creating bar charts
Creating multiple bar charts
Creating a bar chart with error bars
Visualizing the density of a numeric variable
Creating a box plot
Creating a layered plot with a scatter plot and fitted line
Creating a line chart
Graph annotation with ggplot
Chapter 26: Inspecting Large Datasets
Introduction
Multivariate continuous data visualization
Multivariate categorical data visualization
Visualizing mixed data
Zooming and filtering
Chapter 27: Three-dimensional Visualizations
Introduction
Three-dimensional scatter plots
Three-dimensional scatter plots with a regression plane
Three-dimensional bar charts
Three-dimensional density plots
Chapter 28: Finalizing Graphs for Publications and Presentations
Introduction
Exporting graphs in high-resolution image formats – PNG, JPEG, BMP, and TIFF
Exporting graphs in vector formats – SVG, PDF, and PS
Adding mathematical and scientific notations (typesetting)
Adding text descriptions to graphs
Using graph templates
Choosing font families and styles under Windows, Mac OS X, and Linux
Choosing fonts for PostScripts and PDFs
Chapter 29: Warming Up
Big data
Data source
Data mining
Social network mining
Text mining
Web data mining
Why R?
Statistics
Machine learning
Data attributes and description
Data cleaning
Data integration
Data dimension reduction
Data transformation and discretization
Visualization of results
Chapter 30: Mining Frequent Patterns, Associations, and Correlations
An overview of associations and patterns
Market basket analysis
Hybrid association rules mining
Mining sequence dataset
The R implementation
High-performance algorithms
Chapter 31: Classification
Classification
Generic decision tree induction
High-value credit card customers classification using ID3
Web spam detection using C4.5
Web key resource page judgment using CART
Trojan traffic identification method and Bayes classification
Identify spam e-mail and Naïve Bayes classification
Rule-based classification of player types in computer games and rule-based classification
Chapter 32: Advanced Classification
Ensemble (EM) methods
Biological traits and the Bayesian belief network
Protein classification and the k-Nearest Neighbors algorithm
Document retrieval and Support Vector Machine
Classification using frequent patterns
Classification using the backpropagation algorithm
Chapter 33: Cluster Analysis
Search engines and the k-means algorithm
Automatic abstraction of document texts and the k-medoids algorithm
The CLARA algorithm
CLARANS
Unsupervised image categorization and affinity propagation clustering
News categorization and hierarchical clustering
Chapter 34: Advanced Cluster Analysis
Customer categorization analysis of e-commerce and DBSCAN
Clustering web pages and OPTICS
Visitor analysis in the browser cache and DENCLUE
Recommendation system and STING
Web sentiment analysis and CLIQUE
Opinion mining and WAVE clustering
User search intent and the EM algorithm
Customer purchase data analysis and clustering high-dimensional data
SNS and clustering graph and network data
Chapter 35: Outlier Detection
Credit card fraud detection and statistical methods
Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
Intrusion detection and density-based methods
Intrusion detection and clustering-based methods
Monitoring the performance of the web server and classification-based methods
Detecting novelty in text, topic detection, and mining contextual outliers
Collective outliers on spatial data
Outlier detection in high-dimensional data
Chapter 36: Mining Stream, Time-series, and Sequence Data
The credit card transaction flow and STREAM algorithm
Predicting future prices and time-series analysis
Stock market data and time-series clustering and classification
Web click streams and mining symbolic sequences
Mining sequence patterns in transactional databases
Chapter 37: Graph Mining and Network Analysis
Graph mining
Mining frequent subgraph patterns
Social network mining
Chapter 38: Mining Text and Web Data
Text mining and TM packages
Text summarization
The question answering system
Genre categorization of web pages
Categorizing newspaper articles and newswires into topics
Web usage mining with web logs
Chapter 39: Time Series Analysis
Multivariate time series analysis
Volatility modeling
References and reading list
Chapter 40: Factor Models
Arbitrage pricing theory
Modeling in R
References
Chapter 41: Forecasting Volume
Motivation
The intensity of trading
The volume forecasting model
Implementation in R
Chapter 42: Big Data – Advanced Analytics
Getting data from open sources
Introduction to big data analysis in R
K-means clustering on big data
Big data linear regression analysis
References
Chapter 43: FX Derivatives
Terminology and notations
Currency options
Exchange options
Quanto options
References
Chapter 44: Interest Rate Derivatives and Models
The Black model
The Vasicek model
The Cox-Ingersoll-Ross model
Parameter estimation of interest rate models
Using the SMFI5 package
References
Chapter 45: Exotic Options
A general pricing approach
The role of dynamic hedging
How R can help a lot
A glance beyond vanillas
Greeks – the link back to the vanilla world
Pricing the Double-no-touch option
Another way to price the Double-no-touch option
The life of a Double-no-touch option – a simulation
Exotic options embedded in structured products
References
Chapter 46: Optimal Hedging
Hedging of derivatives
Hedging in the presence of transaction costs
Further extensions
References
Chapter 47: Fundamental Analysis
The basics of fundamental analysis
Collecting data
Revealing connections
Including multiple variables
Separating investment targets
Setting classification rules
Backtesting
Industry-specific investment
References
Chapter 48: Technical Analysis, Neural Networks, and Logoptimal Portfolios
Market efficiency
Technical analysis
Neural networks
Logoptimal portfolios
References
Chapter 49: Asset and Liability Management
Data preparation
Interest rate risk measurement
Liquidity risk measurement
Modeling non-maturity deposits
References
Chapter 50: Capital Adequacy
Principles of the Basel Accords
Risk measures
Risk categories
References
Chapter 51: Systemic Risks
Systemic risk in a nutshell
The dataset used in our examples
Core-periphery decomposition
The simulation method
Possible interpretations and suggestions
References
Chapter 52: Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
How machines learn
Machine learning in practice
Machine learning with R
Chapter 53: Managing and Understanding Data
R data structures
Managing data with R
Exploring and understanding data
Chapter 54: Lazy Learning – Classification Using Nearest Neighbors
Understanding nearest neighbor classification
Example – diagnosing breast cancer with the k-NN algorithm
Chapter 55: Probabilistic Learning – Classification Using Naive Bayes
Understanding Naive Bayes
Example – filtering mobile phone spam with the Naive Bayes algorithm
Chapter 56: Divide and Conquer – Classification Using Decision Trees and Rules
Understanding decision trees
Example – identifying risky bank loans using C5.0 decision trees
Understanding classification rules
Example – identifying poisonous mushrooms with rule learners
Chapter 57: Forecasting Numeric Data – Regression Methods
Understanding regression
Example – predicting medical expenses using linear regression
Understanding regression trees and model trees
Example – estimating the quality of wines with regression trees and model trees
Chapter 58: Black Box Methods – Neural Networks and Support Vector Machines
Understanding neural networks
Example – Modeling the strength of concrete with ANNs
Understanding Support Vector Machines
Example – performing OCR with SVMs
Chapter 59: Finding Patterns – Market Basket Analysis Using Association Rules
Understanding association rules
Example – identifying frequently purchased groceries with association rules
Chapter 60: Finding Groups of Data – Clustering with k-means
Understanding clustering
Example – finding teen market segments using k-means clustering
Chapter 61: Evaluating Model Performance
Measuring performance for classification
Estimating future performance
Chapter 62: Improving Model Performance
Tuning stock models for better performance
Improving model performance with meta-learning
Chapter 63: Specialized Machine Learning Topics
Working with proprietary files and databases
Working with online data and services
Working with domain-specific data
Improving the performance of R

What You Will Learn

  • Describe and visualize the behavior of data and relationships between data
  • Gain a thorough understanding of statistical reasoning and sampling
  • Handle missing data gracefully using multiple imputation
  • Create diverse types of bar charts using the default R functions
  • Familiarize yourself with algorithms written in R for spatial data mining, text mining, and so on
  • Understand relationships between market factors and their impact on your portfolio
  • Harness the power of R to build machine learning algorithms with real-world data science applications
  • Learn specialized machine learning techniques for text mining, big data, and more
