# R Statistical Application Development by Example Beginner's Guide


- A self-learning guide for users who need statistical tools for understanding uncertainty in computer science data
- Essential descriptive statistics, effective data visualization, and efficient model building
- Every method is explained through real datasets, building the clarity and confidence needed for unforeseen scenarios

### Book Details

**Language:** English

**Paperback:** 344 pages (235 mm x 191 mm)

**Release Date:** July 2013

**ISBN:** 1849519447

**ISBN-13:** 9781849519441

**Author(s):** Prabhanjan Narayanachar Tattar

**Topics and Technologies:** All Books, Big Data and Business Intelligence, Data, Beginner's Guides, Open Source

## Table of Contents

Preface

Chapter 1: Data Characteristics

Chapter 2: Import/Export Data

Chapter 3: Data Visualization

Chapter 4: Exploratory Analysis

Chapter 5: Statistical Inference

Chapter 6: Linear Regression Analysis

Chapter 7: The Logistic Regression Model

Chapter 8: Regression Models with Regularization

Chapter 9: Classification and Regression Trees

Chapter 10: CART and Beyond

Appendix: References

Index

**Chapter 1: Data Characteristics**

- Questionnaire and its components
- Understanding the data characteristics in an R environment
- Experiments with uncertainty in computer science
- R installation
- Using R packages
- RSADBE – the book's R package
- Discrete distribution
- Discrete uniform distribution
- Binomial distribution
- Hypergeometric distribution
- Negative binomial distribution
- Poisson distribution
- Continuous distribution
- Uniform distribution
- Exponential distribution
- Normal distribution
- Summary

**Chapter 2: Import/Export Data**

- data.frame and other formats
- Constants, vectors, and matrices
- Time for action – understanding constants, vectors, and basic arithmetic
- Time for action – matrix computations
- The list object
- Time for action – creating a list object
- The data.frame object
- Time for action – creating a data.frame object
- The table object
- Time for action – creating the Titanic dataset as a table object
- read.csv, read.xls, and the foreign package
- Time for action – importing data from external files
- Importing data from MySQL
- Exporting data/graphs
- Exporting R objects
- Exporting graphs
- Time for action – exporting a graph
- Managing an R session
- Time for action – session management
- Summary

**Chapter 3: Data Visualization**

- Visualization techniques for categorical data
- Bar charts
- Going through the built-in examples of R
- Time for action – bar charts in R
- Dot charts
- Time for action – dot charts in R
- Spine and mosaic plots
- Time for action – the spine plot for the shift and operator data
- Time for action – the mosaic plot for the Titanic dataset
- Pie charts and the fourfold plot
- Visualization techniques for continuous variable data
- Boxplot
- Time for action – using the boxplot
- Histograms
- Time for action – understanding the effectiveness of histograms
- Scatter plots
- Time for action – plot and pairs R functions
- Pareto charts
- A brief peek at ggplot2
- Time for action – qplot
- Time for action – ggplot
- Summary

**Chapter 4: Exploratory Analysis**

- Essential summary statistics
- Percentiles, quantiles, and median
- Hinges
- The interquartile range
- Time for action – the essential summary statistics for "The Wall" dataset
- The stem-and-leaf plot
- Time for action – the stem function in play
- Letter values
- Data re-expression
- Bagplot – a bivariate boxplot
- Time for action – the bagplot display for a multivariate dataset
- The resistant line
- Time for action – the resistant line as a first regression model
- Smoothing data
- Time for action – smoothening the cow temperature data
- Median polish
- Time for action – the median polish algorithm
- Summary

**Chapter 5: Statistical Inference**

- Maximum likelihood estimator
- Visualizing the likelihood function
- Time for action – visualizing the likelihood function
- Finding the maximum likelihood estimator
- Using the fitdistr function
- Time for action – finding the MLE using mle and fitdistr functions
- Confidence intervals
- Time for action – confidence intervals
- Hypotheses testing
- Binomial test
- Time for action – testing the probability of success
- Tests of proportions and the chi-square test
- Time for action – testing proportions
- Tests based on normal distribution – one-sample
- Time for action – testing one-sample hypotheses
- Tests based on normal distribution – two-sample
- Time for action – testing two-sample hypotheses
- Summary

**Chapter 6: Linear Regression Analysis**

- The simple linear regression model
- What happens to the arbitrary choice of parameters?
- Time for action – the arbitrary choice of parameters
- Building a simple linear regression model
- Time for action – building a simple linear regression model
- ANOVA and the confidence intervals
- Time for action – ANOVA and the confidence intervals
- Model validation
- Time for action – residual plots for model validation
- Multiple linear regression model
- Averaging k simple linear regression models or a multiple linear regression model
- Time for action – averaging k simple linear regression models
- Building a multiple linear regression model
- Time for action – building a multiple linear regression model
- The ANOVA and confidence intervals for the multiple linear regression model
- Time for action – the ANOVA and confidence intervals for the multiple linear regression model
- Useful residual plots
- Time for action – residual plots for the multiple linear regression model
- Regression diagnostics
- Leverage points
- Influential points
- DFFITS and DFBETAS
- The multicollinearity problem
- Time for action – addressing the multicollinearity problem for the Gasoline data
- Model selection
- Stepwise procedures
- The backward elimination
- The forward selection
- Criterion-based procedures
- Time for action – model selection using the backward, forward, and AIC criteria
- Summary

**Chapter 7: The Logistic Regression Model**

- The binary regression problem
- Time for action – limitations of linear regression models
- Probit regression model
- Time for action – understanding the constants
- Logistic regression model
- Time for action – fitting the logistic regression model
- Hosmer-Lemeshow goodness-of-fit test statistic
- Time for action – the Hosmer-Lemeshow goodness-of-fit statistic
- Model validation and diagnostics
- Residual plots for the GLM
- Time for action – residual plots for the logistic regression model
- Influence and leverage for the GLM
- Time for action – diagnostics for the logistic regression
- Receiver operating characteristic (ROC) curves
- Time for action – ROC construction
- Logistic regression for the German credit screening dataset
- Time for action – logistic regression for the German credit dataset
- Summary

**Chapter 8: Regression Models with Regularization**

- The overfitting problem
- Time for action – understanding overfitting
- Regression spline
- Basis functions
- Piecewise linear regression model
- Time for action – fitting piecewise linear regression models
- Natural cubic splines and the general B-splines
- Time for action – fitting the spline regression models
- Ridge regression for linear models
- Time for action – ridge regression for the linear regression model
- Ridge regression for logistic regression models
- Time for action – ridge regression for the logistic regression model
- Another look at model assessment
- Time for action – selecting lambda iteratively and other topics
- Summary

**Chapter 9: Classification and Regression Trees**

- Recursive partitions
- Time for action – partitioning the display plot
- Splitting the data
- The first tree
- Time for action – building our first tree
- The construction of a regression tree
- Time for action – the construction of a regression tree
- The construction of a classification tree
- Time for action – the construction of a classification tree
- Classification tree for the German credit data
- Time for action – the construction of a classification tree
- Pruning and other finer aspects of a tree
- Time for action – pruning a classification tree
- Summary

**Chapter 10: CART and Beyond**

- Improving CART
- Time for action – cross-validation predictions
- Bagging
- The bootstrap
- Time for action – understanding the bootstrap technique
- The bagging algorithm
- Time for action – the bagging algorithm
- Random forests
- Time for action – random forests for the German credit data
- The consolidation
- Time for action – random forests for the low birth weight data
- Summary


- Learn the nature of data through R, applying the preliminary concepts right away
- Read data from various sources and export R output to other software
- Perform effective data visualization that respects the nature of the variables, with a rich set of alternative options
- Carry out exploratory data analysis to gain a useful first understanding of the data and build the right attitude toward effective inference
- Learn statistical inference through simulation, combining classical inference with modern computational power
- Delve deep into regression models, such as linear and logistic regression for continuous and discrete regressands, which form the foundations of modern statistics
- Get introduced to CART, a machine learning tool that is very useful when the data has an intrinsic nonlinearity

“R Statistical Application Development by Example Beginner’s Guide” integrates statistical concepts and the R software from the very start. This departs from the usual practice of learning theory and applications separately, which is why the title begins with “R Statistical …”. Almost every concept is accompanied by R code that demonstrates the strength of R in applications. The reader first learns about data characteristics, descriptive statistics, and the exploratory attitude, which together provide a firm initial footing for data analysis. Statistical inference, along with simulation that exploits modern computational power, completes the technical footing of statistical methods. Regression modeling (linear, logistic, and CART) then builds the essential toolkit that helps the reader solve complex real-world problems.

The reader will begin with a brief understanding of the nature of data and end with modern, advanced statistical models such as CART. Every step is illustrated with real data and R code.

The data analysis journey begins with exploratory analysis, which goes beyond simple descriptive summaries, then follows the traditional path to linear regression modeling, and ends with logistic regression, regularized regression, CART, bagging, and random forests.

True to the title *R Statistical Application Development by Example Beginner’s Guide*, the reader will enjoy working through the examples with the R software.

Full of screenshots and examples, this Beginner’s Guide by Example will teach you practically everything you need to know about R statistical application development from scratch.

You will learn the first concepts of statistics directly in R, which is vital in this fast-paced era. Since no preliminary course on the subject is required, the book is also a bargain.