R Statistical Application Development by Example Beginner's Guide

Prabhanjan Narayanachar Tattar

This guide assumes no prior knowledge, and starts by introducing you to the very first principles of statistics in R before progressing to more advanced concepts of application development through instructive screenshots and examples.
eBook: $26.99 (RRP $26.99)
Print + eBook: $44.99 (RRP $44.99)

Book Details

ISBN-13: 9781849519441
Paperback: 344 pages

About This Book

  • A self-learning guide for the user who needs statistical tools for understanding uncertainty in computer science data
  • Essential descriptive statistics, effective data visualization, and efficient model building
  • Every method is explained through real data sets, building clarity and confidence for unforeseen scenarios

Who This Book Is For

This book is for readers who want to begin learning the first concepts of statistics in R, a vital skill in this fast-paced era. It is also a bargain, since you do not need to take a preliminary course on the subject.

Table of Contents

Chapter 1: Data Characteristics
Questionnaire and its components
Experiments with uncertainty in computer science
R installation
Continuous distribution
Summary
Chapter 2: Import/Export Data
data.frame and other formats
Time for action – understanding constants, vectors, and basic arithmetic
Time for action – matrix computations
Time for action – creating a list object
Time for action – creating a data.frame object
Summary
Chapter 3: Data Visualization
Visualization techniques for categorical data
Time for action – bar charts in R
Time for action – dot charts in R
Time for action – the spine plot for the shift and operator data
Time for action – the mosaic plot for the Titanic dataset
Visualization techniques for continuous variable data
Time for action – using the boxplot
Time for action – understanding the effectiveness of histograms
Time for action – plot and pairs R functions
A brief peek at ggplot2
Time for action – qplot
Time for action – ggplot
Summary
Chapter 4: Exploratory Analysis
Essential summary statistics
Time for action – the essential summary statistics for "The Wall" dataset
The stem-and-leaf plot
Time for action – the stem function in play
Letter values
Data re-expression
Bagplot – a bivariate boxplot
Time for action – the bagplot display for a multivariate dataset
The resistant line
Time for action – the resistant line as a first regression model
Smoothing data
Time for action – smoothening the cow temperature data
Median polish
Time for action – the median polish algorithm
Summary
Chapter 5: Statistical Inference
Maximum likelihood estimator
Time for action – visualizing the likelihood function
Time for action – finding the MLE using mle and fitdistr functions
Confidence intervals
Time for action – confidence intervals
Hypotheses testing
Time for action – testing the probability of success
Time for action – testing proportions
Time for action – testing one-sample hypotheses
Time for action – testing two-sample hypotheses
Summary
Chapter 6: Linear Regression Analysis
The simple linear regression model
Time for action – the arbitrary choice of parameters
Time for action – building a simple linear regression model
Time for action – ANOVA and the confidence intervals
Time for action – residual plots for model validation
Multiple linear regression model
Time for action – averaging k simple linear regression models
Time for action – building a multiple linear regression model
Time for action – the ANOVA and confidence intervals for the multiple linear regression model
Time for action – residual plots for the multiple linear regression model
Regression diagnostics
The multicollinearity problem
Time for action – addressing the multicollinearity problem for the Gasoline data
Model selection
Time for action – model selection using the backward, forward, and AIC criteria
Summary
Chapter 7: The Logistic Regression Model
The binary regression problem
Time for action – limitations of linear regression models
Probit regression model
Time for action – understanding the constants
Logistic regression model
Time for action – fitting the logistic regression model
Time for action – The Hosmer-Lemeshow goodness-of-fit statistic
Model validation and diagnostics
Time for action – residual plots for the logistic regression model
Time for action – diagnostics for the logistic regression
Receiving operator curves
Time for action – ROC construction
Logistic regression for the German credit screening dataset
Time for action – logistic regression for the German credit dataset
Summary
Chapter 8: Regression Models with Regularization
The overfitting problem
Time for action – understanding overfitting
Regression spline
Time for action – fitting piecewise linear regression models
Time for action – fitting the spline regression models
Ridge regression for linear models
Time for action – ridge regression for the linear regression model
Ridge regression for logistic regression models
Time for action – ridge regression for the logistic regression model
Another look at model assessment
Time for action – selecting lambda iteratively and other topics
Summary
Chapter 9: Classification and Regression Trees
Recursive partitions
Time for action – partitioning the display plot
Time for action – building our first tree
The construction of a regression tree
Time for action – the construction of a regression tree
The construction of a classification tree
Time for action – the construction of a classification tree
Classification tree for the German credit data
Time for action – the construction of a classification tree
Pruning and other finer aspects of a tree
Time for action – pruning a classification tree
Summary
Chapter 10: CART and Beyond
Improving CART
Time for action – cross-validation predictions
Bagging
Time for action – understanding the bootstrap technique
Time for action – the bagging algorithm
Random forests
Time for action – random forests for the German credit data
The consolidation
Time for action – random forests for the low birth weight data
Summary

What You Will Learn

  • Learn the nature of data through software, putting the preliminary concepts to work right away in R
  • Read data from various sources and export the R output to other software
  • Perform effective data visualization that respects the nature of the variables and offers rich alternative options
  • Do exploratory data analysis to gain a useful first understanding, which builds the right attitude towards effective inference
  • Learn statistical inference through simulation, combining classical inference with modern computational power
  • Delve deep into regression models, such as linear and logistic regression for continuous and discrete regressands, which form the fundamentals of modern statistics
  • Introduce yourself to CART – a machine learning tool which is very useful when the data has an intrinsic nonlinearity

In Detail

"R Statistical Application Development by Example Beginner’s Guide" explores statistical concepts and the R software together, well integrated from the very start. This removes the usual separation between learning theory and learning applications, which is why the title begins with “R Statistical …”. Almost every concept is accompanied by R code, which exemplifies the strength of R and its applications. The reader first understands data characteristics, descriptive statistics, and the exploratory attitude, which together give a firm first footing in data analysis. Statistical inference and simulation, which draws on modern computational power, complete the technical footing of statistical methods. Regression modeling (linear, logistic, and CART) builds the essential toolkit that helps the reader solve complex problems in the real world.

The reader will begin with a brief understanding of the nature of data and end with modern, advanced statistical models such as CART. Every step is taken with data and R code.

The data analysis journey begins with exploratory analysis, which is more than simple descriptive data summaries, then takes the traditional path up to linear regression modeling, and ends with logistic regression and CART.

True to the title R Statistical Application Development by Example Beginner’s Guide, the reader will enjoy the examples and R software.
