R Statistical Application Development by Example Beginner's Guide


  • A self-learning guide for the user who needs statistical tools for understanding uncertainty in computer science data
  • Essential descriptive statistics, effective data visualization, and efficient model building
  • Every method is explained through real data sets, giving clarity and confidence for unforeseen scenarios

Book Details

Language : English
Paperback : 344 pages [ 235mm x 191mm ]
Release Date : July 2013
ISBN : 1849519447
ISBN 13 : 9781849519441
Author(s) : Prabhanjan Narayanachar Tattar
Topics and Technologies : All Books, Big Data and Business Intelligence, Data, Beginner's Guides, Open Source

Table of Contents

Preface
Chapter 1: Data Characteristics
Chapter 2: Import/Export Data
Chapter 3: Data Visualization
Chapter 4: Exploratory Analysis
Chapter 5: Statistical Inference
Chapter 6: Linear Regression Analysis
Chapter 7: The Logistic Regression Model
Chapter 8: Regression Models with Regularization
Chapter 9: Classification and Regression Trees
Chapter 10: CART and Beyond
Appendix: References
Index
  • Chapter 1: Data Characteristics
    • Questionnaire and its components
      • Understanding the data characteristics in an R environment
    • Experiments with uncertainty in computer science
    • R installation
      • Using R packages
      • RSADBE – the book's R package
    • Discrete distribution
      • Discrete uniform distribution
      • Binomial distribution
      • Hypergeometric distribution
      • Negative binomial distribution
      • Poisson distribution
    • Continuous distribution
      • Uniform distribution
      • Exponential distribution
      • Normal distribution
    • Summary
  • Chapter 2: Import/Export Data
    • data.frame and other formats
      • Constants, vectors, and matrices
    • Time for action – understanding constants, vectors, and basic arithmetic
    • Time for action – matrix computations
      • The list object
    • Time for action – creating a list object
      • The data.frame object
    • Time for action – creating a data.frame object
      • The table object
    • Time for action – creating the Titanic dataset as a table object
    • read.csv, read.xls, and the foreign package
    • Time for action – importing data from external files
      • Importing data from MySQL
    • Exporting data/graphs
      • Exporting R objects
      • Exporting graphs
    • Time for action – exporting a graph
    • Managing an R session
    • Time for action – session management
    • Summary
  • Chapter 3: Data Visualization
    • Visualization techniques for categorical data
      • Bar charts
        • Going through the built-in examples of R
    • Time for action – bar charts in R
      • Dot charts
    • Time for action – dot charts in R
      • Spine and mosaic plots
    • Time for action – the spine plot for the shift and operator data
    • Time for action – the mosaic plot for the Titanic dataset
      • Pie charts and the fourfold plot
    • Visualization techniques for continuous variable data
      • Boxplot
    • Time for action – using the boxplot
      • Histograms
    • Time for action – understanding the effectiveness of histograms
      • Scatter plots
    • Time for action – plot and pairs R functions
      • Pareto charts
    • A brief peek at ggplot2
    • Time for action – qplot
    • Time for action – ggplot
    • Summary
  • Chapter 4: Exploratory Analysis
    • Essential summary statistics
      • Percentiles, quantiles, and median
      • Hinges
      • The interquartile range
    • Time for action – the essential summary statistics for "The Wall" dataset
    • The stem-and-leaf plot
    • Time for action – the stem function in play
    • Letter values
    • Data re-expression
    • Bagplot – a bivariate boxplot
    • Time for action – the bagplot display for a multivariate dataset
    • The resistant line
    • Time for action – the resistant line as a first regression model
    • Smoothing data
    • Time for action – smoothening the cow temperature data
    • Median polish
    • Time for action – the median polish algorithm
    • Summary
  • Chapter 5: Statistical Inference
    • Maximum likelihood estimator
      • Visualizing the likelihood function
    • Time for action – visualizing the likelihood function
      • Finding the maximum likelihood estimator
      • Using the fitdistr function
    • Time for action – finding the MLE using mle and fitdistr functions
    • Confidence intervals
    • Time for action – confidence intervals
    • Hypotheses testing
      • Binomial test
    • Time for action – testing the probability of success
      • Tests of proportions and the chi-square test
    • Time for action – testing proportions
      • Tests based on normal distribution – one-sample
    • Time for action – testing one-sample hypotheses
      • Tests based on normal distribution – two-sample
    • Time for action – testing two-sample hypotheses
    • Summary
  • Chapter 6: Linear Regression Analysis
    • The simple linear regression model
      • What happens to the arbitrary choice of parameters?
    • Time for action – the arbitrary choice of parameters
      • Building a simple linear regression model
    • Time for action – building a simple linear regression model
      • ANOVA and the confidence intervals
    • Time for action – ANOVA and the confidence intervals
      • Model validation
    • Time for action – residual plots for model validation
    • Multiple linear regression model
      • Averaging k simple linear regression models or a multiple linear regression model
    • Time for action – averaging k simple linear regression models
      • Building a multiple linear regression model
    • Time for action – building a multiple linear regression model
      • The ANOVA and confidence intervals for the multiple linear regression model
    • Time for action – the ANOVA and confidence intervals for the multiple linear regression model
      • Useful residual plots
    • Time for action – residual plots for the multiple linear regression model
    • Regression diagnostics
      • Leverage points
      • Influential points
      • DFFITS and DFBETAS
    • The multicollinearity problem
    • Time for action – addressing the multicollinearity problem for the Gasoline data
    • Model selection
      • Stepwise procedures
        • The backward elimination
        • The forward selection
      • Criterion-based procedures
    • Time for action – model selection using the backward, forward, and AIC criteria
    • Summary
  • Chapter 7: The Logistic Regression Model
    • The binary regression problem
    • Time for action – limitations of linear regression models
    • Probit regression model
    • Time for action – understanding the constants
    • Logistic regression model
    • Time for action – fitting the logistic regression model
      • Hosmer-Lemeshow goodness-of-fit test statistic
    • Time for action – the Hosmer-Lemeshow goodness-of-fit statistic
    • Model validation and diagnostics
      • Residual plots for the GLM
    • Time for action – residual plots for the logistic regression model
      • Influence and leverage for the GLM
    • Time for action – diagnostics for the logistic regression
    • Receiving operator curves
    • Time for action – ROC construction
    • Logistic regression for the German credit screening dataset
    • Time for action – logistic regression for the German credit dataset
    • Summary
  • Chapter 8: Regression Models with Regularization
    • The overfitting problem
    • Time for action – understanding overfitting
    • Regression spline
      • Basis functions
      • Piecewise linear regression model
    • Time for action – fitting piecewise linear regression models
      • Natural cubic splines and the general B-splines
    • Time for action – fitting the spline regression models
    • Ridge regression for linear models
    • Time for action – ridge regression for the linear regression model
    • Ridge regression for logistic regression models
    • Time for action – ridge regression for the logistic regression model
    • Another look at model assessment
    • Time for action – selecting lambda iteratively and other topics
    • Summary
  • Chapter 9: Classification and Regression Trees
    • Recursive partitions
    • Time for action – partitioning the display plot
      • Splitting the data
      • The first tree
    • Time for action – building our first tree
    • The construction of a regression tree
    • Time for action – the construction of a regression tree
    • The construction of a classification tree
    • Time for action – the construction of a classification tree
    • Classification tree for the German credit data
    • Time for action – the construction of a classification tree
    • Pruning and other finer aspects of a tree
    • Time for action – pruning a classification tree
    • Summary
  • Chapter 10: CART and Beyond
    • Improving CART
    • Time for action – cross-validation predictions
    • Bagging
      • The bootstrap
    • Time for action – understanding the bootstrap technique
      • The bagging algorithm
    • Time for action – the bagging algorithm
    • Random forests
    • Time for action – random forests for the German credit data
    • The consolidation
    • Time for action – random forests for the low birth weight data
    • Summary

Prabhanjan Narayanachar Tattar

Prabhanjan Narayanachar Tattar has seven years of experience with R and is a co-author of the book A Course in Statistics with R, published by Narosa Publishing House. He has built two R packages, gpk and ACSWR. He holds a PhD in Statistics from Bangalore University in the broad area of survival analysis and has published several articles in peer-reviewed journals. During his PhD program he received two young statistician honors, the IBS(IR)-GK Shukla Young Biometrician Award (2005) and the Dr. U.S. Nair Award for Young Statistician (2007), and held Junior and Senior Research Fellowships of CSIR-UGC. Prabhanjan works as a Business Analysis Advisor at Dell Inc., Bangalore, in the Customer Service Analytics unit of Dell Global Analytics.

What you will learn from this book

• Learn the nature of data through software, putting the preliminary concepts to work in R right away
• Read data from various sources and export the R output to other software
• Perform effective data visualization that respects the nature of the variables and offers rich alternative options
• Do exploratory data analysis for a useful first understanding of the data, which builds the right attitude towards effective inference
• Learn statistical inference through simulation, combining classical inference with modern computational power
• Delve deep into regression models, such as linear and logistic regression for continuous and discrete regressands, which form the fundamentals of modern statistics
• Introduce yourself to CART, a machine learning tool that is very useful when the data has an intrinsic nonlinearity

In Detail

"R Statistical Application Development by Example Beginner’s Guide" explores statistical concepts and the R software, which are integrated from the word go. This removes the usual separation between learning theory and learning applications, hence the title begins with “R Statistical …”. Almost every concept is accompanied by R code that exemplifies the strength of R and its applications. The reader first learns data characteristics, descriptive statistics, and the exploratory attitude, which give a firm first footing in data analysis. Statistical inference and simulation, which draws on modern computational power, complete the technical footing in statistical methods. Regression modeling (linear, logistic, and CART) builds the essential toolkit that helps the reader solve complex real-world problems.

The reader will begin with a brief understanding of the nature of data and end with modern and advanced statistical models like CART. Every step is taken with data and R code.

The data analysis journey begins with exploratory analysis, which is more than simple descriptive data summaries, then takes the traditional path up to linear regression modeling, and ends with logistic regression, regularized regression, CART, and extensions of CART such as bagging and random forests.

True to the title R Statistical Application Development by Example Beginner’s Guide, the reader will enjoy the examples and the R software.

Approach

Full of screenshots and examples, this Beginner’s Guide by Example will teach you practically everything you need to know about R statistical application development from scratch.

Who this book is for

You will learn the first concepts of statistics directly in R, a vital skill in this fast-paced era. It is also a bargain, as you do not need to take a preliminary course on the subject.
