Mastering Data Analysis with R

Gain sharp insights into your data and solve real-world data science problems with R—from data munging to modeling and visualization

Gergely Daróczi

8 customer reviews
eBook: $43.99 (RRP $43.99)
Print + eBook: $54.99 (RRP $54.99)

Book Details

ISBN 13: 9781783982028
Paperback: 396 pages

Book Description

R is an essential language for sharp and successful data analysis. Its numerous features and ease of use make it a powerful way of mining, managing, and interpreting large sets of data. In a world where understanding big data has become key, by mastering R you will be able to deal with your data effectively and efficiently.

This book gives you the guidance you need to build and develop your knowledge and expertise. Bridging the gap between theory and practice, it will help you understand and use data for a competitive advantage.

The book begins with essential data mining and management tasks such as munging, fetching, cleaning, and restructuring, then explores different model designs and the core components of effective analysis. You will then discover how to optimize your use of machine learning algorithms for classification and recommendation systems, alongside traditional and more recent statistical methods.

Table of Contents

Chapter 1: Hello, Data!
Loading text files of a reasonable size
Benchmarking text file parsers
Loading a subset of text files
Loading data from databases
Importing data from other statistical systems
Loading Excel spreadsheets
Summary
Chapter 2: Getting Data from the Web
Loading datasets from the Internet
Other popular online data formats
Reading data from HTML tables
Scraping data from other online sources
R packages to interact with data source APIs
Summary
Chapter 3: Filtering and Summarizing Data
Drop needless data
Aggregation
Running benchmarks
Summary functions
Summary
Chapter 4: Restructuring Data
Transposing matrices
Filtering data by string matching
Rearranging data
dplyr versus data.table
Computing new variables
Merging datasets
Reshaping data in a flexible way
The evolution of the reshape packages
Summary
Chapter 5: Building Models (authored by Renata Nemeth and Gergely Toth)
The motivation behind multivariate models
Linear regression with continuous predictors
Model assumptions
How well does the line fit in the data?
Discrete predictors
Summary
Chapter 6: Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
The modeling workflow
Logistic regression
Models for count data
Summary
Chapter 7: Unstructured Data
Importing the corpus
Cleaning the corpus
Visualizing the most frequent words in the corpus
Further cleanup
Analyzing the associations among terms
Some other metrics
The segmentation of documents
Summary
Chapter 8: Polishing Data
The types and origins of missing data
Identifying missing data
By-passing missing values
Getting rid of missing data
Filtering missing data before or during the actual analysis
Data imputation
Extreme values and outliers
Using robust methods
Summary
Chapter 9: From Big to Small Data
Adequacy tests
Principal Component Analysis
Factor analysis
Principal Component Analysis versus Factor Analysis
Multidimensional Scaling
Summary
Chapter 10: Classification and Clustering
Cluster analysis
Latent class models
Discriminant analysis
Logistic regression
Machine learning algorithms
Summary
Chapter 11: Social Network Analysis of the R Ecosystem
Loading network data
Centrality measures of networks
Visualizing network data
Further network analysis resources
Summary
Chapter 12: Analyzing Time-series
Creating time-series objects
Visualizing time-series
Seasonal decomposition
Holt-Winters filtering
Autoregressive Integrated Moving Average models
Outlier detection
More complex time-series objects
Advanced time-series analysis
Summary
Chapter 13: Data Around Us
Geocoding
Visualizing point data in space
Finding polygon overlays of point data
Plotting thematic maps
Rendering polygons around points
Satellite maps
Interactive maps
Alternative map designs
Spatial statistics
Summary
Chapter 14: Analyzing the R Community
R Foundation members
R package maintainers
The R-help mailing list
Analyzing overlaps between our lists of R users
The number of R users in social media
R-related posts in social media
Summary

What You Will Learn

  • Connect to and load data from a range of databases with R
  • Successfully fetch and parse structured and unstructured data
  • Transform and restructure your data with efficient R packages
  • Define and build complex statistical models with glm
  • Develop and train machine learning algorithms
  • Visualize social networks and graph data
  • Deploy supervised and unsupervised classification algorithms
  • Discover how to visualize spatial data with R
