Mastering Data Analysis with R

Gain sharp insights into your data and solve real-world data science problems with R—from data munging to modeling and visualization

Gergely Daróczi

Book Details

ISBN-13: 9781783982028
Paperback: 396 pages

Book Description

R is an essential language for sharp and successful data analysis. Its numerous features and ease of use make it a powerful way of mining, managing, and interpreting large sets of data. In a world where understanding big data has become key, by mastering R you will be able to deal with your data effectively and efficiently.

This book will give you the guidance you need to build and develop your knowledge and expertise. Bridging the gap between theory and practice, this book will help you to understand and use data for a competitive advantage.

The book begins by taking you through essential data mining and management tasks such as munging, fetching, cleaning, and restructuring data, and then explores different model designs and the core components of effective analysis. You will then discover how to optimize your use of machine learning algorithms for classification and recommendation systems alongside traditional and more recent statistical methods.

Table of Contents

Chapter 1: Hello, Data!
Loading text files of a reasonable size
Benchmarking text file parsers
Loading a subset of text files
Loading data from databases
Importing data from other statistical systems
Loading Excel spreadsheets
Summary
Chapter 2: Getting Data from the Web
Loading datasets from the Internet
Other popular online data formats
Reading data from HTML tables
Scraping data from other online sources
R packages to interact with data source APIs
Summary
Chapter 3: Filtering and Summarizing Data
Drop needless data
Aggregation
Running benchmarks
Summary functions
Summary
Chapter 4: Restructuring Data
Transposing matrices
Filtering data by string matching
Rearranging data
dplyr versus data.table
Computing new variables
Merging datasets
Reshaping data in a flexible way
The evolution of the reshape packages
Summary
Chapter 5: Building Models (authored by Renata Nemeth and Gergely Toth)
The motivation behind multivariate models
Linear regression with continuous predictors
Model assumptions
How well does the line fit in the data?
Discrete predictors
Summary
Chapter 6: Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth)
The modeling workflow
Logistic regression
Models for count data
Summary
Chapter 7: Unstructured Data
Importing the corpus
Cleaning the corpus
Visualizing the most frequent words in the corpus
Further cleanup
Analyzing the associations among terms
Some other metrics
The segmentation of documents
Summary
Chapter 8: Polishing Data
The types and origins of missing data
Identifying missing data
By-passing missing values
Getting rid of missing data
Filtering missing data before or during the actual analysis
Data imputation
Extreme values and outliers
Using robust methods
Summary
Chapter 9: From Big to Small Data
Adequacy tests
Principal Component Analysis
Factor analysis
Principal Component Analysis versus Factor Analysis
Multidimensional Scaling
Summary
Chapter 10: Classification and Clustering
Cluster analysis
Latent class models
Discriminant analysis
Logistic regression
Machine learning algorithms
Summary
Chapter 11: Social Network Analysis of the R Ecosystem
Loading network data
Centrality measures of networks
Visualizing network data
Further network analysis resources
Summary
Chapter 12: Analyzing Time-series
Creating time-series objects
Visualizing time-series
Seasonal decomposition
Holt-Winters filtering
Autoregressive Integrated Moving Average models
Outlier detection
More complex time-series objects
Advanced time-series analysis
Summary
Chapter 13: Data Around Us
Geocoding
Visualizing point data in space
Finding polygon overlays of point data
Plotting thematic maps
Rendering polygons around points
Satellite maps
Interactive maps
Alternative map designs
Spatial statistics
Summary
Chapter 14: Analyzing the R Community
R Foundation members
R package maintainers
The R-help mailing list
Analyzing overlaps between our lists of R users
The number of R users in social media
R-related posts in social media
Summary

What You Will Learn

  • Connect to and load data from a wide range of databases with R
  • Successfully fetch and parse structured and unstructured data
  • Transform and restructure your data with efficient R packages
  • Define and build complex statistical models with glm
  • Develop and train machine learning algorithms
  • Visualize social networks and graph data
  • Deploy supervised and unsupervised classification algorithms
  • Discover how to visualize spatial data with R
