Practical Data Science Cookbook - Second Edition

Over 85 recipes to help you complete real-world data science projects in R and Python

Prabhanjan Tattar et al.

Book Details

ISBN 13: 9781787129627
Paperback: 434 pages

Book Description

As more data is generated each year, the need to analyze it and create value from it is more important than ever. Companies that know what to do with their data, and how to do it well, will have a competitive advantage over companies that don’t. As a result, demand will keep growing for people who possess both the analytical and technical abilities to extract insights from data and build solutions that put those insights to use.

Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples using the two most popular programming languages for data analysis—R and Python.

Table of Contents

Chapter 1: Preparing Your Data Science Environment
Understanding the data science pipeline
Installing R on Windows, Mac OS X, and Linux
Installing libraries in R and RStudio
Installing Python on Linux and Mac OS X
Installing Python on Windows
Installing the Python data stack on Mac OS X and Linux
Installing extra Python packages
Installing and using virtualenv
Chapter 2: Driving Visual Analysis with Automobile Data with R
Introduction
Acquiring automobile fuel efficiency data
Preparing R for your first project
Importing automobile fuel efficiency data into R
Exploring and describing fuel efficiency data
Analyzing automobile fuel efficiency over time
Investigating the makes and models of automobiles
Chapter 3: Creating Application-Oriented Analyses Using Tax Data and Python
Introduction
Preparing for the analysis of top incomes
Importing and exploring the world's top incomes dataset
Analyzing and visualizing the top income data of the US
Furthering the analysis of the top income groups of the US
Reporting with Jinja2
Repeating the analysis in R
Chapter 4: Modeling Stock Market Data
Introduction
Acquiring stock market data
Summarizing the data
Cleaning and exploring the data
Generating relative valuations
Screening stocks and analyzing historical prices
Chapter 5: Visually Exploring Employment Data
Introduction
Preparing for analysis
Importing employment data into R
Exploring the employment data
Obtaining and merging additional data
Adding geographical information
Extracting state- and county-level wage and employment information
Visualizing geographical distributions of pay
Exploring where the jobs are, by industry
Animating maps for a geospatial time series
Benchmarking performance for some common tasks
Chapter 6: Driving Visual Analyses with Automobile Data
Introduction
Getting started with IPython
Exploring Jupyter Notebook
Preparing to analyze automobile fuel efficiencies
Exploring and describing fuel efficiency data with Python
Analyzing automobile fuel efficiency over time with Python
Investigating the makes and models of automobiles with Python
Chapter 7: Working with Social Graphs
Introduction
Preparing to work with social networks in Python
Importing networks
Exploring subgraphs within a heroic network
Finding strong ties
Finding key players
Exploring the characteristics of entire networks
Clustering and community detection in social networks
Visualizing graphs
Social networks in R
Chapter 8: Recommending Movies at Scale (Python)
Introduction
Modeling preference expressions
Understanding the data
Ingesting the movie review data
Finding the highest-scoring movies
Improving the movie-rating system
Measuring the distance between users in the preference space
Computing the correlation between users
Finding the best critic for a user
Predicting movie ratings for users
Collaboratively filtering item by item
Building a non-negative matrix factorization model
Loading the entire dataset into the memory
Dumping the SVD-based model to the disk
Training the SVD-based model
Testing the SVD-based model
Chapter 9: Harvesting and Geolocating Twitter Data (Python)
Introduction
Creating a Twitter application
Understanding the Twitter API v1.1
Determining your Twitter followers and friends
Pulling Twitter user profiles
Making requests without running afoul of Twitter's rate limits
Storing JSON data to disk
Setting up MongoDB for storing Twitter data
Storing user profiles in MongoDB using PyMongo
Exploring the geographic information available in profiles
Plotting geospatial data in Python
Chapter 10: Forecasting New Zealand Overseas Visitors
Introduction
The ts object
Visualizing time series data
Simple linear regression models
ACF and PACF
ARIMA models
Accuracy measurements
Fitting seasonal ARIMA models
Chapter 11: German Credit Data Analysis
Introduction
Simple data transformations
Visualizing categorical data
Discriminant analysis
Dividing the data and the ROC
Fitting the logistic regression model
Decision trees and rules
Decision tree for German data

What You Will Learn

  • Understand how to install and set up the R and Python environments on various platforms
  • Prepare data for analysis by implementing various data science concepts, such as acquisition, cleaning, and munging, in R and Python
  • Build a predictive model and an exploratory model
  • Analyze the results of your model and create reports on the acquired data
  • Build various tree-based models, including random forests
