Machine Learning with Spark - Second Edition

Create scalable machine learning applications to power a modern data-driven business using Spark 2.x

Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath

Book Details

ISBN-13: 9781785889936
Paperback: 532 pages

Book Description

This book teaches you popular machine learning algorithms and how they are implemented in Spark ML. You will start by installing Spark on single-node and multinode clusters. Next, you will see how to run Scala- and Python-based programs for Spark ML. You will then work through several datasets to explore clustering, classification, and regression in depth. Toward the end, the book also covers text processing with Spark ML.

Once you have learned the concepts, you can apply them either to implement algorithms in green-field projects or to migrate existing systems to this new platform, for example from Mahout or scikit-learn to Spark ML.

By the end of this book, you will have the skills to use Spark's features to create your own scalable machine learning applications and power a modern data-driven business.

Table of Contents

Chapter 1: Getting Up and Running with Spark
Installing and setting up Spark locally
Spark clusters
The Spark programming model
SchemaRDD
Spark data frame
The first step to a Spark program in Scala
The first step to a Spark program in Java
The first step to a Spark program in Python
The first step to a Spark program in R
Getting Spark running on Amazon EC2
Configuring and running Spark on Amazon Elastic Map Reduce
UI in Spark
Supported machine learning algorithms by Spark
Benefits of using Spark ML as compared to existing libraries
Spark Cluster on Google Compute Engine - DataProc
Summary
Chapter 2: Math for Machine Learning
Linear algebra
Gradient descent
Prior, likelihood, and posterior
Calculus
Plotting
Summary
Chapter 3: Designing a Machine Learning System
What is Machine Learning?
Introducing MovieStream
Business use cases for a machine learning system
Types of machine learning models
The components of a data-driven machine learning system
An architecture for a machine learning system
Spark MLlib
Performance improvements in Spark ML over Spark MLlib
Comparing algorithms supported by MLlib
MLlib supported methods and developer APIs
MLlib vision
MLlib versions compared
Summary
Chapter 4: Obtaining, Processing, and Preparing Data with Spark
Accessing publicly available datasets
Exploring and visualizing your data
Processing and transforming your data
Extracting useful features from your data
Summary
Chapter 5: Building a Recommendation Engine with Spark
Types of recommendation models
Extracting the right features from your data
Training the recommendation model
Using the recommendation model
Evaluating the performance of recommendation models
FP-Growth algorithm
Summary
Chapter 6: Building a Classification Model with Spark
Types of classification models
Extracting the right features from your data
Training classification models
Using classification models
Improving model performance and tuning parameters
Additional features
Summary
Chapter 7: Building a Regression Model with Spark
Types of regression models
Evaluating the performance of regression models
Extracting the right features from your data
Training and using regression models
Improving model performance and tuning parameters
Summary
Chapter 8: Building a Clustering Model with Spark
Types of clustering models
Extracting the right features from your data
K-means - training a clustering model
K-means - evaluating the performance of clustering models
Effect of iterations on WSSSE
Bisecting KMeans
Bisecting K-means - training a clustering model
Gaussian Mixture Model
Summary
Chapter 9: Dimensionality Reduction with Spark
Types of dimensionality reduction
Extracting the right features from your data
Training a dimensionality reduction model
Using a dimensionality reduction model
Evaluating dimensionality reduction models
Summary
Chapter 10: Advanced Text Processing with Spark
What's so special about text data?
Extracting the right features from your data
Using a tf-idf model
Evaluating the impact of text processing
Text classification with Spark 2.0
Word2Vec models
Word2Vec with Spark ML on the 20 Newsgroups dataset
Summary
Chapter 11: Real-Time Machine Learning with Spark Streaming
Online learning
Stream processing
Online learning with Spark Streaming
Online model evaluation
Structured Streaming
Summary
Chapter 12: Pipeline APIs for Spark ML
Introduction to pipelines
How pipelines work
Machine learning pipeline with an example
Summary

What You Will Learn

  • Get hands-on with the latest version of Spark ML
  • Create your first Spark program with Scala and Python
  • Set up and configure a development environment for Spark on your own computer, as well as on Amazon EC2
  • Access public machine learning datasets and use Spark to load, process, clean, and transform data
  • Use Spark's machine learning library to implement programs by utilizing well-known machine learning models
  • Deal with large-scale text data, including feature extraction and using text data as input to your machine learning models
  • Write Spark functions to evaluate the performance of your machine learning models
