Machine Learning with Spark

Create scalable machine learning applications to power a modern data-driven business using Spark

Nick Pentreath

Book Details

ISBN-13: 9781783288519
Paperback: 338 pages

Book Description

Apache Spark is a distributed computing framework designed from the ground up for low-latency tasks and in-memory data storage. It is one of the few parallel computing frameworks that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, and powerful API.

This book guides you through the basics of Spark's API for loading and processing data and preparing it as input to various machine learning models. Detailed examples and real-world use cases let you explore common machine learning models, including recommender systems, classification, regression, clustering, and dimensionality reduction. You will also cover advanced topics such as working with large-scale text data and methods for online machine learning and model evaluation using Spark Streaming.

Table of Contents

Chapter 1: Getting Up and Running with Spark
  • Installing and setting up Spark locally
  • Spark clusters
  • The Spark programming model
  • The first step to a Spark program in Scala
  • The first step to a Spark program in Java
  • The first step to a Spark program in Python
  • Getting Spark running on Amazon EC2
  • Summary
Chapter 2: Designing a Machine Learning System
  • Introducing MovieStream
  • Business use cases for a machine learning system
  • Types of machine learning models
  • The components of a data-driven machine learning system
  • An architecture for a machine learning system
  • Summary
Chapter 3: Obtaining, Processing, and Preparing Data with Spark
  • Accessing publicly available datasets
  • Exploring and visualizing your data
  • Processing and transforming your data
  • Extracting useful features from your data
  • Summary
Chapter 4: Building a Recommendation Engine with Spark
  • Types of recommendation models
  • Extracting the right features from your data
  • Training the recommendation model
  • Using the recommendation model
  • Evaluating the performance of recommendation models
  • Summary
Chapter 5: Building a Classification Model with Spark
  • Types of classification models
  • Extracting the right features from your data
  • Training classification models
  • Using classification models
  • Evaluating the performance of classification models
  • Improving model performance and tuning parameters
  • Summary
Chapter 6: Building a Regression Model with Spark
  • Types of regression models
  • Extracting the right features from your data
  • Training and using regression models
  • Evaluating the performance of regression models
  • Improving model performance and tuning parameters
  • Summary
Chapter 7: Building a Clustering Model with Spark
  • Types of clustering models
  • Extracting the right features from your data
  • Training a clustering model
  • Making predictions using a clustering model
  • Evaluating the performance of clustering models
  • Tuning parameters for clustering models
  • Summary
Chapter 8: Dimensionality Reduction with Spark
  • Types of dimensionality reduction
  • Extracting the right features from your data
  • Training a dimensionality reduction model
  • Using a dimensionality reduction model
  • Evaluating dimensionality reduction models
  • Summary
Chapter 9: Advanced Text Processing with Spark
  • What's so special about text data?
  • Extracting the right features from your data
  • Using a TF-IDF model
  • Evaluating the impact of text processing
  • Word2Vec models
  • Summary
Chapter 10: Real-time Machine Learning with Spark Streaming
  • Online learning
  • Stream processing
  • Creating a Spark Streaming application
  • Online learning with Spark Streaming
  • Online model evaluation
  • Summary

What You Will Learn

  • Create your first Spark program in Scala, Java, and Python
  • Set up and configure a development environment for Spark on your own computer, as well as on Amazon EC2
  • Access public machine learning datasets and use Spark to load, process, clean, and transform data
  • Use Spark's machine learning library to implement programs that use well-known machine learning models, including collaborative filtering, classification, regression, clustering, and dimensionality reduction
  • Write Spark functions to evaluate the performance of your machine learning models
  • Deal with large-scale text data, including feature extraction and using text data as input to your machine learning models
  • Explore online learning methods and use Spark Streaming for online learning and model evaluation
