Apache Spark 2.x Machine Learning Cookbook

Simplify machine learning model implementations with Spark

Siamak Amirghodsi et al.

eBook: $28.00 (RRP $39.99, save 29%)
Print + eBook: $49.99 (RRP $49.99)

Frequently bought together

Apache Spark 2.x Machine Learning Cookbook: $28.00 (RRP $39.99)
Mastering Machine Learning with Spark 2.x: $28.00 (RRP $39.99)
Buy 2 for $35.00 (save $44.98)

Book Details

ISBN-13: 9781783551606
Paperback: 666 pages

Book Description

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning about these algorithms opens up a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications such as self-driving cars and personalized medicine. You will gain hands-on experience in applying these principles using Apache Spark, a resilient cluster computing system well suited to large-scale machine learning tasks.

This book begins with a quick overview of setting up the IDEs needed to run the code examples covered throughout the chapters. It also highlights some key issues developers face when working with machine learning algorithms on the Spark platform. We then explore the various Spark APIs and implement ML algorithms to develop classification systems, recommendation engines, text analytics, clustering, and learning systems. In the final chapters, we focus on building high-end applications, explain various unsupervised methodologies, and discuss the challenges to tackle when implementing big data ML systems.

Table of Contents

Chapter 1: Practical Machine Learning with Spark Using Scala
Introduction
Downloading and installing the JDK
Downloading and installing IntelliJ
Downloading and installing Spark
Configuring IntelliJ to work with Spark and run Spark ML sample codes
Running a sample ML code from Spark
Identifying data sources for practical machine learning
Running your first program using Apache Spark 2.0 with the IntelliJ IDE
How to add graphics to your Spark program
Chapter 2: Just Enough Linear Algebra for Machine Learning with Spark
Introduction
Package imports and initial setup for vectors and matrices
Creating DenseVector and setup with Spark 2.0
Creating SparseVector and setup with Spark
Creating dense matrix and setup with Spark 2.0
Using sparse local matrices with Spark 2.0
Performing vector arithmetic using Spark 2.0
Performing matrix arithmetic using Spark 2.0
Exploring RowMatrix in Spark 2.0
Exploring Distributed IndexedRowMatrix in Spark 2.0
Exploring distributed CoordinateMatrix in Spark 2.0
Exploring distributed BlockMatrix in Spark 2.0
Chapter 3: Spark's Three Data Musketeers for Machine Learning - Perfect Together
Introduction
Creating RDDs with Spark 2.0 using internal data sources
Creating RDDs with Spark 2.0 using external data sources
Transforming RDDs with Spark 2.0 using the filter() API
Transforming RDDs with the super useful flatMap() API
Transforming RDDs with set operation APIs
RDD transformation/aggregation with groupBy() and reduceByKey()
Transforming RDDs with the zip() API
Join transformation with paired key-value RDDs
Reduce and grouping transformation with paired key-value RDDs
Creating DataFrames from Scala data structures
Operating on DataFrames programmatically without SQL
Loading DataFrames and setup from an external source
Using DataFrames with standard SQL language - SparkSQL
Working with the Dataset API using a Scala Sequence
Creating and using Datasets from RDDs and back again
Working with JSON using the Dataset API and SQL together
Functional programming with the Dataset API using domain objects
Chapter 4: Common Recipes for Implementing a Robust Machine Learning System
Introduction
Spark's basic statistical API to help you build your own algorithms
ML pipelines for real-life machine learning applications
Normalizing data with Spark
Splitting data for training and testing
Common operations with the new Dataset API
Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0
LabeledPoint data structure for Spark ML
Getting access to Spark cluster in Spark 2.0
Getting access to Spark cluster pre-Spark 2.0
Getting access to SparkContext vis-a-vis SparkSession object in Spark 2.0
New model export and PMML markup in Spark 2.0
Regression model evaluation using Spark 2.0
Binary classification model evaluation using Spark 2.0
Multiclass classification model evaluation using Spark 2.0
Multilabel classification model evaluation using Spark 2.0
Using the Scala Breeze library to do graphics in Spark 2.0
Chapter 5: Practical Machine Learning with Regression and Classification in Spark 2.0 - Part I
Introduction
Fitting a linear regression line to data the old fashioned way
Generalized linear regression in Spark 2.0
Linear regression API with Lasso and L-BFGS in Spark 2.0
Linear regression API with Lasso and 'auto' optimization selection in Spark 2.0
Linear regression API with ridge regression and 'auto' optimization selection in Spark 2.0
Isotonic regression in Apache Spark 2.0
Multilayer perceptron classifier in Apache Spark 2.0
One-vs-Rest classifier (One-vs-All) in Apache Spark 2.0
Survival regression – parametric AFT model in Apache Spark 2.0
Chapter 6: Practical Machine Learning with Regression and Classification in Spark 2.0 - Part II
Introduction
Linear regression with SGD optimization in Spark 2.0
Logistic regression with SGD optimization in Spark 2.0
Ridge regression with SGD optimization in Spark 2.0
Lasso regression with SGD optimization in Spark 2.0
Logistic regression with L-BFGS optimization in Spark 2.0
Support Vector Machine (SVM) with Spark 2.0
Naive Bayes machine learning with Spark 2.0 MLlib
Exploring ML pipelines and DataFrames using logistic regression in Spark 2.0
Chapter 7: Recommendation Engine that Scales with Spark
Introduction
Setting up the required data for a scalable recommendation engine in Spark 2.0
Exploring the movies data details for the recommendation system in Spark 2.0
Exploring the ratings data details for the recommendation system in Spark 2.0
Building a scalable recommendation engine using collaborative filtering in Spark 2.0
Chapter 8: Unsupervised Clustering with Apache Spark 2.0
Introduction
Building a KMeans classifying system in Spark 2.0
Bisecting KMeans, the new kid on the block in Spark 2.0
Using Gaussian Mixture and Expectation Maximization (EM) in Spark to classify data
Classifying the vertices of a graph using Power Iteration Clustering (PIC) in Spark 2.0
Latent Dirichlet Allocation (LDA) to classify documents and text into topics
Streaming KMeans to classify data in near real-time
Chapter 9: Optimization - Going Down the Hill with Gradient Descent
Introduction
Optimizing a quadratic cost function and finding the minima using just math to gain insight
Coding a quadratic cost function optimization using Gradient Descent (GD) from scratch
Coding Gradient Descent optimization to solve Linear Regression from scratch
Normal equations as an alternative for solving Linear Regression in Spark 2.0
Chapter 10: Building Machine Learning Systems with Decision Tree and Ensemble Models
Introduction
Getting and preparing real-world medical data for exploring Decision Trees and Ensemble models in Spark 2.0
Building a classification system with Decision Trees in Spark 2.0
Solving Regression problems with Decision Trees in Spark 2.0
Building a classification system with Random Forest Trees in Spark 2.0
Solving regression problems with Random Forest Trees in Spark 2.0
Building a classification system with Gradient Boosted Trees (GBT) in Spark 2.0
Solving regression problems with Gradient Boosted Trees (GBT) in Spark 2.0
Chapter 11: Curse of High-Dimensionality in Big Data
Introduction
Two methods of ingesting and preparing a CSV file for processing in Spark
Singular Value Decomposition (SVD) to reduce high-dimensionality in Spark
Principal Component Analysis (PCA) to pick the most effective latent factor for machine learning in Spark
Chapter 12: Implementing Text Analytics with Spark 2.0 ML Library
Introduction
Doing term frequency with Spark - everything that counts
Displaying similar words with Spark using Word2Vec
Downloading a complete dump of Wikipedia for a real-life Spark ML project
Using Latent Semantic Analysis for text analytics with Spark 2.0
Topic modeling with Latent Dirichlet allocation in Spark 2.0
Chapter 13: Spark Streaming and Machine Learning Library
Introduction
Structured streaming for near real-time machine learning
Streaming DataFrames for real-time machine learning
Streaming Datasets for real-time machine learning
Streaming data and debugging with queueStream
Downloading and understanding the famous Iris data for unsupervised classification
Streaming KMeans for a real-time on-line classifier
Downloading wine quality data for streaming regression
Streaming linear regression for a real-time regression
Downloading Pima Diabetes data for supervised classification
Streaming logistic regression for an on-line classifier

What You Will Learn

  • Get to know how Scala and Spark go hand in hand when developing ML systems
  • Build a recommendation engine that scales with Spark
  • Find out how to build unsupervised clustering systems to classify data in Spark
  • Build machine learning systems with Decision Tree and Ensemble models in Spark
  • Deal with the curse of high-dimensionality in big data using Spark
  • Implement text analytics for search engines in Spark
  • Implement streaming machine learning systems using Spark

Recommended for You

Mastering Machine Learning with Spark 2.x: $28.00 (RRP $39.99)
Apache Spark 2.x Cookbook: $28.00 (RRP $39.99)
Learning PySpark: $25.20 (RRP $35.99)
Scala and Spark for Big Data Analytics: $36.40 (RRP $51.99)
Machine Learning with Spark - Second Edition: $28.00 (RRP $39.99)
TensorFlow Machine Learning Cookbook: $30.80 (RRP $43.99)