Large Scale Machine Learning with Spark

Discover everything you need to build robust machine learning applications with Spark 2.0

Md. Rezaul Karim, Md. Mahedi Kaysar

Book Details

ISBN 13: 9781785888748
Paperback: 476 pages

Book Description

Data processing, implementing related algorithms, tuning, scaling up, and finally deploying are crucial steps in optimising any application.

Spark can handle large-scale batch and streaming data, deciding when to cache data in memory and processing it up to 100 times faster than Hadoop-based MapReduce. This means predictive analytics can be applied to both streaming and batch data to develop complete machine learning (ML) applications much more quickly, making Spark an ideal candidate for large, data-intensive applications.

This book focuses on design engineering and scalable solutions using ML with Spark. First, you will learn how to install Spark with all the new features from the Spark 2.0 release. Moving on, you’ll explore important concepts such as advanced feature engineering with RDDs and Datasets. After studying how to develop and deploy applications, you will see how to use external libraries with Spark.

In summary, you will be able to develop complete and personalised ML applications, from data collection, model building, tuning, and scaling up to deployment on a cluster or the cloud.

Table of Contents

Chapter 1: Introduction to Data Analytics with Spark
Spark overview
New computing paradigm with Spark
Spark ecosystem
Spark machine learning libraries
Installing and getting started with Spark
Packaging your application with dependencies
Running a sample machine learning application
References
Summary
Chapter 2: Machine Learning Best Practices
What is machine learning?
Machine learning tasks
Practical machine learning problems
Most widely used machine learning problems
Large scale machine learning APIs in Spark
Practical machine learning best practices
Choosing the right algorithm for your application
Summary
Chapter 3: Understanding the Problem by Understanding the Data
Analyzing and preparing your data
Resilient Distributed Dataset basics
Dataset basics
Dataset from string and typed class
Spark and data scientists workflow
Deeper into Spark
Summary
Chapter 4: Extracting Knowledge through Feature Engineering
The state of the art of feature engineering
Best practices in feature engineering
Feature engineering with Spark
Advanced feature engineering
Summary
Chapter 5: Supervised and Unsupervised Learning by Examples
Machine learning classes
Supervised learning with Spark - an example
Unsupervised learning
Recommender system
Advanced learning and generalizations
Summary
Chapter 6: Building Scalable Machine Learning Pipelines
Spark machine learning pipeline APIs
Cancer-diagnosis pipeline with Spark
Cancer-prognosis pipeline with Spark
Market basket analysis with Spark Core
OCR pipeline with Spark
Topic modeling using Spark MLlib and ML
Credit risk analysis pipeline with Spark
Scaling the ML pipelines
Tips and performance considerations
Summary
Chapter 7: Tuning Machine Learning Models
Details about machine learning model tuning
Typical challenges in model tuning
Evaluating machine learning models
Validation and evaluation techniques
Parameter tuning for machine learning models
Hypothesis testing
Machine learning model selection
Summary
Chapter 8: Adapting Your Machine Learning Models
Adapting machine learning models
The generalization of ML models
Adapting through incremental algorithms
Adapting through reusing ML models
Machine learning in dynamic environments
Summary
Chapter 9: Advanced Machine Learning with Streaming and Graph Data
Developing real-time ML pipelines
Time series and social network analysis
Movie recommendation using Spark
Developing a real-time ML pipeline from streaming
ML pipeline on graph data and semi-supervised graph-based learning
Summary
Chapter 10: Configuring and Working with External Libraries
Third-party ML libraries with Spark
Using external libraries with Spark Core
Time series analysis using the Cloudera Spark-TS package
Configuring SparkR with RStudio
Configuring Hadoop run-time on Windows
Summary

What You Will Learn

  • Get a solid theoretical understanding of ML algorithms
  • Configure Spark on cluster and cloud infrastructure to develop applications using Scala, Java, Python, and R
  • Scale up ML applications on large clusters or cloud infrastructures
  • Use Spark ML and MLlib to develop ML pipelines for recommendation systems, classification, regression, clustering, sentiment analysis, and dimensionality reduction
  • Handle large texts for developing ML applications with a strong focus on feature engineering
  • Use Spark Streaming to develop ML applications for real-time streaming
  • Tune ML models with cross-validation, hyperparameter tuning, and train-validation split
  • Enhance ML models to make them adaptable for new data in dynamic and incremental environments
