Scala: Applied Machine Learning

Leverage the power of Scala and master the art of building, improving, and validating scalable machine learning and AI applications using Scala's most advanced and finest features.

Pascal Bugnion, Patrick R. Nicolas, Alex Kozlov

Book Details

ISBN 13: 9781787126640
Paperback: 1265 pages

Book Description

This Learning Path aims to put the entire world of machine learning with Scala in front of you.

Scala for Data Science, the first module in this course, is a tutorial guide to some of the most common Scala libraries for data science, allowing you to quickly get up to speed building data science and data engineering solutions.

The second module, Scala for Machine Learning, guides you through the process of building AI applications with diagrams, formal mathematical notation, source code snippets, and useful tips. A review of the Akka framework and Apache Spark clusters concludes the tutorial.

The final module, Mastering Scala Machine Learning, takes your knowledge to the next level and helps you apply it to build advanced applications such as social media mining, intelligent news portals, and more. After a quick refresher on functional programming concepts using the REPL, you will see some practical examples of setting up the development environment and tinkering with data. You will then explore working with Spark and MLlib using k-means and decision trees.

By the end of this course, you will be a master at Scala machine learning and have enough expertise to be able to build complex machine learning projects using Scala.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • Scala for Data Science, Pascal Bugnion
  • Scala for Machine Learning, Patrick Nicolas
  • Mastering Scala Machine Learning, Alex Kozlov

Table of Contents

Chapter 1: Scala and Data Science
Data science
Programming in data science
Why Scala?
When not to use Scala
Summary
References
Chapter 2: Manipulating Data with Breeze
Code examples
Installing Breeze
Getting help on Breeze
Basic Breeze data types
An example – logistic regression
Towards re-usable code
Alternatives to Breeze
Summary
References
Chapter 3: Plotting with breeze-viz
Diving into Breeze
Customizing plots
Customizing the line type
More advanced scatter plots
Multi-plot example – scatterplot matrix plots
Managing without documentation
Breeze-viz reference
Data visualization beyond breeze-viz
Summary
Chapter 4: Parallel Collections and Futures
Parallel collections
Futures
Summary
References
Chapter 5: Scala and SQL through JDBC
Interacting with JDBC
First steps with JDBC
JDBC summary
Functional wrappers for JDBC
Safer JDBC connections with the loan pattern
Enriching JDBC statements with the "pimp my library" pattern
Wrapping result sets in a stream
Looser coupling with type classes
Creating a data access layer
Summary
References
Chapter 6: Slick – A Functional Interface for SQL
FEC data
Invokers
Operations on columns
Aggregations with "Group by"
Accessing database metadata
Slick versus JDBC
Summary
References
Chapter 7: Web APIs
A whirlwind tour of JSON
Querying web APIs
JSON in Scala – an exercise in pattern matching
Extraction using case classes
Concurrency and exception handling with futures
Authentication – adding HTTP headers
Summary
References
Chapter 8: Scala and MongoDB
MongoDB
Connecting to MongoDB with Casbah
Inserting documents
Extracting objects from the database
Complex queries
Casbah query DSL
Custom type serialization
Beyond Casbah
Summary
References
Chapter 9: Concurrency with Akka
GitHub follower graph
Actors as people
Hello world with Akka
Case classes as messages
Actor construction
Anatomy of an actor
Follower network crawler
Fetcher actors
Routing
Message passing between actors
Queue control and the pull pattern
Accessing the sender of a message
Stateful actors
Follower network crawler
Fault tolerance
Custom supervisor strategies
Life-cycle hooks
What we have not talked about
Summary
References
Chapter 10: Distributed Batch Processing with Spark
Installing Spark
Acquiring the example data
Resilient distributed datasets
Building and running standalone programs
Spam filtering
Lifting the hood
Data shuffling and partitions
Summary
Reference
Chapter 11: Spark SQL and DataFrames
DataFrames – a whirlwind introduction
Aggregation operations
Joining DataFrames together
Custom functions on DataFrames
DataFrame immutability and persistence
SQL statements on DataFrames
Complex data types – arrays, maps, and structs
Interacting with data sources
Standalone programs
Summary
References
Chapter 12: Distributed Machine Learning with MLlib
Introducing MLlib – Spam classification
Pipeline components
Evaluation
Regularization in logistic regression
Cross-validation and model selection
Beyond logistic regression
Summary
References
Chapter 13: Web APIs with Play
Client-server applications
Introduction to web frameworks
Model-View-Controller architecture
Single page applications
Building an application
The Play framework
Dynamic routing
Actions
Interacting with JSON
Querying external APIs and consuming JSON
Creating APIs with Play: a summary
Rest APIs: best practice
Summary
References
Chapter 14: Visualization with D3 and the Play Framework
GitHub user data
Do I need a backend?
JavaScript dependencies through web-jars
Towards a web application: HTML templates
Modular JavaScript through RequireJS
Bootstrapping the applications
Client-side program architecture
Drawing plots with NVD3
Summary
References
Chapter 15: Getting Started
Mathematical notation for the curious
Why machine learning?
Why Scala?
Model categorization
Taxonomy of machine learning algorithms
Don't reinvent the wheel!
Tools and frameworks
Source code
Let's kick the tires
Summary
Chapter 16: Hello World!
Modeling
Defining a methodology
Monadic data transformation
A workflow computational model
Profiling data
Assessing a model
Summary
Chapter 17: Data Preprocessing
Time series in Scala
Moving averages
Fourier analysis
The discrete Kalman filter
Alternative preprocessing techniques
Summary
Chapter 18: Unsupervised Learning
Clustering
Dimension reduction
Performance considerations
Summary
Chapter 19: Naïve Bayes Classifiers
Probabilistic graphical models
Naïve Bayes classifiers
The Multivariate Bernoulli classification
Naïve Bayes and text mining
Pros and cons
Summary
Chapter 20: Regression and Regularization
Linear regression
Regularization
Numerical optimization
Logistic regression
Summary
Chapter 21: Sequential Data Models
Markov decision processes
The hidden Markov model
Conditional random fields
Regularized CRFs and text analytics
Comparing CRF and HMM
Performance consideration
Summary
Chapter 22: Kernel Models and Support Vector Machines
Kernel functions
Support vector machines
Support vector classifiers – SVC
Anomaly detection with one-class SVC
Support vector regression
Performance considerations
Summary
Chapter 23: Artificial Neural Networks
Feed-forward neural networks
The multilayer perceptron
Evaluation
Convolution neural networks
Benefits and limitations
Summary
Chapter 24: Genetic Algorithms
Evolution
Genetic algorithms and machine learning
Genetic algorithm components
Implementation
GA for trading strategies
Advantages and risks of genetic algorithms
Summary
Chapter 25: Reinforcement Learning
Reinforcement learning
Learning classifier systems
Summary
Chapter 26: Scalable Frameworks
An overview
Scala
Scalability with Actors
Akka
Apache Spark
Summary
Chapter 27: Exploratory Data Analysis
Getting started with Scala
Distinct values of a categorical field
Summarization of a numeric field
Basic, stratified, and consistent sampling
Working with Scala and Spark Notebooks
Basic correlations
Summary
Chapter 28: Data Pipelines and Modeling
Influence diagrams
Sequential trials and dealing with risk
Exploration and exploitation
Unknown unknowns
Basic components of a data-driven system
Optimization and interactivity
Summary
Chapter 29: Working with Spark and MLlib
Setting up Spark
Understanding Spark architecture
Applications
ML libraries
Spark performance tuning
Running Hadoop HDFS
Summary
Chapter 30: Supervised and Unsupervised Learning
Records and supervised learning
Unsupervised learning
Problem dimensionality
Summary
Chapter 31: Regression and Classification
What regression stands for?
Continuous space and metrics
Linear regression
Logistic regression
Regularization
Multivariate regression
Heteroscedasticity
Regression trees
Classification metrics
Multiclass problems
Perceptron
Generalization error and overfitting
Summary
Chapter 32: Working with Unstructured Data
Nested data
Other serialization formats
Hive and Impala
Sessionization
Working with traits
Working with pattern matching
Other uses of unstructured data
Probabilistic structures
Projections
Summary
Chapter 33: Working with Graph Algorithms
A quick introduction to graphs
SBT
Graph for Scala
GraphX
Summary
Chapter 34: Integrating Scala with R and Python
Integrating with R
Integrating with Python
Summary
Chapter 35: NLP in Scala
Text analysis pipeline
MLlib algorithms in Spark
Segmentation, annotation, and chunking
POS tagging
Using word2vec to find word relationships
Summary
Chapter 36: Advanced Model Monitoring
System monitoring
Process monitoring
Model monitoring
Summary

What You Will Learn

  • Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations
  • Deploy scalable parallel applications using Apache Spark, loading data from HDFS or Hive
  • Solve big data problems with Scala parallel collections, Akka actors, and Apache Spark clusters
  • Apply key learning strategies to perform technical analysis of financial markets
  • Understand the principles of supervised and unsupervised learning in machine learning
  • Work with unstructured data and serialize it using Kryo, Protobuf, Avro, and AvroParquet
  • Construct reliable and robust data pipelines and manage data in a data-driven enterprise
  • Implement scalable model monitoring and alerts with Scala
