Scala for Data Science

Leverage the power of Scala with different tools to build scalable, robust data science applications
Preview in Mapt

Scala for Data Science

Pascal Bugnion

1 customer reviews
Leverage the power of Scala with different tools to build scalable, robust data science applications
Mapt Subscription
FREE
$29.99/m after trial
eBook
$10.00
RRP $43.99
Save 77%
Print + eBook
$54.99
RRP $54.99
What do I get with a Mapt Pro subscription?
  • Unlimited access to all Packt’s 5,000+ eBooks and Videos
  • Early Access content, Progress Tracking, and Assessments
  • 1 Free eBook or Video to download and keep every month after trial
What do I get with an eBook?
  • Download this book in EPUB, PDF, MOBI formats
  • DRM FREE - read and interact with your content when you want, where you want, and how you want
  • Access this title in the Mapt reader
What do I get with Print & eBook?
  • Get a paperback copy of the book delivered to you
  • Download this book in EPUB, PDF, MOBI formats
  • DRM FREE - read and interact with your content when you want, where you want, and how you want
  • Access this title in the Mapt reader
What do I get with a Video?
  • Download this Video course in MP4 format
  • DRM FREE - read and interact with your content when you want, where you want, and how you want
  • Access this title in the Mapt reader
$0.00
$10.00
$54.99
$29.99 p/m after trial
RRP $43.99
RRP $54.99
Subscription
eBook
Print + eBook
Start 30 Day Trial

Frequently bought together


Scala for Data Science Book Cover
Scala for Data Science
$ 43.99
$ 10.00
Scala Design Patterns - Second Edition Book Cover
Scala Design Patterns - Second Edition
$ 35.99
$ 10.00
Buy 2 for $20.00
Save $59.98
Add to Cart

Book Details

ISBN 139781785281372
Paperback416 pages

Book Description

Scala is a multi-paradigm programming language (it supports both object-oriented and functional programming) and scripting language used to build applications for the JVM. Languages such as R, Python, Java, and so on are mostly used for data science. It is particularly good at analyzing large sets of data without any significant impact on performance and thus Scala is being adopted by many developers and data scientists. Data scientists might be aware that building applications that are truly scalable is hard. Scala, with its powerful functional libraries for interacting with databases and building scalable frameworks will give you the tools to construct robust data pipelines.

This book will introduce you to the libraries for ingesting, storing, manipulating, processing, and visualizing data in Scala.

Packed with real-world examples and interesting data sets, this book will teach you to ingest data from flat files and web APIs and store it in a SQL or NoSQL database. It will show you how to design scalable architectures to process and modelling your data, starting from simple concurrency constructs such as parallel collections and futures, through to actor systems and Apache Spark. As well as Scala’s emphasis on functional structures and immutability, you will learn how to use the right parallel construct for the job at hand, minimizing development time without compromising scalability. Finally, you will learn how to build beautiful interactive visualizations using web frameworks.

This book gives tutorials on some of the most common Scala libraries for data science, allowing you to quickly get up to speed with building data science and data engineering solutions.

Table of Contents

Chapter 1: Scala and Data Science
Data science
Programming in data science
Why Scala?
When not to use Scala
Summary
References
Chapter 2: Manipulating Data with Breeze
Code examples
Installing Breeze
Getting help on Breeze
Basic Breeze data types
An example – logistic regression
Towards re-usable code
Alternatives to Breeze
Summary
References
Chapter 3: Plotting with breeze-viz
Diving into Breeze
Customizing plots
Customizing the line type
More advanced scatter plots
Multi-plot example – scatterplot matrix plots
Managing without documentation
Breeze-viz reference
Data visualization beyond breeze-viz
Summary
Chapter 4: Parallel Collections and Futures
Parallel collections
Futures
Summary
References
Chapter 5: Scala and SQL through JDBC
Interacting with JDBC
First steps with JDBC
JDBC summary
Functional wrappers for JDBC
Safer JDBC connections with the loan pattern
Enriching JDBC statements with the "pimp my library" pattern
Wrapping result sets in a stream
Looser coupling with type classes
Creating a data access layer
Summary
References
Chapter 6: Slick – A Functional Interface for SQL
FEC data
Invokers
Operations on columns
Aggregations with "Group by"
Accessing database metadata
Slick versus JDBC
Summary
References
Chapter 7: Web APIs
A whirlwind tour of JSON
Querying web APIs
JSON in Scala – an exercise in pattern matching
Extraction using case classes
Concurrency and exception handling with futures
Authentication – adding HTTP headers
Summary
References
Chapter 8: Scala and MongoDB
MongoDB
Connecting to MongoDB with Casbah
Inserting documents
Extracting objects from the database
Complex queries
Casbah query DSL
Custom type serialization
Beyond Casbah
Summary
References
Chapter 9: Concurrency with Akka
GitHub follower graph
Actors as people
Hello world with Akka
Case classes as messages
Actor construction
Anatomy of an actor
Follower network crawler
Fetcher actors
Routing
Message passing between actors
Queue control and the pull pattern
Accessing the sender of a message
Stateful actors
Follower network crawler
Fault tolerance
Custom supervisor strategies
Life-cycle hooks
What we have not talked about
Summary
References
Chapter 10: Distributed Batch Processing with Spark
Installing Spark
Acquiring the example data
Resilient distributed datasets
Building and running standalone programs
Spam filtering
Lifting the hood
Data shuffling and partitions
Summary
Reference
Chapter 11: Spark SQL and DataFrames
DataFrames – a whirlwind introduction
Aggregation operations
Joining DataFrames together
Custom functions on DataFrames
DataFrame immutability and persistence
SQL statements on DataFrames
Complex data types – arrays, maps, and structs
Interacting with data sources
Standalone programs
Summary
References
Chapter 12: Distributed Machine Learning with MLlib
Introducing MLlib – Spam classification
Pipeline components
Evaluation
Regularization in logistic regression
Cross-validation and model selection
Beyond logistic regression
Summary
References
Chapter 13: Web APIs with Play
Client-server applications
Introduction to web frameworks
Model-View-Controller architecture
Single page applications
Building an application
The Play framework
Dynamic routing
Actions
Interacting with JSON
Querying external APIs and consuming JSON
Creating APIs with Play: a summary
Rest APIs: best practice
Summary
References
Chapter 14: Visualization with D3 and the Play Framework
GitHub user data
Do I need a backend?
JavaScript dependencies through web-jars
Towards a web application: HTML templates
Modular JavaScript through RequireJS
Bootstrapping the applications
Client-side program architecture
Drawing plots with NVD3
Summary
References

What You Will Learn

  • Transform and filter tabular data to extract features for machine learning
  • Implement your own algorithms or take advantage of MLLib’s extensive suite of models to build distributed machine learning pipelines
  • Read, transform, and write data to both SQL and NoSQL databases in a functional manner
  • Write robust routines to query web APIs
  • Read data from web APIs such as the GitHub or Twitter API
  • Use Scala to interact with MongoDB, which offers high performance and helps to store large data sets with uncertain query requirements
  • Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations
  • Deploy scalable parallel applications using Apache Spark, loading data from HDFS or Hive

Authors

Table of Contents

Chapter 1: Scala and Data Science
Data science
Programming in data science
Why Scala?
When not to use Scala
Summary
References
Chapter 2: Manipulating Data with Breeze
Code examples
Installing Breeze
Getting help on Breeze
Basic Breeze data types
An example – logistic regression
Towards re-usable code
Alternatives to Breeze
Summary
References
Chapter 3: Plotting with breeze-viz
Diving into Breeze
Customizing plots
Customizing the line type
More advanced scatter plots
Multi-plot example – scatterplot matrix plots
Managing without documentation
Breeze-viz reference
Data visualization beyond breeze-viz
Summary
Chapter 4: Parallel Collections and Futures
Parallel collections
Futures
Summary
References
Chapter 5: Scala and SQL through JDBC
Interacting with JDBC
First steps with JDBC
JDBC summary
Functional wrappers for JDBC
Safer JDBC connections with the loan pattern
Enriching JDBC statements with the "pimp my library" pattern
Wrapping result sets in a stream
Looser coupling with type classes
Creating a data access layer
Summary
References
Chapter 6: Slick – A Functional Interface for SQL
FEC data
Invokers
Operations on columns
Aggregations with "Group by"
Accessing database metadata
Slick versus JDBC
Summary
References
Chapter 7: Web APIs
A whirlwind tour of JSON
Querying web APIs
JSON in Scala – an exercise in pattern matching
Extraction using case classes
Concurrency and exception handling with futures
Authentication – adding HTTP headers
Summary
References
Chapter 8: Scala and MongoDB
MongoDB
Connecting to MongoDB with Casbah
Inserting documents
Extracting objects from the database
Complex queries
Casbah query DSL
Custom type serialization
Beyond Casbah
Summary
References
Chapter 9: Concurrency with Akka
GitHub follower graph
Actors as people
Hello world with Akka
Case classes as messages
Actor construction
Anatomy of an actor
Follower network crawler
Fetcher actors
Routing
Message passing between actors
Queue control and the pull pattern
Accessing the sender of a message
Stateful actors
Follower network crawler
Fault tolerance
Custom supervisor strategies
Life-cycle hooks
What we have not talked about
Summary
References
Chapter 10: Distributed Batch Processing with Spark
Installing Spark
Acquiring the example data
Resilient distributed datasets
Building and running standalone programs
Spam filtering
Lifting the hood
Data shuffling and partitions
Summary
Reference
Chapter 11: Spark SQL and DataFrames
DataFrames – a whirlwind introduction
Aggregation operations
Joining DataFrames together
Custom functions on DataFrames
DataFrame immutability and persistence
SQL statements on DataFrames
Complex data types – arrays, maps, and structs
Interacting with data sources
Standalone programs
Summary
References
Chapter 12: Distributed Machine Learning with MLlib
Introducing MLlib – Spam classification
Pipeline components
Evaluation
Regularization in logistic regression
Cross-validation and model selection
Beyond logistic regression
Summary
References
Chapter 13: Web APIs with Play
Client-server applications
Introduction to web frameworks
Model-View-Controller architecture
Single page applications
Building an application
The Play framework
Dynamic routing
Actions
Interacting with JSON
Querying external APIs and consuming JSON
Creating APIs with Play: a summary
Rest APIs: best practice
Summary
References
Chapter 14: Visualization with D3 and the Play Framework
GitHub user data
Do I need a backend?
JavaScript dependencies through web-jars
Towards a web application: HTML templates
Modular JavaScript through RequireJS
Bootstrapping the applications
Client-side program architecture
Drawing plots with NVD3
Summary
References

Book Details

ISBN 139781785281372
Paperback416 pages
Read More
From 1 reviews

Read More Reviews

Recommended for You

Building Applications with Scala Book Cover
Building Applications with Scala
$ 39.99
$ 10.00
Mastering Scala Machine Learning Book Cover
Mastering Scala Machine Learning
$ 39.99
$ 10.00
Scala Design Patterns Book Cover
Scala Design Patterns
$ 43.99
$ 10.00
Scala High Performance Programming Book Cover
Scala High Performance Programming
$ 35.99
$ 10.00
Scientific Computing with Scala Book Cover
Scientific Computing with Scala
$ 35.99
$ 10.00
Scala Test-Driven Development Book Cover
Scala Test-Driven Development
$ 31.99
$ 10.00