Mastering Scala Machine Learning

More Information
  • Sharpen your functional programming skills in Scala using REPL
  • Apply standard and advanced machine learning techniques using Scala
  • Get acquainted with Big Data technologies and grasp why we need a functional approach to Big Data
  • Discover new data structures, algorithms, approaches, and habits that will allow you to work effectively with large amounts of data
  • Understand the principles of supervised and unsupervised learning in machine learning
  • Work with unstructured data and serialize it using Kryo, Protobuf, Avro, and AvroParquet
  • Construct reliable and robust data pipelines and manage data in a data-driven enterprise
  • Implement scalable model monitoring and alerts with Scala

Since the advent of object-oriented programming, new technologies related to Big Data are constantly popping up on the market. One such technology is Scala, which is considered to be a successor to Java in the area of Big Data by many, like Java was to C/C++ in the area of distributed programing.

This book aims to take your knowledge to next level and help you impart that knowledge to build advanced applications such as social media mining, intelligent news portals, and more. After a quick refresher on functional programming concepts using REPL, you will see some practical examples of setting up the development environment and tinkering with data. We will then explore working with Spark and MLlib using k-means and decision trees.

Most of the data that we produce today is unstructured and raw, and you will learn to tackle this type of data with advanced topics such as regression, classification, integration, and working with graph algorithms. Finally, you will discover at how to use Scala to perform complex concept analysis, to monitor model performance, and to build a model repository. By the end of this book, you will have gained expertise in performing Scala machine learning and will be able to build complex machine learning projects using Scala.

  • This is a primer on functional-programming-style techniques to help you efficiently process and analyze all of your data
  • Get acquainted with the best and newest tools available such as Scala, Spark, Parquet and MLlib for machine learning
  • Learn the best practices to incorporate new Big Data machine learning in your data-driven enterprise to gain future scalability and maintainability
Page Count 310
Course Length 9 hours 18 minutes
ISBN 9781785880889
Date Of Publication 27 Jun 2016


Alex Kozlov

Alex Kozlov is a multidisciplinary big data scientist. He came to Silicon Valley in 1991, got his Ph.D. from Stanford University under the supervision of Prof. Daphne Koller and Prof. John Hennessy in 1998, and has been around a few computer and data management companies since. His latest stint was with Cloudera, the leader in Hadoop, where he was one of the early employees and ended up heading the solution architects group on the West Coast. Before that, he spent time with an online advertising company, Turn, Inc.; and before that, he had the privilege to work with HP Labs researchers at HP Inc., and on data mining software at SGI, Inc. Currently, Alexander is the chief solutions architect at an enterprise security startup, E8 Security, where he came to understand the intricacies of catching bad guys in the Internet universe.

On the non-professional side, Alexander lives in Sunnyvale, CA, together with his beautiful wife, Oxana, and other important family members, including three daughters, Lana, Nika, and Anna, and a cat and dog. His family also included a hamster and a fish at one point.

Alex is an active participant in Silicon Valley technology groups and meetups, and although he is not an official committer of any open source projects, he definitely contributed to many of them in the form of code or discussions. Alexander is an active coder and publishes his open source code at Other information can be looked up on his LinkedIn page at