Chapter 2. The Model - Scala and Akka
This chapter is divided into two parts: Scala (the language) and Akka (the actor model implementation for the JVM).
As this book is about architecture, and Spark is built in Scala following the actor model, we decided to show the examples using only Scala. In this way, we leave more room for architectural content instead of repeating every listing in several languages.
In the Apache Spark world, there are four spoken languages: Java, Scala, Python, and R. To follow this book, we need to know at least one of these four languages. Most books show every example in each of these languages.
If you are reading this section and do not know Scala, welcome to an introductory course in data manipulation. This chapter is a dojo where you will learn some Scala tricks for manipulating data. Because this is not a book about Scala, some powerful topics are not covered, such as the null-safe containers Option, Either, and Try, pattern matching, and case classes. It...
The objective of this section is to think in a functional programming way.
As good data architects, here we will come to understand collections. We will not cover any aspect of the language other than collection management.
We need to be clear regarding the following two statements:
- Scala collections are different from Java collections
- Scala collections are different from Spark collections
So, a list in Java is different from a list in Scala. Lists are a fundamental part of functional languages. The first functional programming language, LISP, is an acronym for List Processing.
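To make the first statement concrete, here is a minimal sketch (assuming Scala 2.13, where the Java interop converters live in scala.jdk.CollectionConverters): Scala's default List is immutable and persistent, while java.util.List is a mutable interface, so moving between the two is an explicit conversion:

import scala.jdk.CollectionConverters._   // earlier Scala versions use scala.collection.JavaConverters

val scalaList: List[Int] = List(1, 2, 3)  // scala.collection.immutable.List
val extended = 0 :: scalaList             // prepending builds a new list
// scalaList is still List(1, 2, 3); nothing was mutated

// When a Java API expects a java.util.List, we convert explicitly
val javaList: java.util.List[Int] = scalaList.asJava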
We have to master three key concepts of functional programming to understand Scala collections:
- Predicates
- Literal functions (anonymous functions)
- Implicit loops
A predicate is simply a function that takes one or more parameters and returns a Boolean value.
For example:
def isOdd(i: Int): Boolean = i % 2 != 0
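A predicate becomes useful when we pass it to a collection method; as a small sketch (the range is arbitrary sample data), we can use it to filter a range of numbers:

val odds = (1 to 10).filter(isOdd)   // keeps 1, 3, 5, 7, 9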
A literal function is an alternate syntax for defining a function. It's useful when we want...
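As a minimal sketch of both ideas, the predicate above can be written inline as a literal function, and collection methods such as filter, map, and foreach give us implicit loops, with no explicit for or while (the list of SMACK names is just sample data):

// The same predicate written as a literal (anonymous) function
val isOddLiteral = (i: Int) => i % 2 != 0

// Implicit loops: the collection drives the iteration
(1 to 10).filter(i => i % 2 != 0)    // keeps 1, 3, 5, 7, 9
(1 to 10).map(_ * 2)                 // doubles every element
List("Spark", "Mesos", "Akka", "Cassandra", "Kafka").foreach(println)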
The objective of this section is to learn to think about our systems in terms of the Actor Model.
The Actor Model is a mathematical model. As Obi-Wan would say, it is "an elegant weapon for a more civilized age." The Actor Model was developed by Carl Hewitt, Peter Bishop, and Richard Steiger in 1973 at the Massachusetts Institute of Technology, in a paper entitled A Universal Modular Actor Formalism for Artificial Intelligence.
It was a more civilized age because computer science was developed by mathematicians and all programming was done by hand. Well, if the Actor Model has been around for more than 40 years, at what point did we turn to the dark side? The answer is neither short nor simple.
The quick and dirty answer is that the model was far ahead of the technology of its day. We had to develop a great deal of software and hardware before we could reap the benefits of the Actor Model: modern compilers, modern processors, and...
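To ground the discussion, here is a minimal sketch of an actor written with Akka's classic API (it assumes the akka-actor module is on the classpath; the names Greeter and demo, and the half-second wait, are only illustrative):

import akka.actor.{Actor, ActorSystem, Props}

// An actor processes its messages one at a time and never shares its state
class Greeter extends Actor {
  def receive: Receive = {
    case name: String => println(s"Hello, $name")
  }
}

object GreeterApp extends App {
  val system  = ActorSystem("demo")                        // hosts and supervises actors
  val greeter = system.actorOf(Props(new Greeter), "greeter")
  greeter ! "SMACK"                                        // asynchronous, fire-and-forget send
  Thread.sleep(500)                                        // crude wait so the greeting prints before shutdown
  system.terminate()
}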
This chapter was a Scala-Akka dojo where you learned through several katas. In the first part we explored the fundamentals of Scala; in the second part we focused on the Akka actor model.
It is true that many important topics were not covered in this chapter, such as futures, promises, and parallel collections. We tried to provide a reference to them, although not an exhaustive guide.
Since all of the examples in this book are in Scala, we need to master these fundamental techniques before delving into the SMACK stack.
The Actor Model is important for understanding the architecture and operation of Spark.
In the following chapter, we will explore Spark design and provide some examples using Scala.