Scala Essentials for Data Engineers
Welcome to the world of data engineering with Scala. But why Scala? The following are some of the reasons for learning Scala:
- Scala provides type safety
- Big corporations such as Netflix and Airbnb have a lot of data pipelines written in Scala
- Apache Spark is written in Scala, which makes Scala its native language
- Scala allows data engineers to adopt a software engineering mindset
Scala is a high-level, general-purpose programming language that runs on the Java Virtual Machine (JVM). Martin Odersky began designing it in 2001. The name Scala stands for scalable language, and the language provides excellent support for both object-oriented and functional programming styles.
This chapter is meant as a quick introduction to concepts that the subsequent chapters build upon. Specifically, this chapter covers the following topics:
- Understanding functional programming
- Understanding objects, classes, and traits
- Higher-order functions (HOFs)
- Examples of HOFs from the Scala collection library
- Understanding polymorphic functions
- Variance
- Option types
- Collections
- Pattern matching
- Implicits in Scala
Technical requirements
This chapter is long and contains lots of examples to explain the concepts that are introduced. All of the examples are self-contained, and we encourage you to try them yourself as you move through the chapter. You will need a working Scala environment to run these examples.
You can choose to configure it by following the steps outlined in Chapter 2 or use an online Scala playground such as Scastie (https://scastie.scala-lang.org/). We will use Scala 2.12 as the language version.
Understanding functional programming
Functional programming is based on the principle that programs are constructed using only pure functions. A pure function does not have any side effects and only returns a result. Some examples of side effects are modifying a variable, modifying a data structure in place, and performing I/O. We can think of a pure function as just like a regular algebraic function.
An example of a pure function is the length function on a string object. It only returns the length of the string and does nothing else, such as mutating a variable. Similarly, an integer addition function that takes two integers and returns an integer is a pure function.
Two important aspects of functional programming are referential transparency (RT) and the substitution model. An expression is referentially transparent if all of its occurrences can be substituted by the result of the expression without altering the meaning of the program.
In the following example, Example 1.1, we set x and then use it to set r1 and r2, both of which have the same value:
scala> val x: String = "hello"
x: String = hello
scala> val r1 = x + " world!"
r1: String = hello world!
scala> val r2 = x + " world!"
r2: String = hello world!
Example 1.1
Now, if we replace x with the expression it refers to, r1 and r2 still have the same value. In other words, the expression "hello" is referentially transparent.
Example 1.2 shows the output from a Scala interpreter:
scala> val r1 = "hello" + " world!"
r1: String = hello world!
scala> val r2 = "hello" + " world!"
r2: String = hello world!
Example 1.2
Let’s now look at the following example, Example 1.3, where x is an instance of StringBuilder instead of String:
scala> val x = new StringBuilder("who")
x: StringBuilder = who
scala> val y = x.append(" am i?")
y: StringBuilder = who am i?
scala> val r1 = y.toString
r1: String = who am i?
scala> val r2 = y.toString
r2: String = who am i?
Example 1.3
If we substitute y with the expression it refers to (x.append(" am i?")), r1 and r2 will no longer be equal:
scala> val x = new StringBuilder("who")
x: StringBuilder = who
scala> val r1 = x.append(" am i?").toString
r1: String = who am i?
scala> val r2 = x.append(" am i?").toString
r2: String = who am i? am i?
Example 1.4
So, the expression x.append(" am i?") is not referentially transparent, because every evaluation mutates x.
One advantage of the functional programming style is that it allows local reasoning: to understand a function, you only need to look at its inputs and its body, without worrying about whether it updates any globally accessible mutable state. Also, since no variable in the global scope is updated, building a multi-threaded application becomes considerably simpler.
Another advantage is that pure functions are easier to test, as they do not depend on any state apart from the inputs supplied, and they generate the same output for the same input values.
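The difference between a pure and an impure function can be sketched as follows. This is a minimal illustration only; the names addPure, addAndCount, and counter are hypothetical and do not appear elsewhere in the chapter.

```scala
// Pure: the result depends only on the arguments, and nothing else is touched.
def addPure(a: Int, b: Int): Int = a + b

// Impure: reads and updates state that lives outside the function.
var counter = 0
def addAndCount(a: Int, b: Int): Int = {
  counter += 1 // side effect: mutates a shared variable
  a + b + counter
}

val pure1 = addPure(1, 2)
val pure2 = addPure(1, 2) // always equal to pure1

val impure1 = addAndCount(1, 2) // 4 on the first call
val impure2 = addAndCount(1, 2) // 5 on the second call
```

Testing addPure needs nothing more than its inputs, whereas a test for addAndCount also has to know (and reset) the value of counter.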
We won’t delve deep into functional programming as it is outside of the scope of this book. Please refer to the Further reading section for additional material on functional programming. In the rest of this chapter, we will provide a high-level tour of some of the important language features that the subsequent chapters build upon.
In this section, we looked at a very high-level introduction to functional programming. Starting with the next section, we will look at Scala language features that enable both functional and object-oriented programming styles.
Understanding objects, classes, and traits
In this section, we are going to look at classes, traits, and objects. If you have used Java before, then some of the topics covered in this section will look familiar. However, there are several differences too. For example, Scala provides singleton objects, which automatically create a class and a single instance of that class in one go. Another example is that Scala has case classes, which provide great support for pattern matching, allow you to create instances without the new keyword, and provide a default toString implementation that is quite handy when printing to the console.
We will first look at classes, followed by objects, and then wrap this section up with a quick tour of traits.
Classes
A class is a blueprint for objects, which are instances of that class. For example, we can create a Point class using the following code:
class Point(val x: Int, val y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}
Example 1.5
The Point class has four members: two immutable variables, x and y, as well as two methods, add and toString. We can create instances of the Point class as follows:
scala> val p1 = new Point(1,1)
p1: Point = (1, 1)
scala> val p2 = new Point(2,3)
p2: Point = (2, 3)
Example 1.6
We can then create a new instance, p3, by adding p1 and p2, as follows:
scala> val p3 = p1 add p2
p3: Point = (3, 4)
Example 1.7
Scala supports the infix notation, characterized by the placement of operators between operands, and automatically converts p1 add p2 to p1.add(p2). Another way to define the Point class is using a case class, as shown here:
case class Point(x: Int, y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
}
Example 1.8
A case class automatically adds a factory method with the name of the class, which enables us to leave out the new keyword when creating an instance. A factory method is used to create instances of a class without requiring us to explicitly call the constructor method. Refer to the following example:
scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,3)
p2: Point = Point(2,3)
Example 1.9
The compiler also generates structural implementations of methods such as toString and hashCode, whereas a regular class only inherits the generic, reference-based versions. So, we did not have to override the toString method, as was done earlier, and yet both p1 and p2 were printed neatly on the console (Example 1.9).
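The effect of the generated methods can be seen by comparing a regular class with a case class. The following is a small sketch; PlainPoint and CasePoint are hypothetical names chosen to avoid clashing with the Point definitions in this section, and the comparison with == relies on the fact that case classes also receive a generated equals method.

```scala
// Regular class: equality and hashCode are inherited from AnyRef (reference-based).
class PlainPoint(val x: Int, val y: Int)

// Case class: the compiler generates structural equals, hashCode, and toString.
case class CasePoint(x: Int, y: Int)

val plainEqual = new PlainPoint(1, 1) == new PlainPoint(1, 1) // false: two distinct references
val caseEqual = CasePoint(1, 1) == CasePoint(1, 1) // true: same field values
val sameHash = CasePoint(1, 1).hashCode == CasePoint(1, 1).hashCode // true
val printed = CasePoint(1, 1).toString // "CasePoint(1,1)"
```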
All arguments in the parameter list of a case class automatically get a val prefix, which makes them parametric fields. A parametric field is a shorthand that defines a parameter and a field with the same name.
To better understand the difference, let’s look at the following example:
scala> case class Point1(x: Int, y: Int) // x and y are parametric fields
defined class Point1
scala> class Point2(x: Int, y: Int) // x and y are regular parameters
defined class Point2
scala> val p1 = Point1(1, 2)
p1: Point1 = Point1(1,2)
scala> val p2 = new Point2(3, 4)
p2: Point2 = Point2@203ced18
Example 1.10
If we now try to access p1.x, it will work because x is a parametric field, whereas trying to access p2.x will result in a compile error, value x is not a member of Point2. Example 1.11 illustrates this:
scala> println(p1.x)
1
scala> println(p2.x)
<console>:13: error: value x is not a member of Point2
       println(p2.x)
                  ^
Example 1.11
Case classes also have excellent support for pattern matching, as we will see in the Understanding pattern matching section.
Scala also provides abstract classes, which, unlike regular classes, can contain abstract methods. For example, we can define the following hierarchy:
abstract class Animal
abstract class Pet extends Animal {
  def name: String
}
class Dog(val name: String) extends Pet {
  override def toString = s"Dog($name)"
}
scala> val pluto = new Dog("Pluto")
pluto: Dog = Dog(Pluto)
Example 1.12
Animal is the base class. Pet extends Animal and declares an abstract method, name. Dog extends Pet and uses a parametric field, name (it is both a parameter and a field). Because Scala uses the same namespace for fields and methods, the field name in the Dog class can provide a concrete implementation of the abstract method name in Pet.
Object
Unlike Java, Scala does not support static members in classes; instead, it has singleton objects. A singleton object is defined using the object keyword, as shown here:
class Point(val x: Int, val y: Int) {
  // new keyword is not required to create a Point object
  // apply method from companion object is invoked
  def add(that: Point): Point = Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}
object Point {
  def apply(x: Int, y: Int) = new Point(x, y)
}
Example 1.13
In this example, the Point singleton object has the same name as the class and is called that class’s companion object. The class is called the companion class of the singleton object. For an object to qualify as a companion object of a given class, it needs to be in the same source file as the class itself.
Please note that the add method does not use the new keyword on the right-hand side. Point(x1, y1) is de-sugared into Point.apply(x1, y1), which returns a Point instance.
Singleton objects are also used to write an entry point for Scala applications. One option is to provide an explicit main method within the singleton object, as shown here:
object SampleScalaApplication {
  def main(args: Array[String]): Unit = {
    println(s"This is a sample Scala application")
  }
}
Example 1.14
The other option is to extend the App trait, which provides a main method implementation. We will cover traits in the next section. You can also refer to the Further reading section (the third point) for more information:
object SampleScalaApplication extends App {
  println(s"This is a sample Scala application")
}
Example 1.15
Trait
Scala also has traits, which are used to define rich interfaces as well as stackable modifications. You can read more about stackable modifications in the Further reading section (the fourth point). Unlike class inheritance, where each class inherits from just one superclass, a class can mix in any number of traits. A trait can have abstract as well as concrete members. Here is a simplified example of the Ordered trait from the Scala standard library:
trait Ordered[T] {
  // compares receiver (this) with argument of the same type
  def compare(that: T): Int
  def <(that: T): Boolean = (this compare that) < 0
  def >(that: T): Boolean = (this compare that) > 0
  def <=(that: T): Boolean = (this compare that) <= 0
  def >=(that: T): Boolean = (this compare that) >= 0
}
Example 1.16
The Ordered trait takes a type parameter, T, and has an abstract method, compare. All of the other methods are defined in terms of that method. A class can add the functionalities defined by <, >, and so on, just by defining the compare method. The compare method should return a negative integer if the receiver is less than the argument, a positive integer if the receiver is greater than the argument, and 0 if both objects are the same.
Going back to our Point example, we can define a rule to say that a point, p1, is greater than p2 if the distance of p1 from the origin is greater than that of p2:
case class Point(x: Int, y: Int) extends Ordered[Point] {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  // Comparing squared distances gives the same ordering as comparing
  // distances and avoids floating-point arithmetic.
  // Note: ^ is bitwise XOR in Scala, not exponentiation.
  def compare(that: Point): Int =
    Integer.compare(x * x + y * y, that.x * that.x + that.y * that.y)
}
Example 1.17
With the definition of compare now in place, we can perform a comparison between two arbitrary points, as follows:
scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,2)
p2: Point = Point(2,2)
scala> println(s"p1 is greater than p2: ${p1 > p2}")
p1 is greater than p2: false
Example 1.18
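The Point example mixes in a single trait. The earlier statement that a class can mix in any number of traits, each with abstract and concrete members, can be sketched as follows. The traits Named and Aged and the class Employee are hypothetical and exist only for this illustration.

```scala
// Hypothetical traits, each with one abstract and one concrete member.
trait Named {
  def name: String
  def greeting: String = s"Hello, $name"
}

trait Aged {
  def age: Int
  def isAdult: Boolean = age >= 18
}

// A class has a single superclass but can mix in several traits.
// The parametric fields name and age implement the abstract members.
class Employee(val name: String, val age: Int) extends Named with Aged

val e = new Employee("Ada", 36)
val greeting = e.greeting // "Hello, Ada"
val adult = e.isAdult // true
```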
In this section, we looked at objects, classes, and traits. In the next section, we are going to look at HOFs.
Working with higher-order functions (HOFs)
In Scala, functions are first-class citizens, which means function values can be assigned to variables, passed to functions as arguments, or returned by a function as a value. HOFs take one or more functions as arguments or return a function as a value.
A method can also be passed as an argument to an HOF because the Scala compiler will convert the method into a function of the required type (a process known as eta-expansion). For example, let’s define a function literal and a method, both of which take a pair of integers, perform an operation, and then return an integer:
// function literal
val add: (Int, Int) => Int = (x, y) => x + y
// a method
def multiply(x: Int, y: Int): Int = x * y
Example 1.19
Let’s now define a method, op, that takes two integer arguments and applies a function, f, to them:
def op(x: Int, y: Int)(f: (Int, Int) => Int): Int = f(x, y)
Example 1.20
We can pass any function (or method) of type (Int, Int) => Int to op, as the following example illustrates:
scala> op(1,2)(add)
res15: Int = 3
scala> op(2,3)(multiply)
res16: Int = 6
Example 1.21
This ability to pass functions as parameters is extremely powerful as it allows us to write generic code that can execute arbitrary user-supplied functions. In fact, many of the methods defined in the Scala collection library require functions as arguments, as we will see in the next section.
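The examples so far pass functions as arguments. The other half of the HOF definition, returning a function as a value, can be sketched as follows. The names multiplier, triple, square, and tripleThenSquare are hypothetical; andThen is a method of the standard Function1 trait.

```scala
// An HOF that returns a function: multiplier(3) is itself an Int => Int.
def multiplier(factor: Int): Int => Int = (x: Int) => x * factor

val triple: Int => Int = multiplier(3)

// A method converted to a function value explicitly (eta-expansion).
def square(x: Int): Int = x * x
val squareFn: Int => Int = square _

// Function values compose like any other value.
val tripleThenSquare: Int => Int = triple andThen squareFn

val result = tripleThenSquare(2) // (2 * 3) squared = 36
```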
Examples of HOFs from the Scala collection library
Scala collections provide transformers that take a base collection, run some transformations over each of the collection’s elements, and return a new collection. For example, we can transform a list of integers by doubling each of its elements using the map method, which we will cover in a bit:
scala> List(1,2,3,4).map(_ * 2)
res17: List[Int] = List(2, 4, 6, 8)
Example 1.22
The Traversable trait, which is the base trait for all kinds of Scala collections, implements the behaviors common to all collections in terms of a foreach method with the following signature:
def foreach[U](f: A => U): Unit
Example 1.23
The argument f is a function of type A => U, which is shorthand for Function1[A,U], and thus foreach is an HOF. This is an abstract method that needs to be implemented by all classes that mix in Traversable. The return type is Unit, which means this method does not return any meaningful value and is primarily used for side effects.
Here is an example that prints the elements of a List:
scala> /** let's start with a foreach call that prints the numbers in a list
     |  * List(1,2,3,4).foreach((i: Int) => println(i))
     |  * we can skip the type annotation and let Scala infer it
     |  * List(1,2,3,4).foreach(i => println(i))
     |  * Scala provides a shorthand to replace arguments using _
     |  * if each argument is used only once on the right side
     |  * List(1,2,3,4).foreach(println(_))
     |  * finally, Scala allows us to leave out the argument altogether
     |  * if there is only one argument used on the right side
     |  */
     | List(1,2,3,4).foreach(println)
1
2
3
4
Example 1.24
For the rest of the examples, we will continue to use the List collection type, but these methods are available for other types of collections, such as Array, Map, and Set.
map is similar to foreach, but instead of returning Unit, it returns a collection built by applying the function f to each element of the base collection. Here is the signature for List[A]:
final def map[B](f: (A) ⇒ B): List[B]
Example 1.25
Using the list from the previous example, if we want to double each of the elements in the list but return a list of Doubles instead of Ints, we can use the following:
scala> List(1,2,3,4).map(_ * 2.0)
res22: List[Double] = List(2.0, 4.0, 6.0, 8.0)
Example 1.26
The preceding expression returns a list of Double and can be chained with foreach to print the values contained in the list:
scala> List(1,2,3,4).map(_ * 2.0).foreach(println)
2.0
4.0
6.0
8.0
Example 1.27
A close cousin of map is flatMap, which combines two operations: map and flatten. Before looking into flatMap, let’s look at flatten:
// converts a list of traversable collections into a list
// formed by the elements of the traversable collections
def flatten[B]: List[B]
Example 1.28
As the name suggests, it flattens the inner collections:
scala> List(Set(1,2,3), Set(4,5,6)).flatten
res24: List[Int] = List(1, 2, 3, 4, 5, 6)
Example 1.29
Now that we have seen what flatten does, let’s go back to flatMap.
Let’s say that for each element of List(1,2,3,4), we want to create a List of elements from 0 to that number (both inclusive) and then combine all of those individual lists into a single list. Our first pass at it would look like the following:
scala> List(1,2,3,4).map(0 to _).flatten
res25: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4)
Example 1.30
With flatMap, we can achieve the same result in one step:
scala> List(1,2,3,4).flatMap(0 to _)
res26: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4)
Example 1.31
Scala collections also provide filter, which takes a predicate (a function that returns a Boolean) and keeps only the elements of a given collection for which the predicate returns true:
def filter(p: (A) ⇒ Boolean): List[A]
Example 1.32
For example, to keep only the even integers from a List of numbers from 1 to 100, try the following:
scala> List.tabulate(100)(_ + 1).filter(_ % 2 == 0)
res27: List[Int] = List(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100)
Example 1.33
There is also withFilter, which can perform better than filter because it applies the predicate lazily and does not build an intermediate collection. It is part of the TraversableLike trait, with the FilterMonadic trait providing the abstract definition:
trait FilterMonadic[+A, +Repr] extends Any {
  // includes map, flatMap and foreach, which are skipped here
  def withFilter(p: A => Boolean): FilterMonadic[A, Repr]
}
Example 1.34
TraversableLike defines the withFilter method through a member class, WithFilter, that extends FilterMonadic:
def withFilter(p: A => Boolean): FilterMonadic[A, Repr] = new WithFilter(p)
class WithFilter(p: A => Boolean) extends FilterMonadic[A, Repr] {
  // implementation of map, flatMap and foreach skipped here
  def withFilter(q: A => Boolean): WithFilter =
    new WithFilter(x => p(x) && q(x))
}
Example 1.35
Please note that withFilter returns an object of type FilterMonadic, which only has map, flatMap, foreach, and withFilter. These are the only methods that can be chained after a call to withFilter. For example, the following will not compile:
List.tabulate(50)(_ + 1).withFilter(_ % 2 == 0).forall(_ % 2 == 0)
Example 1.36
It is quite common to have a sequence of flatMap, filter, and map chained together, and Scala provides syntactic sugar to support that through for comprehensions. To see it in action, let’s consider the following Person class and its instances:
case class Person(firstName: String, isFemale: Boolean, children: Person*)
val bob = Person("Bob", false)
val jennette = Person("Jennette", true)
val laura = Person("Laura", true)
val jean = Person("Jean", true, bob, laura)
val persons = List(bob, jennette, laura, jean)
Example 1.37
Person* represents a variable-length argument (varargs) of type Person. A variable-length argument of type T needs to be the last argument in a class definition or method signature and accepts zero, one, or more instances of type T.
Now say we want to get pairs of mother and child, which would be (Jean, Bob) and (Jean, Laura). Using flatMap, filter, and map, we can write it as follows:
scala> persons.filter(_.isFemale).flatMap(p => p.children.map(c => (p.firstName, c.firstName)))
res32: List[(String, String)] = List((Jean,Bob), (Jean,Laura))
Example 1.38
The preceding expression does its job, but it is not easy to see what is happening. This is where a for comprehension comes to the rescue:
scala> for {
     |   p <- persons
     |   if p.isFemale
     |   c <- p.children
     | } yield (p.firstName, c.firstName)
res33: List[(String, String)] = List((Jean,Bob), (Jean,Laura))
Example 1.39
It is much easier to understand what this snippet of code does. Behind the scenes, the Scala compiler will convert this expression into the first one (the only difference being that filter will be replaced with withFilter).
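The translation described above can be checked by writing both forms side by side. The following sketch reuses a trimmed-down version of the Person data from this section; the names viaFor and desugared are hypothetical, and the desugared form is written by hand to mirror what the compiler is described as producing.

```scala
case class Person(firstName: String, isFemale: Boolean, children: Person*)

val bob = Person("Bob", false)
val laura = Person("Laura", true)
val jean = Person("Jean", true, bob, laura)
val persons = List(bob, laura, jean)

// The for comprehension from the text.
val viaFor = for {
  p <- persons
  if p.isFemale
  c <- p.children
} yield (p.firstName, c.firstName)

// Hand-written equivalent: the guard becomes withFilter, the outer generator
// becomes flatMap, and the innermost generator becomes map.
val desugared = persons
  .withFilter(p => p.isFemale)
  .flatMap(p => p.children.map(c => (p.firstName, c.firstName)))
```

Both values hold the same pairs, (Jean,Bob) and (Jean,Laura).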
Scala also provides methods to combine the elements of a collection using the fold and reduce families of functions. The primary difference between the two can be understood by comparing the signatures of foldLeft and reduceLeft:
def foldLeft[B](z: B)(op: (B, A) ⇒ B): B
def reduceLeft[B >: A](op: (B, A) ⇒ B): B
Example 1.40
Both of these methods take a binary operator to combine the elements from left to right. However, foldLeft takes an initial (zero) value, z, of type B (this value is returned if the List is empty), and the output type can differ from the type of the elements in the List. On the other hand, reduceLeft has no initial value and requires its result type, B, to be a supertype of A (>: signifies a lower bound). So, we can sum up a List[Int] and return the value as a Double using foldLeft, as follows:
scala> List(1,2,3,4).foldLeft[Double](0) ( _ + _ )
res34: Double = 10.0
Example 1.41
We cannot do the same with reduceLeft or reduce, because both require the result type to be a supertype of the element type, and Double is not a supertype of Int. Trying to do so with reduce raises the compile-time error type arguments [Double] do not conform to method reduce's type parameter bounds [A1 >: Int]:
scala> List(1,2,3,4).reduce[Double] ( _ + _ )
<console>:12: error: type arguments [Double] do not conform to method reduce's type parameter bounds [A1 >: Int]
       List(1,2,3,4).reduce[Double] ( _ + _ )
                           ^
Example 1.42
foldRight and reduceRight combine the elements of a collection from right to left. There are also fold and reduce; for both, the order in which the elements are combined is unspecified and may be nondeterministic.
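The direction of a fold and the role of the initial value can be made concrete with an operator that is not associative. The following is a small sketch with hypothetical value names; Try and reduceLeftOption come from the Scala standard library.

```scala
val nums = List(1, 2, 3, 4)

// Subtraction is not associative, so the direction matters.
val left = nums.foldLeft(0)(_ - _) // (((0 - 1) - 2) - 3) - 4 = -10
val right = nums.foldRight(0)(_ - _) // 1 - (2 - (3 - (4 - 0))) = -2

// foldLeft returns the initial value for an empty list ...
val emptyFold = List.empty[Int].foldLeft(0)(_ + _) // 0

// ... whereas reduceLeft has no initial value and throws on an empty list.
val emptyReduce = scala.util.Try(List.empty[Int].reduceLeft(_ + _))

// reduceLeftOption is the safe variant and returns None instead.
val safeReduce = List.empty[Int].reduceLeftOption(_ + _)
```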
In this section, we have seen several examples of HOFs from the Scala collection library. By now, you should have noticed that each of these functions uses type parameters. These are called polymorphic functions, which is what we will cover next.
Understanding polymorphic functions
A function that works with multiple types of input arguments or can return values of different types is called a polymorphic function. When writing a polymorphic function, we provide a comma-separated list of type parameters surrounded by square brackets after the name of the function. For example, we can write a function that returns the index of the first element of a List that satisfies a given predicate:
scala> def findFirstIn[A](as: List[A], p: A => Boolean): Option[Int] =
     |   as.zipWithIndex.collect { case (e, i) if p(e) => i }.headOption
findFirstIn: [A](as: List[A], p: A => Boolean)Option[Int]
Example 1.43
This function will work for any type of list: List[Int], List[String], and so on. For example, we can search for the index of element 5 in a list of integers from 1 to 20:
scala> import scala.util.Random
import scala.util.Random
scala> val ints = Random.shuffle((1 to 20).toList)
ints: List[Int] = List(7, 9, 3, 8, 6, 13, 12, 18, 14, 15, 1, 11, 10, 16, 2, 5, 20, 17, 4, 19)
scala> findFirstIn[Int](ints, _ == 5)
res38: Option[Int] = Some(15)
Example 1.44
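To show that the same definition really does work for other element types, here is a short sketch that applies findFirstIn to a list of strings. The words list and the curried variant findFirstCurried are hypothetical additions; the variant only illustrates that, in Scala 2, placing the predicate in a second parameter list lets the compiler infer A from the first one.

```scala
def findFirstIn[A](as: List[A], p: A => Boolean): Option[Int] =
  as.zipWithIndex.collect { case (e, i) if p(e) => i }.headOption

val words = List("spark", "scala", "kafka")

val scalaIdx = findFirstIn[String](words, _ == "scala") // Some(1)
val missingIdx = findFirstIn[String](words, _ == "flink") // None
val bigIdx = findFirstIn[Int](List(3, 10, 25), _ > 5) // Some(1)

// Hypothetical curried variant: no explicit type argument is needed.
def findFirstCurried[A](as: List[A])(p: A => Boolean): Option[Int] =
  findFirstIn[A](as, p)

val kafkaIdx = findFirstCurried(words)(_.startsWith("k")) // Some(2)
```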
In the next section, we are going to look at another property of type parameters, called variance, which defines subtyping relationships between parameterized types.
Variance
As mentioned earlier, functions are first-class objects in Scala. Scala automatically converts function literals into objects of the FunctionN type (N = 0 to 22). For example, consider the following anonymous function:
val f: Int => Any = (x: Int) => x
Example 1.45
This function will be converted automatically to the following:
val f = new Function1[Int, Any] {def apply(x: Int) = x}
Example 1.46
Please note that the preceding syntax represents an object of an anonymous class that extends Function1[Int, Any] and implements its abstract apply method. In other words, it is equivalent to the following:
class AnonymousClass extends Function1[Int, Any] {
  def apply(x: Int): Any = x
}
val f = new AnonymousClass
Example 1.47
If we refer to the type signature of the Function1 trait, we would see the following:
Function1[-T1, +R]
Example 1.48
T1 represents the argument type and R represents the return type. Function1 is contravariant in T1 and covariant in R. In general, covariance, denoted by +, means that if a class or trait is covariant in its type parameter T, that is, C[+T], then C[A] and C[B] follow the same subtyping relationship as A and B. For example, since Any is a supertype of Int, C[Any] will be a supertype of C[Int].
The order is reversed for contravariance, denoted by -. So, if we have C[-T], then C[Int] will be a supertype of C[Any].
Since we have Function1[-T1, +R], type Function1[Int, Any] will be a supertype of, say, Function1[Any, String].
To see it in action, let’s define a method that takes a function of type Int => Any, applies it to the numbers 1 to 5, and returns Unit:
def caller(op: Int => Any): Unit =
  List
    .tabulate(5)(i => i + 1)
    .foreach(i => print(s"${op(i)} "))
Example 1.49
Let’s now define two functions:
scala> val f1: Int => Any = (x: Int) => x
f1: Int => Any = $Lambda$9151/1234201645@34f561c8
scala> val f2 : Any => String = (x: Any) => x.toString
f2: Any => String = $Lambda$9152/1734317897@699fe6f6
Example 1.50
A function (or method) with a parameter of type T can be invoked with an argument that is either of type T or its subtype. And since Int => Any is a supertype of Any => String, we should be able to pass both of these functions as arguments. As can be seen, both of them indeed work:
scala> caller(f1)
1 2 3 4 5
scala> caller(f2)
1 2 3 4 5
Example 1.51
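The same rules apply to user-defined types. The following sketch declares one covariant and one contravariant type; Animal, Dog, Box, and Printer are hypothetical names used only for this illustration and are unrelated to the hierarchy shown earlier in the chapter.

```scala
class Animal(val name: String)
class Dog(name: String) extends Animal(name)

// Covariant in A: a Box[Dog] can be used where a Box[Animal] is expected.
class Box[+A](val value: A)

// Contravariant in A: a Printer[Animal] can be used where a Printer[Dog] is expected.
trait Printer[-A] {
  def show(a: A): String
}

val dogBox: Box[Dog] = new Box(new Dog("Pluto"))
val animalBox: Box[Animal] = dogBox // compiles because of +A

val animalPrinter: Printer[Animal] = new Printer[Animal] {
  def show(a: Animal): String = s"animal:${a.name}"
}
val dogPrinter: Printer[Dog] = animalPrinter // compiles because of -A

val shown = dogPrinter.show(new Dog("Pluto")) // "animal:Pluto"
```

Removing the + or - annotation makes the corresponding assignment fail to compile, which is a quick way to check which direction each annotation allows.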
Option type
Scala’s Option type represents optional values. These values can take two forms: Some(x), where x is the actual value, or None, which represents a missing value. Many of the Scala collection library methods return a value of the Option[T] type. The following are a few examples:
scala> List(1, 2, 3, 4).headOption
res45: Option[Int] = Some(1)
scala> List(1, 2, 3, 4).lastOption
res46: Option[Int] = Some(4)
scala> List("hello,", "world").find(_ == "world")
res47: Option[String] = Some(world)
scala> Map(1 -> "a", 2 -> "b").get(3)
res48: Option[String] = None
Example 1.52
Option also has a rich API and provides many of the functions from the collection library API through an implicit conversion function, option2Iterable, in the companion object. The following are a few examples of methods supported by the Option type:
scala> Some("hello, world!").headOption
res49: Option[String] = Some(hello, world!)
scala> None.getOrElse("Empty")
res50: String = Empty
scala> Some("hello, world!").map(_.replace("!", ".."))
res51: Option[String] = Some(hello, world..)
scala> Some(List.tabulate(5)(_ + 1)).flatMap(_.headOption)
res52: Option[Int] = Some(1)
Example 1.53
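A typical use of Option is handling a lookup that may fail. The following is a minimal sketch; the ports map and the describe method are hypothetical, while get, getOrElse, and map are the standard Map and Option methods shown above.

```scala
val ports = Map("http" -> 80, "https" -> 443)

// Pattern matching forces both the present and the missing case to be handled.
def describe(service: String): String =
  ports.get(service) match {
    case Some(port) => s"$service uses port $port"
    case None => s"$service is unknown"
  }

val known = describe("https") // "https uses port 443"
val unknown = describe("ftp") // "ftp is unknown"

// getOrElse supplies a fallback; map transforms the value only if it is present.
val fallback = ports.get("ftp").getOrElse(-1) // -1
val doubled = ports.get("http").map(_ * 2) // Some(160)
```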
Collections
Scala comes with a powerful collection library. Collections are classified into mutable and immutable collections. A mutable collection can be updated in place, whereas an immutable collection never changes. When we add, remove, or update elements of an immutable collection, a new collection is created and returned, keeping the old collection unchanged.
All collection classes are found in the scala.collection package or one of its subpackages: mutable, immutable, and generic. However, for most of our programming needs, we refer to collections in either the mutable or immutable package.
A collection in the scala.collection.immutable package is guaranteed to be immutable and will never change after it is created. So, we will not have to make any defensive copies of an immutable collection, since accessing a collection multiple times will always yield the same set of elements.
On the other hand, collections in the scala.collection.mutable package provide methods that can update a collection in place. Since these collections are mutable, we need to defend against any inadvertent update by other parts of the code base.
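The difference between the two kinds of collections can be sketched as follows. The value names are hypothetical; List and ListBuffer are standard library types.

```scala
import scala.collection.mutable

// Immutable: prepending returns a new list and leaves the original untouched.
val original = List(1, 2, 3)
val extended = 0 :: original

// Mutable: += updates the same buffer in place.
val buffer = mutable.ListBuffer(1, 2, 3)
buffer += 4

val originalAfter = original // still List(1, 2, 3)
val bufferAfter = buffer.toList // List(1, 2, 3, 4)
```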
By default, Scala picks immutable collections. This easy access is provided through the Predef object, which is implicitly imported into every Scala source file. Refer to the following example:
object Predef {
  type Set[A] = immutable.Set[A]
  type Map[A, +B] = immutable.Map[A, B]
  val Map = immutable.Map
  val Set = immutable.Set
  // ...
}
Example 1.54
The Traversable trait is the base trait for all of the collection types. This is followed by Iterable, which is divided into three subtypes: Seq, Set, and Map. Both Set and Map provide sorted and unsorted variants. Seq, on the other hand, has IndexedSeq and LinearSeq. There is quite a bit of similarity among all these classes. For instance, an instance of any collection can be created by the same uniform syntax, writing the collection class name followed by its elements:
Traversable(1, 2, 3)
Map("x" -> 24, "y" -> 25, "z" -> 26)
Set("red", "green", "blue")
SortedSet("hello", "world")
IndexedSeq(1.0, 2.0)
LinearSeq(a, b, c)
Example 1.55
The following is the hierarchy for scala.collection.immutable collections taken from the docs.scala-lang.org website.
Figure 1.1 – Scala collection hierarchy
The Scala collection library is very rich and has various collection types suited to specific programming needs. If you want to delve deep into the Scala collection library, please refer to the Further reading section (the fifth point).
In this section, we looked at the Scala collection hierarchy. In the next section, we will gain a high-level understanding of pattern matching.
Understanding pattern matching
Scala has excellent support for pattern matching. The most prominent use is the match expression, which takes the following form:
selector match { alternatives }
selector is the expression that the alternatives will be tried against. Each alternative starts with the case keyword and includes a pattern, an arrow symbol (=>), and one or more expressions, which will be evaluated if the pattern matches. The patterns can be of various types, such as the following:
- Wildcard patterns
- Constant patterns
- Variable patterns
- Constructor patterns
- Sequence patterns
- Tuple patterns
- Typed patterns
Before going through each of these pattern types, let’s define our own custom List:
trait List[+A]
case class Cons[+A](head: A, tail: List[A]) extends List[A]
case object Nil extends List[Nothing]
object List {
  def apply[A](as: A*): List[A] =
    if (as.isEmpty) Nil else Cons(as.head, apply(as.tail: _*))
}
Example 1.56
Wildcard patterns
The wildcard pattern (_) matches any object and is used as a default, catch-all alternative. Consider the following example:
scala> def emptyList[A](l: List[A]): Boolean = l match {
     |   case Nil => true
     |   case _ => false
     | }
emptyList: [A](l: List[A])Boolean
scala> emptyList(List(1, 2))
res8: Boolean = false
Example 1.57
A wildcard can also be used to ignore parts of an object that we do not care about. Refer to the following code:
scala> def threeElements[A](l: List[A]): Boolean = l match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => true
     |   case _ => false
     | }
threeElements: [A](l: List[A])Boolean
scala> threeElements(List(true, false))
res11: Boolean = false
scala> threeElements(Nil)
res12: Boolean = false
scala> threeElements(List(1, 2, 3))
res13: Boolean = true
scala> threeElements(List("a", "b", "c", "d"))
res14: Boolean = false
Example 1.58
In the preceding example, the threeElements method checks whether a given list has exactly three elements. The values themselves are not needed and are thus discarded in the pattern match.
Constant patterns
A constant pattern matches only itself. Any literal can be used as a constant: 1, true, and "hi" are all constant patterns. Any val or singleton object can also be used as a constant, although a val whose name starts with a lowercase letter must be enclosed in backticks; otherwise, it is treated as a variable pattern. The emptyList method from Example 1.57 uses Nil to check whether the list is empty.
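As a small illustration of these rules, the following sketch matches against literals, two vals, and a singleton object. The names ConstantPatterns, MaxRetries, threshold, and classify are hypothetical; Nil here is the one from the Scala collection library rather than the custom List defined earlier:

```scala
object ConstantPatterns {
  val MaxRetries = 3  // uppercase name: can be used directly as a constant pattern
  val threshold = 10  // lowercase name: must be wrapped in backticks in a pattern

  def classify(x: Any): String = x match {
    case 1           => "the literal 1"
    case true        => "the literal true"
    case "hi"        => "the literal hi"
    case MaxRetries  => "equal to MaxRetries"
    case `threshold` => "equal to threshold"
    case Nil         => "the empty list singleton"
    case _           => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(classify(3))  // prints equal to MaxRetries
    println(classify(10)) // prints equal to threshold
    println(classify(42)) // prints something else
  }
}
```

Without the backticks, case threshold would be read as a variable pattern that matches everything, which is why the distinction matters.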
Variable patterns
Like a wildcard, a variable pattern matches any object. Unlike a wildcard, it binds the variable to the matched object, so we can then use the variable to refer to it:
scala> val ints = List(1, 2, 3, 4)
ints: List[Int] = Cons(1,Cons(2,Cons(3,Cons(4,Nil))))
scala> ints match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => println("A three element list")
     |   case l => println(s"$l is not a three element list")
     | }
Cons(1,Cons(2,Cons(3,Cons(4,Nil)))) is not a three element list
Example 1.59
In the preceding example, l is bound to the entire list, which is then printed to the console.
Constructor patterns
A constructor pattern looks like Cons(_, Cons(_, Cons(_, Nil))). It consists of the name of a case class (Cons), followed by a number of patterns in parentheses. These extra patterns can themselves be constructor patterns, and we can use them to check arbitrarily deep into an object. In this case, checks are performed at four levels.
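Nested constructor patterns can also bind parts of the object to variables. The sketch below repeats the custom List from Example 1.56 inside a hypothetical ConstructorPatterns object so that it is self-contained; the second helper is an illustrative name, not something defined earlier in the chapter:

```scala
object ConstructorPatterns {
  // Same shape as the custom List from Example 1.56, repeated for self-containment.
  trait List[+A]
  case class Cons[+A](head: A, tail: List[A]) extends List[A]
  case object Nil extends List[Nothing]

  // Hypothetical helper: returns the second element, if there is one.
  def second[A](l: List[A]): Option[A] = l match {
    case Cons(_, Cons(x, _)) => Some(x) // two levels deep; x binds the second head
    case _                   => None    // empty or single-element list
  }

  def main(args: Array[String]): Unit = {
    println(second(Cons(1, Cons(2, Cons(3, Nil))))) // prints Some(2)
    println(second(Cons(1, Nil)))                   // prints None
  }
}
```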
Sequence patterns
Scala allows us to match against sequence types such as Seq, List, and Array, among others. A sequence pattern looks similar to a constructor pattern. Refer to the following:
scala> def thirdElement[A](s: Seq[A]): Option[A] = s match {
     |   case Seq(_, _, a, _*) => Some(a)
     |   case _ => None
     | }
thirdElement: [A](s: Seq[A])Option[A]
scala> val intSeq = Seq(1, 2, 3, 4)
intSeq: Seq[Int] = List(1, 2, 3, 4)
scala> thirdElement(intSeq)
res16: Option[Int] = Some(3)
scala> thirdElement(Seq.empty[String])
res17: Option[String] = None
Example 1.60
As the example illustrates, thirdElement returns a value of type Option[A]. If a sequence has three or more elements, it returns the third element wrapped in Some, whereas for any sequence with fewer than three elements, it returns None. Seq(_, _, a, _*) binds a to the third element if present. The _* pattern matches any number of remaining elements, including none.
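The remaining elements matched by _* can also be bound to a name with the @ binder, and the same style of pattern works on arrays. The following is a minimal sketch against the Scala collection library; SequencePatterns, headAndRest, and startsWithZero are hypothetical names used only here:

```scala
object SequencePatterns {
  // Hypothetical helper: splits a sequence into its first element and the rest.
  def headAndRest[A](s: Seq[A]): Option[(A, Seq[A])] = s match {
    case Seq(first, rest @ _*) => Some((first, rest)) // rest binds the remaining elements
    case _                     => None                // empty sequence
  }

  // Hypothetical helper: checks whether an array begins with 0.
  def startsWithZero(arr: Array[Int]): Boolean = arr match {
    case Array(0, _*) => true
    case _            => false
  }

  def main(args: Array[String]): Unit = {
    println(headAndRest(Seq(1, 2, 3)))   // prints Some((1,List(2, 3)))
    println(startsWithZero(Array(0, 5))) // prints true
  }
}
```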
Tuple patterns
We can pattern match against tuples too:
scala> val tuple3 = (1, 2, 3)
tuple3: (Int, Int, Int) = (1,2,3)
scala> def printTuple(a: Any): Unit = a match {
     |   case (a, b, c) => println(s"Tuple has $a, $b, $c")
     |   case _ =>
     | }
printTuple: (a: Any)Unit
scala> printTuple(tuple3)
Tuple has 1, 2, 3
Example 1.61
Running the preceding program will print Tuple has 1, 2, 3 to the console.
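Tuple patterns combine naturally with other patterns, which makes them handy for inspecting several values at once. As a hedged sketch (TuplePatterns and sumBoth are hypothetical names), the following pairs two Option values into a tuple and matches on both in a single alternative:

```scala
object TuplePatterns {
  // Hypothetical helper: adds two optional values only when both are present.
  def sumBoth(a: Option[Int], b: Option[Int]): Option[Int] = (a, b) match {
    case (Some(x), Some(y)) => Some(x + y) // constructor patterns nested in a tuple pattern
    case _                  => None        // at least one value is missing
  }

  def main(args: Array[String]): Unit = {
    println(sumBoth(Some(1), Some(2))) // prints Some(3)
    println(sumBoth(Some(1), None))    // prints None
  }
}
```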
Typed patterns
A typed pattern allows us to check types in the pattern match and can be used for type tests and type casts:
scala> def getLength(a: Any): Int =
     |   a match {
     |     case s: String => s.length
     |     case l: List[_] => l.length // this is List from the Scala collection library
     |     case m: Map[_, _] => m.size
     |     case _ => -1
     |   }
getLength: (a: Any)Int
scala> getLength("hello, world")
res3: Int = 12
scala> getLength(List(1, 2, 3, 4))
res4: Int = 4
scala> getLength(Map.empty[Int, String])
res5: Int = 0
Example 1.62
Note that the argument a, being of type Any, does not itself support methods such as length or size. In each alternative, Scala automatically applies a type test and a type cast to the target type, which is why the result expression can call them. For example, case s: String => s.length is equivalent to the following snippet:
if (a.isInstanceOf[String]) {
  val s = a.asInstanceOf[String]
  s.length
}
Example 1.63
One important thing to note, though, is that Scala erases type arguments at runtime. So, there is no way to check in a typed pattern whether a list contains only integer elements. The compiler emits a warning to alert us about this runtime behavior. Arrays are the only exception, because the element type is stored with the array value. For example, the following will print A list of String to the console, even though the list holds integers:
scala> List.fill(5)(0) match {
     |   case _: List[String] => println("A list of String")
     |   case _ =>
     | }
<console>:13: warning: fruitless type test: a value of type List[Int] cannot also be a List[String] (the underlying of List[String]) (but still might match its erasure)
         case _: List[String] => println("A list of String")
                 ^
A list of String
Example 1.64
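To contrast lists with arrays, here is a minimal sketch (ErasureDemo and describe are hypothetical names) in which the element type of an array can be tested in a typed pattern, while a list from the Scala collection library can only be matched with a wildcard type argument:

```scala
object ErasureDemo {
  def describe(a: Any): String = a match {
    case _: Array[Int]    => "array of Int"    // element type is kept at runtime
    case _: Array[String] => "array of String" // so these two cases are distinguishable
    case _: List[_]       => "some list"       // element type is erased, so use a wildcard
    case _                => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Array(1, 2, 3))) // prints array of Int
    println(describe(Array("a")))     // prints array of String
    println(describe(List(1, 2, 3)))  // prints some list
  }
}
```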
Implicits in Scala
Scala provides implicit conversions and parameters. Implicit conversion to an expected type is the first place the compiler uses implicits. For example, the following works:
scala> val d: Double = 2
d: Double = 2.0
Example 1.65
This works because of the following implicit method definition in the Int companion object (it was part of Predef prior to 2.10.x):
implicit def int2double(x: Int): Double = x.toDouble
Example 1.66
Another application of implicit conversion is converting the receiver of a method call. For example, let’s define a Rational class:
scala> class Rational(n: Int, d: Int) extends Ordered[Rational] {
     |   require(d != 0)
     |   private val g = gcd(n.abs, d.abs)
     |   private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
     |   val numer = n / g
     |   val denom = d / g
     |   def this(n: Int) = this(n, 1)
     |   // a/b + c/d = (a*d + c*b) / (b*d)
     |   def +(that: Rational) = new Rational(
     |     this.numer * that.denom + that.numer * this.denom,
     |     this.denom * that.denom
     |   )
     |   // Sign of a/b - c/d (assumes positive denominators)
     |   def compare(that: Rational) = this.numer * that.denom - that.numer * this.denom
     |   override def toString = if (denom == 1) numer.toString else s"$numer/$denom"
     | }
defined class Rational
Example 1.67
Then declare a variable of the Rational type:
scala> val r1 = new Rational(1)
r1: Rational = 1
scala> 1 + r1
<console>:14: error: overloaded method value + with alternatives:
  (x: Double)Double <and>
  (x: Float)Float <and>
  (x: Long)Long <and>
  (x: Int)Int <and>
  (x: Char)Int <and>
  (x: Short)Int <and>
  (x: Byte)Int <and>
  (x: String)String
 cannot be applied to (Rational)
       1 + r1
         ^
Example 1.68
If we try to add r1 to 1, we get a compile-time error, because the + method in Int does not support an argument of type Rational. To make it work, we can create an implicit conversion from Int to Rational:
scala> implicit def intToRational(n: Int): Rational = new Rational(n)
intToRational: (n: Int)Rational
scala> val r1 = new Rational(1)
r1: Rational = 1
scala> 1 + r1
res11: Rational = 2
Example 1.69
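Adding 1 to 1 is a fairly gentle test, so it can be worth checking the conversion with a non-integer fraction as well. The following self-contained sketch wraps a Rational class in a hypothetical RationalCheck object and uses the standard fraction formulas a/b + c/d = (a*d + c*b)/(b*d) for addition and the sign of a*d - c*b for comparison. It assumes positive denominators and is an illustration rather than a complete implementation:

```scala
import scala.language.implicitConversions

object RationalCheck {
  class Rational(n: Int, d: Int) extends Ordered[Rational] {
    require(d != 0, "denominator must be non-zero")
    private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
    private val g = gcd(n.abs, d.abs)
    val numer: Int = n / g
    val denom: Int = d / g
    def this(n: Int) = this(n, 1)
    // a/b + c/d = (a*d + c*b) / (b*d)
    def +(that: Rational): Rational =
      new Rational(numer * that.denom + that.numer * denom, denom * that.denom)
    // Sign of a/b - c/d, assuming both denominators are positive
    def compare(that: Rational): Int = numer * that.denom - that.numer * denom
    override def toString: String = if (denom == 1) numer.toString else s"$numer/$denom"
  }

  // Lets an Int appear wherever a Rational (or a Rational's method) is needed.
  implicit def intToRational(n: Int): Rational = new Rational(n)

  def main(args: Array[String]): Unit = {
    val half = new Rational(1, 2)
    println(1 + half) // receiver 1 is converted to Rational; prints 3/2
    println(half < 1) // argument 1 is converted to Rational; prints true
  }
}
```

In 1 + half, the compiler converts the receiver because Int has no + that accepts a Rational; in half < 1, it converts the argument because the < method inherited from Ordered expects a Rational.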
Summary
This was a long chapter and we covered a lot of topics. We started this chapter with a brief introduction to functional programming, looked at why it is useful, and reviewed examples of referential transparency (RT). We then looked at various language features and constructs, starting with classes, objects, and traits. We looked at HOFs, which are one of the fundamental building blocks of functional programming. We looked at polymorphic functions and saw how they enable us to write reusable code. Then, we looked at variance, which defines subtyping relationships between objects, took a detailed tour of pattern matching, and finally, ended with implicit conversion, which is a powerful language feature used in design patterns such as type classes.
In the next chapter, we are going to focus on setting up the environment, which will allow you to follow along with the rest of the chapters.
Further reading
- Programming in Scala, Fourth Edition, Martin Odersky, Lex Spoon, and Bill Venners
- If you are interested in learning more about functional programming, please refer to Functional Programming in Scala by Paul Chiusano and Rúnar Bjarnason
- A nice explanation of how the App trait actually works:
https://stackoverflow.com/questions/53468358/how-Scala-app-trait-and-main-works-internally
- Scala’s stackable trait pattern by Bill Venners:
https://www.artima.com/articles/Scalas-stackable-trait-pattern
- Please refer to the Scala docs for more details on the collection library: https://docs.scala-lang.org/overviews/collections/overview.html