Scala and Spark for Big Data Analytics

Introduction to Scala

"I'm Scala. I'm a scalable, functional and object-oriented programming language. I can grow with you and you can play with me by typing one-line expressions and observing the results instantly"

- Scala Quote

In last few years, Scala has observed steady rise and wide adoption by developers and practitioners, especially in the fields of data science and analytics. On the other hand, Apache Spark which is written in Scala is a fast and general engine for large-scale data processing. Spark's success is due to many factors: easy-to-use API, clean programming model, performance, and so on. Therefore, naturally, Spark has more support for Scala: more APIs are available for Scala compared to Python or Java; although, new Scala APIs are available before those for Java, Python, and R.

Now that before we start writing your data analytics program using Spark and Scala (part II), we will first get familiar with Scala's functional programming concepts, object oriented features and the Scala collection APIs in detail (part I). As a starting point, we will provide a brief introduction to Scala in this chapter. We will cover some basic aspects of Scala including it's history and purposes. Then we will see how to install Scala on different platforms including Windows, Linux, and Mac OS so that your data analytics programs can be written on your favourite editors and IDEs. Later in this chapter, we will provide a comparative analysis between Java and Scala. Finally, we will dive into Scala programming with some examples.

In a nutshell, the following topics will be covered:

History and purposes of Scala
Platforms and editors
Installing and setting up Scala
Scala: the scalable language
Scala for Java programmers
Scala for the beginners
Summary

History and purposes of Scala

Scala is a general-purpose programming language that comes with support of functional programming and a strong static type system. The source code of Scala is intended to be compiled into Java bytecode, so that the resulting executable code can be run on Java virtual machine (JVM).

Martin Odersky started the design of Scala back in 2001 at the École Polytechnique Fédérale de Lausanne (EPFL). It was an extension of his work on Funnel, which is a programming language that uses functional programming and Petri nets. The first public release appears in 2004 but with only on the Java platform support. Later on, it was followed by .NET framework in June 2004.

Scala has become very popular and experienced wide adoptions because it not only supports the object-oriented programming paradigm, but it also embraces the functional programming concepts. In addition, although Scala's symbolic operators are hardly easy to read, compared to Java, most of the Scala codes are comparatively concise and easy to read -e.g. Java is too verbose.

Like any other programming languages, Scala was prosed and developed for specific purposes. Now, the question is, why was Scala created and what problems does it solve? To answer these questions, Odersky said in his blog:

"The work on Scala stems from a research effort to develop better language support for component software. There are two hypotheses that we would like to validate with the Scala experiment. First, we postulate that a programming language for component software needs to be scalable in the sense that the same concepts can describe small as well as large parts. Therefore, we concentrate on mechanisms for abstraction, composition, and decomposition, rather than adding a large set of primitives, which might be useful for components at some level of scale but not at other levels. Second, we postulate that scalable support for components can be provided by a programming language which unifies and generalizes object-oriented and functional programming. For statically typed languages, of which Scala is an instance, these two paradigms were up to now largely separate."

Nevertheless, pattern matching and higher order functions, and so on, are also provided in Scala, not to fill the gap between FP and OOP, but because they are typical features of functional programming. For this, it has some incredibly powerful pattern-matching features, which are an actor-based concurrency framework. Moreover, it has the support of the first- and higher-order functions. In summary, the name "Scala" is a portmanteau of scalable language, signifying that it is designed to grow with the demands of its users.

Platforms and editors

Scala runs on Java Virtual Machine (JVM), which makes Scala a good choice for Java programmers too who would like to have a functional programming flavor in their codes. There are lots of options when it comes to editors. It's better for you to spend some time making some sort of a comparative study between the available editors because being comfortable with an IDE is one of the key factors for a successful programming experience. Following are some options to choose from:

Scala IDE
Scala plugin for Eclipse
IntelliJ IDEA
Emacs
VIM

Scala support programming on Eclipse has several advantages using numerous beta plugins. Eclipse provides some exciting features such as local, remote, and high-level debugging facilities with semantic highlighting and code completion for Scala. You can use Eclipse for Java as well as Scala application development with equal ease. However, I would also suggest Scala IDE (http://scala-ide.org/)--it's a full-fledged Scala editor based on Eclipse and customized with a set of interesting features (for example, Scala worksheets, ScalaTest support, Scala refactoring, and so on).

The second best option, in my view, is the IntelliJ IDEA. The first release came in 2001 as the first available Java IDEs with advanced code navigation and refactoring capabilities integrated. According to the InfoWorld report (see at http://www.infoworld.com/article/2683534/development-environments/infoworld-review--top-java-programming-tools.html), out of the four top Java programming IDE (that is, Eclipse, IntelliJ IDEA, NetBeans, and JDeveloper), IntelliJ received the highest test center score of 8.5 out of 10.

The corresponding scoring is shown in the following figure:

Figure 1: Best IDEs for Scala/Java developers

From the preceding figure, you may be interested in using other IDEs such as NetBeans and JDeveloper too. Ultimately, the choice is an everlasting debate among the developers, which means the final choice is yours.

Installing and setting up Scala

As we have already mentioned, Scala uses JVM, therefore make sure you have Java installed on your machine. If not, refer to the next subsection, which shows how to install Java on Ubuntu. In this section, at first, we will show you how to install Java 8 on Ubuntu. Then, we will see how to install Scala on Windows, Mac OS, and Linux.

Installing Java

For simplicity, we will show how to install Java 8 on an Ubuntu 14.04 LTS 64-bit machine. But for Windows and Mac OS, it would be better to invest some time on Google to know how. For a minimum clue for the Windows users: refer to this link for details https://java.com/en/download/help/windows_manual_download.xml .

Now, let's see how to install Java 8 on Ubuntu with step-by-step commands and instructions. At first, check whether Java is already installed:

$ java -version

If it returns The program java cannot be found in the following packages, Java hasn't been installed yet. Then you would like to execute the following command to get rid of:

 $ sudo apt-get install default-jre

This will install the Java Runtime Environment (JRE). However, if you may instead need the Java Development Kit (JDK), which is usually needed to compile Java applications on Apache Ant, Apache Maven, Eclipse, and IntelliJ IDEA.

The Oracle JDK is the official JDK, however, it is no longer provided by Oracle as a default installation for Ubuntu. You can still install it using apt-get. To install any version, first execute the following commands:

$ sudo apt-get install python-software-properties
$ sudo apt-get update
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update

Then, depending on the version you want to install, execute one of the following commands:

$ sudo apt-get install oracle-java8-installer

After installing, don't forget to set the Java home environmental variable. Just apply the following commands (for the simplicity, we assume that Java is installed at /usr/lib/jvm/java-8-oracle):

$ echo "export JAVA_HOME=/usr/lib/jvm/java-8-oracle" >> ~/.bashrc  
$ echo "export PATH=$PATH:$JAVA_HOME/bin" >> ~/.bashrc
$ source ~/.bashrc

Now, let's see the Java_HOME as follows:

$ echo $JAVA_HOME

You should observe the following result on Terminal:

 /usr/lib/jvm/java-8-oracle

Now, let's check to make sure that Java has been installed successfully by issuing the following command (you might see the latest version!):

$ java -version

You will get the following output:

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Excellent! Now you have Java installed on your machine, thus you're ready Scala codes once it is installed. Let's do this in the next few subsections.

Windows

This part will focus on installing Scala on the PC with Windows 7, but in the end, it won't matter which version of Windows you to run at the moment:

The first step is to download a zipped file of Scala from the official site. You will find it at https://www.Scala-lang.org/download/all.html. Under the other resources section of this page, you will find a list of the archive files from which you can install Scala. We will choose to download the zipped file for Scala 2.11.8, as shown in the following figure:

Figure 2: Scala installer for Windows

After the downloading has finished, unzip the file and place it in your favorite folder. You can also rename the file Scala for navigation flexibility. Finally, a PATH variable needs to be created for Scala to be globally seen on your OS. For this, navigate to Computer | Properties, as shown in the following figure:

Figure 3: Environmental variable tab on windows

Select Environment Variables from there and get the location of the bin folder of Scala; then, append it to the PATH environment variable. Apply the changes and then press OK, as shown in the following screenshot:

Figure 4: Adding environmental variables for Scala

Now, you are ready to go for the Windows installation. Open the CMD and just type scala. If you were successful in the installation process, then you should see an output similar to the following screenshot:

Figure 5: Accessing Scala from "Scala shell"

Mac OS

It's time now to install Scala on your Mac. There are lots of ways in which you can install Scala on your Mac, and here, we are going to mention two of them:

Using Homebrew installer

At first, check your system to see whether it has Xcode installed or not because it's required in this step. You can install it from the Apple App Store free of charge.
Next, you need to install Homebrew from the terminal by running the following command in your terminal:

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Note: The preceding command is changed by the Homebrew guys from time to time. If the command doesn't seem to be working, check the Homebrew website for the latest incantation: http://brew.sh/.

Now, you are ready to go and install Scala by typing this command brew install scala in the terminal.
Finally, you are ready to go by simply typing Scala in your terminal (the second line) and you will observe the following on your terminal:

Figure 6: Scala shell on macOS

Installing manually

Before installing Scala manually, choose your preferred version of Scala and download the corresponding .tgz file of that version Scala-verion.tgz from http://www.Scala-lang.org/download/. After downloading your preferred version of Scala, extract it as follows:

$ tar xvf scala-2.11.8.tgz

Then, move it to /usr/local/share as follows:

$ sudo mv scala-2.11.8 /usr/local/share

Now, to make the installation permanent, execute the following commands:

$ echo "export SCALA_HOME=/usr/local/share/scala-2.11.8" >> ~/.bash_profile
$ echo "export PATH=$PATH: $SCALA_HOME/bin" >> ~/.bash_profile

That's it. Now, let's see how it can be done on Linux distributions like Ubuntu in the next subsection.

Linux

In this subsection, we will show you the installation procedure of Scala on the Ubuntu distribution of Linux. Before starting, let's check to make sure Scala is installed properly. Checking this is straightforward using the following command:

$ scala -version

If Scala is already installed on your system, you should get the following message on your terminal:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Note that, during the writing of this installation, we used the latest version of Scala, that is, 2.11.8. If you do not have Scala installed on your system, make sure you install it before proceeding to the next step. You can download the latest version of Scala from the Scala website at http://www.scala-lang.org/download/ (for a clearer view, refer to Figure 2). For ease, let's download Scala 2.11.8, as follows:

$ cd Downloads/
$ wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz

After the download has been finished, you should find the Scala tar file in the download folder.

The user should first go into the Download directory with the following command: $ cd /Downloads/. Note that the name of the downloads folder may change depending on the system's selected language.

To extract the Scala tar file from its location or more, type the following command. Using this, the Scala tar file can be extracted from the Terminal:

$ tar -xvzf scala-2.11.8.tgz

Now, move the Scala distribution to the user's perspective (for example, /usr/local/scala/share) by typing the following command or doing it manually:

 $ sudo mv scala-2.11.8 /usr/local/share/

Move to your home directory issue using the following command:

$ cd ~

Then, set the Scala home using the following commands:

$ echo "export SCALA_HOME=/usr/local/share/scala-2.11.8" >> ~/.bashrc       
$ echo "export PATH=$PATH:$SCALA_HOME/bin" >> ~/.bashrc

Then, make the change permanent for the session by using the following command:

$ source ~/.bashrc

After the installation has been completed, you should better to verify it using the following command:

$ scala -version

If Scala has successfully been configured on your system, you should get the following message on your terminal:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Well done! Now, let's enter into the Scala shell by typing the scala command on the terminal, as shown in the following figure:

Figure 7: Scala shell on Linux (Ubuntu distribution)

Finally, you can also install Scala using the apt-get command, as follows:

$ sudo apt-get install scala

This command will download the latest version of Scala (that is, 2.12.x). However, Spark does not have support for Scala 2.12 yet (at least when we wrote this chapter). Therefore, we would recommend the manual installation described earlier.

Scala: the scalable language

The name Scala comes from a scalable language because Scala's concepts scale well to large programs. Some programs in other languages will take tens of lines to be coded, but in Scala, you will get the power to express the general patterns and concepts of programming in a concise and effective manner. In this section, we will describe some exciting features of Scala that Odersky has created for us:

Scala is object-oriented

Scala is a very good example of an object-oriented language. To define a type or behavior for your objects you need to use the notion of classes and traits, which will be explained later, in the next chapter. Scala doesn't support direct multiple inheritances, but to achieve this structure, you need to use Scala's extension of the subclassing and mixing-based composition. This will be discussed in later chapters.

Scala is functional

Functional programming treats functions like first-class citizens. In Scala, this is achieved with syntactic sugar and objects that extend traits (like Function2), but this is how functional programming is achieved in Scala. Also, Scala defines a simple and easy way to define anonymous functions (functions without names). It also supports higher-order functions and it allows nested functions. The syntax of these concepts will be explained in deeper details in the coming chapters.

Also, it helps you to code in an immutable way, and by this, you can easily apply it to parallelism with synchronization and concurrency.

Scala is statically typed

Unlike the other statically typed languages like Pascal, Rust, and so on, Scala does not expect you to provide redundant type information. You don't have to specify the type in most cases. Most importantly, you don't even need to repeat them again.

A programming language is called statically typed if the type of a variable is known at compile time: this also means that, as a programmer, you must specify what the type of each variable is. For example, Scala, Java, C, OCaml, Haskell, and C++, and so on. On the other hand, Perl, Ruby, Python, and so on are dynamically typed languages, where the type is not associated with the variables or fields, but with the runtime values.

The statically typed nature of Scala ensures that all kinds of checking are done by the compiler. This extremely powerful feature of Scala helps you find/catch most trivial bugs and errors at a very early stage, before being executed.

Scala runs on the JVM

Just like Java, Scala is also compiled into bytecode which can easily be executed by the JVM. This means that the runtime platforms of Scala and Java are the same because both generate bytecodes as the compilation output. So, you can easily switch from Java to Scala, you can and also easily integrate both, or even use Scala in your Android application to add a functional flavor.

Note that, while using Java code in a Scala program is quite easy, the opposite is very difficult, mostly because of Scala's syntactic sugar.

Also, just like the javac command, which compiles Java code into bytecode, Scala has the scalas command, which compiles the Scala code into bytecode.

Scala can execute Java code

As mentioned earlier, Scala can also be used to execute your Java code. Not just installing your Java code; it also enables you to use all the available classes from the Java SDK, and even your own predefined classes, projects, and packages right in the Scala environment.

Scala can do concurrent and synchronized processing

Some programs in other languages will take tens of lines to be coded, but in Scala, you will get the power to express the general patterns and concepts of programming in a concise and effective manner. Also, it helps you to code in an immutable way, and by this, you can easily apply it to parallelism with synchronization and concurrency.

Scala for Java programmers

Scala has a set of features that completely differ from Java. In this section, we will discuss some of these features. This section will be helpful for those who are from a Java background or are at least familiar with basic Java syntax and semantics.

All types are objects

As mentioned earlier, every value in Scala will look like an object. This statement means everything looks like an object, but some of them do not actually object and you will see the interpretation of this in the coming chapters (for example, the difference between the reference types and the primitive types still exists in Scala, but it hides it for the most part). For example, in Scala, strings are implicitly converted to collections of characters, but not in Java!

Type inference

If you are not familiar with the term, it is nothing but the deduction of types at compile time. Hold on, isn't that what dynamic typing means? Well, no. Notice that I said deduction of types; this is drastically different from what dynamically typed languages do, and another thing is, it is done at compile time and not runtime. Many languages have this built in, but the implementation varies from one language to another. This might be confusing at the beginning, but it will become clearer with code examples. Let's jump into the Scala REPL for some experimentation.

Scala REPL

The Scala REPL is a powerful feature that makes it more straightforward and concise to write Scala code on the Scala shell. REPL stands for Read-Eval-Print-Loop also called the Interactive Interpreter. This means it is a program for:

Reading the expressions you type in.
Evaluating the expression in step 1 using the Scala compiler.
Printing out the result of the evaluation in step 2.
Waiting (looping) for you to enter further expressions.

Figure 8: Scala REPL example 1

From the figure, it is evident that there is no magic, the variables are inferred automatically to the best types they deem fit at compile time. If you look even more carefully, when I tried to declare:

 i:Int = "hello"

Then, the Scala shell throws an error saying the following:

<console>:11: error: type mismatch;
  found   : String("hello")
  required: Int
        val i:Int = "hello"
                    ^

According to Odersky, "Mapping a character to the character map over a RichString should again yield a RichString, as in the following interaction with the Scala REP". The preceding statement can be proved using the following line of code:

scala> "abc" map (x => (x + 1).toChar) 
res0: String = bcd

However, if someone applies a method from Char to Int to a String, then what happens? In that case, Scala converts them, as a vector of integer also called immutable is a feature of Scala collection, as shown in Figure 9. We will look at the details on Scala collection API in Chapter 4, Collections APIs.

"abc" map (x => (x + 1)) 
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(98, 99, 100)

Both static and instance methods of objects are also available. For example, if you declare x as a string hello and then try to access both the static and instance methods of objects x, they are available. In the Scala shell, type x then . and <tab> and then you will find the available methods:

scala> val x = "hello"
x: java.lang.String = hello
scala> x.re<tab>
reduce             reduceRight         replaceAll            reverse
reduceLeft         reduceRightOption   replaceAllLiterally   reverseIterator
reduceLeftOption   regionMatches       replaceFirst          reverseMap
reduceOption       replace             repr
scala>

Since this is all accomplished on the fly via reflection, even anonymous classes you've only just defined are equally accessible:

scala> val x = new AnyRef{def helloWord = "Hello, world!"}
x: AnyRef{def helloWord: String} = $anon$1@58065f0c
 scala> x.helloWord
    def helloWord: String
 scala> x.helloWord
 warning: there was one feature warning; re-run with -feature for details
 res0: String = Hello, world!

The preceding two examples can be shown on the Scala shell, as follows:

Figure 9: Scala REPL example 2

"So it turns out that map yields different types depending on what the result type of the passed function argument is!"

- Odersky

Nested functions

Why will you require a nested functions support in your programming language? Most of the time, we want to maintain our methods to be a few lines and avoid overly large functions. A typical solution for this in Java would be to define all these small functions on a class level, but any other method could easily refer and access them even though they are helper methods. The situation is different in Scala, so you can use define functions inside each other, and this way, prevent any external access to these functions:

def sum(vector: List[Int]): Int = {
  // Nested helper method (won't be accessed from outside this function
  def helper(acc: Int, remaining: List[Int]): Int = remaining match {
    case Nil => acc
    case _   => helper(acc + remaining.head, remaining.tail)
  }
  // Call the nested method
  helper(0, vector)
}

We are not expecting you to understand these code snippets, which show the difference between Scala and Java.

Import statements

In Java, you can only import packages at the top of your code file, right after the packages statement. The situation is not the same in Scala; you can write your import statements almost anywhere inside your source file (for example, you can even write your import statements inside a class or a method). You just need to pay attention to the scope of your import statement, because it inherits the same scope of the members of your class or local variables inside your method. The _ (underscore) in Scala is used for wildcard imports, which is similar to the * (asterisk) that you would use in java:

// Import everything from the package math 
import math._

You may also use these { } to indicate a set of imports from the same parent package, just in one line of code. In Java, you would use multiple lines of code to do so:

// Import math.sin and math.cos
import math.{sin, cos}

Unlike the Java, Scala does not have the concept of static imports. In other words, the concept of static doesn't exist in Scala. However, as a developer, obviously, you can import a member or more than one member of an object using a regular import statement. The preceding example already shows this, where we import the methods sin and cos from the package object named math. To demonstrate an example, the preceding code snippet can be defined from the Java programmer's perspective as follows:

import static java.lang.Math.sin;
import static java.lang.Math.cos;

Another beauty of Scala is that, in Scala, you can rename your imported packages as well. Alternatively, you can rename your imported packages to avoid the type conflicting with packages that have similar members. The following statement is valid in Scala:

// Import Scala.collection.mutable.Map as MutableMap 
import Scala.collection.mutable.{Map => MutableMap}

Finally, you may want to exclude a member of packages for collisions or other purposes. For this, you can use a wildcard to do so:

// Import everything from math, but hide cos 
import math.{cos => _, _}

Operators as methods

It's worth mentioning that Scala doesn't support the operator overloading. You might think that there are no operators at all in Scala.

An alternative syntax for calling a method taking a single parameter is the use of the infix syntax. The infix syntax provides you with a flavor just like you are applying an operator overloading, as like what you did in C++. For example:

val x = 45
val y = 75

In the following case, the + means a method in class Int. The following code is a non-conventional method calling syntax:

val add1 = x.+(y)

More formally, the same can be done using the infix syntax, as follows:

val add2 = x + y

Moreover, you can utilize the infix syntax. However, the method has only a single parameter, as follows:

val my_result = List(3, 6, 15, 34, 76) contains 5

There's one special case when using the infix syntax. That is, if the method name ends with a : (colon), then the invocation or call will be right associative. This means that the method is called on the right argument with the expression on the left as the argument, instead of the other way around. For example, the following is valid in Scala:

val my_list = List(3, 6, 15, 34, 76)

The preceding statement signifies that: my_list.+:(5) rather than 5.+:(my_list) and more formally:

val my_result = 5 +: my_list

Now, let's look at the preceding examples on Scala REPL:

scala> val my_list = 5 +: List(3, 6, 15, 34, 76)
  my_list: List[Int] = List(5, 3, 6, 15, 34, 76)
scala> val my_result2 = 5+:my_list
  my_result2: List[Int] = List(5, 5, 3, 6, 15, 34, 76)
scala> println(my_result2)
  List(5, 5, 3, 6, 15, 34, 76)
scala>

In addition to the above, operators here are just methods, so that they can simply be overridden just like methods.

Methods and parameter lists

In Scala, a method can have multiple parameter lists or even no parameter list at all. On the other hand, in Java, a method always has one parameter list, with zero or more parameters. For example, in Scala, the following is the valid method definition (written in currie notation) where a method has two parameter lists:

def sum(x: Int)(y: Int) = x + y

The preceding method cannot be written as:

def sum(x: Int, y: Int) = x + y

A method, let's say sum2, can have no parameter list at all, as follows:

def sum2 = sum(2) _

Now, you can call the method add2, which returns a function taking one parameter. Then, it calls that function with the argument 5, as follows:

val result = add2(5)

Methods inside methods

Sometimes, you would like to make your applications, code modular by avoiding too long and complex methods. Scala provides you this facility to avoid your methods becoming overly large so that you can split them up into several smaller methods.

On the other hand, Java allows you only to have the methods defined at class level. For example, suppose you have the following method definition:

def main_method(xs: List[Int]): Int = {
  // This is the nested helper/auxiliary method
  def auxiliary_method(accu: Int, rest: List[Int]): Int = rest match {
    case Nil => accu
    case _   => auxiliary_method(accu + rest.head, rest.tail)
  }
}

Now, you can call the nested helper/auxiliary method as follows:

auxiliary_method(0, xs)

Considering the above, here's the complete code segment which is valid:

def main_method(xs: List[Int]): Int = {
  // This is the nested helper/auxiliary method
  def auxiliary_method(accu: Int, rest: List[Int]): Int = rest match {
    case Nil => accu
    case _   => auxiliary_method(accu + rest.head, rest.tail)
  }
   auxiliary_method(0, xs)
}

Constructor in Scala

One surprising thing about Scala is that the body of a Scala class is itself a constructor. However, Scala does so; in fact, in a more explicit way. After that, a new instance of that class is created and executed. Moreover, you can specify the arguments of the constructor in the class declaration line.

Consequently, the constructor arguments are accessible from all of the methods defined in that class. For example, the following class and constructor definition is valid in Scala:

class Hello(name: String) {
  // Statement executed as part of the constructor
  println("New instance with name: " + name)
  // Method which accesses the constructor argument
  def sayHello = println("Hello, " + name + "!")
}

The equivalent Java class would look like this:

public class Hello {
  private final String name;
  public Hello(String name) {
    System.out.println("New instance with name: " + name);
    this.name = name;
  }
  public void sayHello() {
    System.out.println("Hello, " + name + "!");
  }
}

Objects instead of static methods

As mentioned earlier, static does not exist in Scala. You cannot do static imports and neither can you cannot add static methods to classes. In Scala, when you define an object with the same name as the class and in the same source file, then the object is said to be the companion of that class. Functions that you define in this companion object of a class are like static methods of a class in Java:

class HelloCity(CityName: String) {
  def sayHelloToCity = println("Hello, " + CityName + "!") 
}

This is how you can define a companion object for the class hello:

object HelloCity { 
  // Factory method 
  def apply(CityName: String) = new Hello(CityName) 
}

The equivalent class in Java would look like this:

public class HelloCity { 
  private final String CityName; 
  public HelloCity(String CityName) { 
    this.CityName = CityName; 
  }
  public void sayHello() {
    System.out.println("Hello, " + CityName + "!"); 
  }
  public static HelloCity apply(String CityName) { 
    return new Hello(CityName); 
  } 
}

So, lot's of verbose in this simple class, isn't there? The apply method in Scala is treated in a different way, such that you can find a special shortcut syntax to call it. This is the familiar way of calling the method:

val hello1 = Hello.apply("Dublin")

Here's the shortcut syntax that is equivalent to the one earlier:

 val hello2 = Hello("Dublin")

Note that this only works if you used the apply method in your code because Scala treats methods that are named apply in this different way.

Traits

Scala provides a great functionality for you in order to extend and enrich your classes' behaviors. These traits are similar to the interface in which you define the function prototypes or signatures. So, with this, you can have mix-ins of functionality coming from different traits and, in this way, you enriched your classes' behavior. So, what's so good about traits in Scala? They enable the composition of classes from these traits, with traits being the building blocks. As always, let's look at in an example. This is how a conventional logging routine is set up in Java:

Note that, even though you can mix in any number of traits you want. Moreover, like Java, Scala does not have the support of multiple inheritances. However, in both Java and Scala, a subclass can only extend a single superclass. For example, in Java:

class SomeClass {
  //First, to have to log for a class, you must initialize it
  final static Logger log = LoggerFactory.getLogger(this.getClass());
  ...
  //For logging to be efficient, you must always check, if logging level for current message is enabled                
  //BAD, you will waste execution time if the log level is an error, fatal, etc.
  log.debug("Some debug message");
  ...
  //GOOD, it saves execution time for something more useful
  if (log.isDebugEnabled()) { log.debug("Some debug message"); }
  //BUT looks clunky, and it's tiresome to write this construct every time you want to log something.
}

For a more detailed discussion, refer to this URL https://stackoverflow.com/questions/963492/in-log4j-does-checking-isdebugenabled-before-logging-improve-performance/963681#963681.

However, it's different with traits. It's very tiresome to always check for the log level being enabled. It would be good, if you could write this routine once and reuse it anywhere, in any class right away. Traits in Scala make this all possible. For example:

trait Logging {
  lazy val log = LoggerFactory.getLogger(this.getClass.getName)     
  //Let's start with info level...
  ...
  //Debug level here...
  def debug() {
    if (log.isDebugEnabled) log.info(s"${msg}")
  }
  def debug(msg: => Any, throwable: => Throwable) {
    if (log.isDebugEnabled) log.info(s"${msg}", throwable)
  }
  ...
  //Repeat it for all log levels you want to use
}

If you look at the preceding code, you will see an example of using string starting with s. This way, Scala offers the mechanism to create strings from your data called String Interpolation.

String Interpolation, allows you to embed variable references directly in processed string literals. For example:
scala> val name = "John Breslin"
scala> println(s"Hello, $name") // Hello, John Breslin.

Now, we can get an efficient logging routine in a more conventional style as a reusable block. To enable logging for any class, we just mix in our Logging trait! Fantastic! Now that's all it takes to add a logging feature to your class:

class SomeClass extends Logging {
  ...
  //With logging trait, no need for declaring a logger manually for every class
  //And now, your logging routine is either efficient and doesn't litter the code!

  log.debug("Some debug message")
  ...
}

It is even possible to mix-up multiple traits. For example, for the preceding trait (that is, Logging) you can keep extending in the following order:

trait Logging  {
  override def toString = "Logging "
}
class A extends Logging  {
  override def toString = "A->" + super.toString
}
trait B extends Logging  {
  override def toString = "B->" + super.toString
}
trait C extends Logging  {
  override def toString = "C->" + super.toString
}
class D extends A with B with C {
  override def toString = "D->" + super.toString
}

However, it is noted that a Scala class can extend multiple traits at once, but JVM classes can extend only one parent class.

Now, to invoke the above traits and classes, use new D() from Scala REPL, as shown in the following figure:

Figure 10: Mixing multiple traits

Everything has gone smoothly so far in this chapter. Now, let's move to a new section where we will discuss some topics for the beginner who wants to drive themselves into the realm of Scala programming.

Scala for the beginners

In this part, you will find that we assume that you have a basic understanding of any previous programming language. If Scala is your first entry into the coding world, then you will find a large set of materials and even courses online that explain Scala for beginners. As mentioned, there are lots of tutorials, videos, and courses out there.

There is a whole Specialization, which contains this course, on Coursera: https://www.coursera.org/specializations/scala. Taught by the creator of Scala, Martin Odersky, this online class takes a somewhat academic approach to teaching the fundamentals of functional programming. You will learn a lot about Scala by solving the programming assignments. Moreover, this specialization includes a course on Apache Spark. Furthermore, Kojo (http://www.kogics.net/sf:kojo) is an interactive learning environment that uses Scala programming to explore and play with math, art, music, animations, and games.

Your first line of code

As a first example, we will use the pretty common Hello, world! program in order to show you how to use Scala and its tools without knowing much about it. Let's open your favorite editor (this example runs on Windows 7, but can be run similarly on Ubuntu or macOS), say Notepad++, and type the following lines of code:

object HelloWorld {
  def main(args: Array[String]){ 
    println("Hello, world!")  
  } 
}

Now, save the code with a name, say HelloWorld.scala, as shown in the following figure:

Figure 11: Saving your first Scala source code using Notepad++

Let's compile the source file as follows:

C:\>scalac HelloWorld.scala
 C:\>scala HelloWorld
 Hello, world!
 C:\>

I'm the hello world program, explain me well!

The program should be familiar to anyone who has some programming of experience. It has a main method which prints the string Hello, world! to your console. Next, to see how we defined the main function, we used the def main() strange syntax to define it. def is a Scala keyword to declare/define a method, and we will be covering more about methods and different ways of writing them in the next chapter. So, we have an Array[String] as an argument for this method, which is an array of strings that can be used for initial configurations of your program, and omit is valid. Then, we use the common println() method, which takes a string (or formatted one) and prints it to the console. A simple hello world has opened up many topics to learn; three in particular:

● Methods (covered in a later chapter)
● Objects and classes (covered in a later chapter)
● Type inference - the reason why Scala is a statically typed language - explained earlier

Run Scala interactively!

The scala command starts the interactive shell for you, where you can interpret Scala expressions interactively:

> scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121).
Type in expressions for evaluation. Or try :help.
scala>
scala> object HelloWorld {
   |   def main(args: Array[String]){
   |     println("Hello, world!")
   |   }
   | }
defined object HelloWorld
scala> HelloWorld.main(Array())
Hello, world!
scala>

The shortcut :q stands for the internal shell command :quit, used to exit the interpreter.

Compile it!

The scalac command, which is similar to javac command, compiles one or more Scala source files and generates a bytecode as output, which then can be executed on any Java Virtual Machine. To compile your hello world object, use the following:

> scalac HelloWorld.scala

By default, scalac generates the class files into the current working directory. You may specify a different output directory using the -d option:

> scalac -d classes HelloWorld.scala

However, note that the directory called classes must be created before executing this command.

Execute it with Scala command

The scala command executes the bytecode that is generated by the interpreter:

$ scala HelloWorld

Scala allows us to specify command options, such as the -classpath (alias -cp) option:

$ scala -cp classes HelloWorld

Before using the scala command to execute your source file(s), you should have a main method that acts as an entry point for your application. Otherwise, you should have an Object that extends Trait Scala.App, then all the code inside this object will be executed by the command. The following is the same Hello, world! example, but using the App trait:

#!/usr/bin/env Scala 
object HelloWorld extends App {  
  println("Hello, world!") 
}
HelloWorld.main(args)

The preceding script can be run directly from the command shell:

./script.sh

Note: we assume here that the file script.sh has the execute permission:

$ sudo chmod +x script.sh

Then, the search path for the scala command is specified in the $PATH environment variable.

Filter reviews by

All

Amazon verified reviews

Verified Amazon Customer Jul 27, 2017

Fantastic book! I purchased this book yesterday and already read some chapters and had a quick look to other chapters too. It contains everything needed for learning big data analytics with Spark. It elaborately covers the Scala programming, Sparks basic operations, many machine learning algorithms (classification, regression, clustering, recommendation system), NLP, graph analytics, structured streaming, and of course some other advanced topics of Spark such as tuning, debugging and cluster deployment.More interestingly, it also discussed PySpark, SparkR, Alixuio, and Zeppelin so I don't need to switch to something else. So in summary, it contains everything I was looking for few months.

Amazon Verified review

Md Ashiqur Rahman Nov 16, 2017

This is a fantastic book for getting a real understanding about how to develop big data processing applications using Scala and Spark. This book is very up to date containing wide coverage of all the APIs such as Spark SQL, structured streaming, graphX, Spark MLib and more.

Amazon Customer Aug 01, 2017

I am a scala enthusiastic, also professional java programmer, now a days im into bigdata stack. I've read the entire book more or less. I would highly recommend this book if someone working in java area but want to write java code in functional (scala) way to solve bigdata problems - scala makes spark processing much easier. I would also recommend this book who is yet to start bigdata coding using scala for spark processing. This book includes a lot of nice, small and real life bigdata analytics problems and how to solve them easily using scala on top of hadoop ecosystem.Thanks to authors for such an useful book - its not just only concepts but also guide to implement concepts.

Ariel Herrera Oct 03, 2018

What I love about this book is that the author explains each concept very well and provides numerous examples. Yes, it is a large read however you can jump around in the book based on your need and it is sufficient.

Julio Bregeiro Nov 18, 2023

Aprovadi

Scala and Spark for Big Data Analytics: Explore the concepts of functional programming, data streaming, and machine learning

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the authors

FAQs

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access