You're reading from  Scala and Spark for Big Data Analytics

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785280849
Edition: 1st Edition
Authors (2):
Md. Rezaul Karim

Md. Rezaul Karim is a researcher, author, and data science enthusiast with a strong computer science background, coupled with 10 years of research and development experience in machine learning, deep learning, and data mining algorithms to solve emerging bioinformatics research problems by making them explainable. He is passionate about applied machine learning, knowledge graphs, and explainable artificial intelligence (XAI). Currently, he is working as a research scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Germany. Before joining FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Previously, he worked as a lead software engineer at Samsung Electronics, Korea.

Sridhar Alla

Sridhar Alla is the co-founder and CTO of Blue Whale Consulting and is an expert at helping companies (big and small) define their vision for systems and capabilities that will allow them to establish a strategic execution plan to deal with the ever-growing data collected to support analytics and product teams. He is very experienced at dealing with all aspects of data collection, security, governance, and processing as part of end-to-end big data analytics and machine learning initiatives (including predictive modeling, deep learning, and ML automation). Sridhar is a published book author and an avid presenter at numerous conferences, including Strata, Hadoop World, and Spark Summit. He also has several patents filed with the US PTO on large-scale computing and distributed systems. He has over 18 years' experience writing code in Scala, Java, C, C++, Python, R, and Go, and has extensive hands-on knowledge of Spark, Flink, TensorFlow, Keras, Hadoop, Cassandra, HBase, MongoDB, Riak, Redis, Zeppelin, Mesos, Docker, Kafka, ElasticSearch, Solr, H2O, machine learning, text analytics, distributed computing, and high-performance computing. Sridhar lives with his wife and daughter in New Jersey and in his spare time loves blogging and coaching organizations on next-generation advancements in technology and their alignment with business goals.

Object-Oriented Scala

"The object-oriented model makes it easy to build up programs by accretion. What this often means, in practice, is that it provides a structured way to write spaghetti code."

- Paul Graham

In the previous chapter, we looked at how to get started with programming in Scala. If you write procedural programs like those in the previous chapter, you can achieve code reusability by creating procedures or functions. However, as you keep working, your program grows longer, bigger, and more complex. At a certain point, you will have no simple way left to organize the entire code before production.

In contrast, the object-oriented programming (OOP) paradigm provides a whole new layer of abstraction. You can then modularize your code by defining OOP entities such as classes with related properties...

Variables in Scala

Before going deeper into the OOP features, we first need to know the details of the different types of variables and data types in Scala. To declare a variable in Scala, you use either the var or the val keyword. The formal syntax for declaring a variable in Scala is as follows:

val or var VariableName : DataType = Initial_Value

For example, let's see how we can declare two variables whose data types are explicitly specified:

var myVar : Int = 50
val myVal : String = "Hello World! I've started learning Scala."

You can also declare a variable without specifying the DataType, in which case the compiler infers it from the initial value. For example, let's see how to declare a variable using val or var, as follows:

var myVar = 50
val myVal = "Hello World! I've started learning Scala."
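
As a small sketch of the two declaration styles above, the following object declares values with and without an explicit type. The names VariablesSketch, explicitCount, inferredCount, message, and bump are hypothetical and used only for illustration. The typed assignments near the end compile only because the compiler infers Int and String for the unannotated declarations.

```scala
object VariablesSketch {
  // Explicit type annotation, following the formal syntax shown earlier.
  var explicitCount: Int = 50

  // No annotation: the compiler infers Int from the literal 50.
  var inferredCount = 50

  // No annotation: the compiler infers String from the string literal.
  val message = "Hello World! I've started learning Scala."

  // These two lines type-check only because the inferred types are Int and String.
  val countCheck: Int = inferredCount
  val messageCheck: String = message

  // A var can be reassigned after it has been declared.
  def bump(): Int = {
    inferredCount = inferredCount + 1
    inferredCount
  }
}
```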

There are two types of variables in Scala, mutable and immutable, which can be defined as...

Methods, classes, and objects in Scala

In the previous section, we saw how to work with Scala variables, the different data types, their mutability and immutability, and their usage scopes. In this section, to get the real flavor of OOP, we are going to deal with methods, objects, and classes. These three features will help us understand the object-oriented nature of Scala.

Methods in Scala

In this part, we are going to talk about methods in Scala. As you dive into Scala, you'll find that there are lots of ways to define methods. We will demonstrate some of these ways:

def min(x1: Int, x2: Int): Int = {
  if (x1 < x2) x1 else x2
}

The preceding declaration...
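
To illustrate the remark that there are lots of ways to define methods, here is a minimal sketch of a few common forms. Apart from min, which repeats the listing above, the names (MethodsSketch, max, square, greet) are hypothetical examples and not taken from the book.

```scala
object MethodsSketch {
  // Block body with an explicit return type, as in the listing above.
  def min(x1: Int, x2: Int): Int = {
    if (x1 < x2) x1 else x2
  }

  // Single-expression body: the braces are optional.
  def max(x1: Int, x2: Int): Int = if (x1 > x2) x1 else x2

  // The return type of a non-recursive method can be left to type inference.
  def square(x: Int) = x * x

  // A parameter can carry a default value, so callers may omit it.
  def greet(name: String = "Scala"): String = "Hello, " + name
}
```

For example, MethodsSketch.greet() uses the default argument, while MethodsSketch.greet("Spark") overrides it.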

Packages and package objects

Just as in Java, a package is a special container that holds a set of objects, classes, and even other packages. Every Scala file automatically imports the following:

  • java.lang._
  • scala._
  • scala.Predef._

The following are examples of basic imports:

// Import only one member of a package
import java.io.File
// Import all members of a specific package
import java.io._
// Import many members in a single import statement
import java.io.{File, IOException, FileNotFoundException}
// Import many members using multiple import statements
import java.io.File
import java.io.FileNotFoundException
import java.io.IOException

You can even rename a member while importing it, which avoids a collision between packages that have members with the same name. This technique is also called a class alias:

import java.util.{List => UtilList}
import java.awt.{List...
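
As a hedged, self-contained sketch of why renaming helps, the following example aliases java.util.List so that it can coexist with Scala's own List in the same file. The object name ImportAliasSketch and its members are hypothetical and used only for illustration.

```scala
import java.util.{ArrayList, List => UtilList}

object ImportAliasSketch {
  // Scala's own immutable List is still reachable under its usual name.
  val scalaList: List[Int] = List(1, 2, 3)

  // The Java interface is reachable under the alias UtilList.
  val javaList: UtilList[Int] = new ArrayList[Int]()

  // Copy the Scala elements into the Java list one by one.
  scalaList.foreach(n => javaList.add(n))
}
```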

Java interoperability

Java is one of the most popular languages, and many programmers learn Java as their entry point to the programming world. The popularity of Java has increased since its initial release back in 1995, for many reasons. One of them is the design of its platform: any Java code is compiled to bytecode, which in turn runs on the JVM. Thanks to this feature, Java code can be written once and run anywhere. So, Java is a cross-platform language.

Also, Java has lots of support from its community and lots of packages that will help you get your idea up and running. Then comes Scala, which has lots of features that Java lacks, such as type inference and optional semicolons, immutable collections built right into the Scala core, and lots more features (addressed in Chapter...

Pattern matching

One of the most widely used features of Scala is pattern matching. Each pattern match has a set of alternatives, each of them starting with the case keyword. Each alternative has a pattern and one or more expressions, which are evaluated if the pattern matches. The arrow symbol => separates the pattern from the expressions. The following example demonstrates how to match against an integer:

object PatternMatchingDemo1 {
  def main(args: Array[String]): Unit = {
    println(matchInteger(3))
  }

  // The wildcard _ matches any value other than 1 and 2
  def matchInteger(x: Int): String = x match {
    case 1 => "one"
    case 2 => "two"
    case _ => "greater than two"
  }
}

You can run the preceding program by saving it in a file named PatternMatchingDemo1.scala and then using the following commands:

>scalac PatternMatchingDemo1.scala
>scala PatternMatchingDemo1

You will get the...
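
Patterns are not limited to constants like those in matchInteger. As a minimal sketch, the following hypothetical describe method (the names PatternMatchingSketch and describe are illustrative, not from the book) combines a constant pattern, typed patterns, a guard introduced with if, and the wildcard. The alternatives are tried from top to bottom, and the first one that matches wins.

```scala
object PatternMatchingSketch {
  def describe(x: Any): String = x match {
    case 0               => "zero"                      // constant pattern
    case n: Int if n < 0 => "negative integer"          // typed pattern with a guard
    case n: Int          => "positive integer"          // remaining Int values
    case s: String       => "string of length " + s.length
    case _               => "something else"            // wildcard catches the rest
  }
}
```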

Implicit in Scala

Implicit is another exciting and powerful feature introduced by Scala, and it can refer to the following:

  • A value that can be automatically passed
  • Automatic conversion from one type to another
  • A way of extending the capabilities of a class

Automatic conversion can be accomplished with implicit def, as seen in the following example (assuming you are using the Scala REPL):

scala> implicit def stringToInt(s: String) = s.toInt
stringToInt: (s: String)Int

Now, with the preceding definition in scope, it's possible to do something like this:

scala> def add(x:Int, y:Int) = x + y
add: (x: Int, y: Int)Int

scala> add(1, "2")
res5: Int = 3
scala>

Even if one of the parameters passed to add() is a String (and add() would require you to provide two integers), having the implicit conversion in scope allows the compiler...
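
The other two uses listed at the start of this section can be sketched in the same spirit. The following is a minimal illustration under stated assumptions: every name in it (ImplicitSketch, Greeting, greet, IntOps, squared, and the demo methods) is hypothetical and not taken from the book. It shows a value that is passed automatically through an implicit parameter, and an implicit class that adds a method to Int.

```scala
object ImplicitSketch {
  // A value that can be passed automatically: greet takes an implicit Greeting.
  case class Greeting(text: String)
  implicit val defaultGreeting: Greeting = Greeting("Hello")

  def greet(name: String)(implicit g: Greeting): String = g.text + ", " + name

  // Extending the capabilities of a class: adds a squared method to Int.
  implicit class IntOps(val n: Int) extends AnyVal {
    def squared: Int = n * n
  }

  // Inside this object both implicits are in scope, so no extra imports are needed.
  def demoGreeting: String = greet("Scala")   // the compiler supplies defaultGreeting
  def demoSquare: Int = 7.squared             // the compiler wraps 7 in IntOps
}
```

Outside the object, importing ImplicitSketch._ brings both implicits into scope in the same way.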

Generic in Scala

Generic classes are classes that take a type as a parameter. They are particularly useful for collection classes, and they can be used to implement everyday data structures such as stacks, queues, and linked lists. We will see some examples.

Defining a generic class

Generic classes take a type as a parameter within square brackets []. One convention is to use the letter A as a type parameter identifier, though any parameter name may be used. Let's see a minimal example in the Scala REPL, as follows:

scala> class Stack[A] {
| private var elements: List[A] = Nil
| def push(x: A) { elements = x :: elements }
| def peek: A = elements.head
| def pop(...
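
Since the text above also mentions queues, here is a separate, self-contained sketch of a generic class. It is not a completion of the Stack listing; the names GenericSketch, SimpleQueue, enqueue, dequeue, and isEmpty are hypothetical, and the implementation is only one simple way to do it under these assumptions.

```scala
object GenericSketch {
  // A first-in, first-out queue; A is the element type chosen by the caller.
  class SimpleQueue[A] {
    private var elements: List[A] = Nil

    // Appends x at the end of the queue.
    def enqueue(x: A): Unit = { elements = elements :+ x }

    // Removes and returns the oldest element.
    // Throws NoSuchElementException if the queue is empty.
    def dequeue(): A = {
      val first = elements.head
      elements = elements.tail
      first
    }

    def isEmpty: Boolean = elements.isEmpty
  }
}
```

The same class works for any element type, for example new GenericSketch.SimpleQueue[String] or new GenericSketch.SimpleQueue[Int].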

SBT and other build systems

It's necessary to use a build tool for any enterprise software project. There are lots of build tools that you can choose from, such as Maven, Gradle, Ant, and SBT. A good build tool is one that lets you focus on coding rather than on the complexities of compilation.

Build with SBT

Here, we are going to give a brief introduction to SBT. Before going any further, you need to install SBT using the method that fits your system, as described in the official installation instructions (URL: http://www.scala-sbt.org/release/docs/Setup.html).

So, let's begin by demonstrating the use of SBT in a terminal. For this build tool tutorial, we assume that your source code files are in...
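
As a rough sketch of what a minimal build definition can look like, the following build.sbt (placed in the project's root directory) sets three standard SBT keys. The project name, version, and Scala version shown here are placeholder assumptions, not values taken from the book.

```scala
// build.sbt: a minimal build definition (all values are placeholders)
name := "hello-sbt"          // hypothetical project name
version := "0.1.0"           // hypothetical project version
scalaVersion := "2.11.8"     // assumed Scala version; use the one installed for your project
```

With such a file in place, running sbt compile in the project directory compiles the sources, and sbt run starts the program's main method.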

Summary

Structuring code in a sane way with classes and traits, enhancing its reusability with generics, and building projects with standard, widespread tools all rest on knowing how Scala implements the OO paradigm to allow the building of modular software systems. In this chapter, we discussed the basic object-oriented features in Scala, such as classes and objects, packages and package objects, traits and trait linearization, Java interoperability, pattern matching, implicits, and generics. Finally, we discussed SBT and other build systems that will be needed to build our Spark application in Eclipse or any other IDE.

In the next chapter, we will discuss what functional programming is and how Scala supports it. We will get to know why it matters and what the advantages of using functional concepts are. Continuing, you will learn about pure functions, higher...
