Reader small image

You're reading from  Data Engineering with Scala and Spark

Product typeBook
Published inJan 2024
PublisherPackt
ISBN-139781804612583
Edition1st Edition
Right arrow
Authors (3):
Eric Tome
Eric Tome
author image
Eric Tome

Eric Tome has over 25 years of experience working with data. He has contributed to and led teams that ingested, cleansed, standardized, and prepared data used by business intelligence, data science, and operations teams. He has a background in mathematics and currently works as a senior solutions architect at Databricks, helping customers solve their data and AI challenges.
Read more about Eric Tome

Rupam Bhattacharjee
Rupam Bhattacharjee
author image
Rupam Bhattacharjee

Rupam Bhattacharjee works as a lead data engineer at IBM. He has architected and developed data pipelines, processing massive structured and unstructured data using Spark and Scala for on-premises Hadoop and K8s clusters on the public cloud. He has a degree in electrical engineering.
Read more about Rupam Bhattacharjee

David Radford
David Radford
author image
David Radford

David Radford has worked in big data for over 10 years, with a focus on cloud technologies. He led consulting teams for several years, completing a migration from legacy systems to modern data stacks. He holds a master's degree in computer science and works as a senior solutions architect at Databricks.
Read more about David Radford

View More author details
Right arrow

Understanding objects, classes, and traits

In this section, we are going to look at classes, traits, and objects. If you have used Java before, then some of the topics covered in this section will look familiar. However, there are several differences too. For example, Scala provides singleton objects, which automatically create a class and a single instance of that class in one go. Another example is Scala has case classes, which provide great support for pattern matching, allow you to create instances without the new keyword, and provide a default toString implementation that is quite handy when printing to the console.

We will first look at classes, followed by objects, and then wrap this section up with a quick tour of traits.

Classes

A class is a blueprint for objects, which are instances of that class. For example, we can create a Point class using the following code:

class Point(val x: Int, val y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}

Example 1.5

The Point class has four members—two immutable variables, x and y, as well as two methods, add and toString. We can create instances of the Point class as follows:

scala> val p1 = new Point(1,1)
p1: Point = (1, 1)
scala> val p2 = new Point(2,3)
p2: Point = (2, 3)

Example 1.6

We can then create a new instance, p3, by adding p1 and p2, as follows:

scala> val p3 = p1 add p2
p3: Point = (3, 4)

Example 1.7

Scala supports the infix notation, characterized by the placement of operators between operands, and automatically converts p1 add p2 to p1.add(p2). Another way to define the Point class is using a case class, as shown here:

case class Point(x: Int, y: Int) {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
}

Example 1.8

A case class automatically adds a factory method with the name of the class, which enables us to leave out the new keyword when creating an instance. A factory method is used to create instances of a class without requiring us to explicitly call the constructor method. Refer to the following example:

scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,3)
p2: Point = Point(2,3)

Example 1.9

The compiler also adds default implementations of various methods such as toString and hashCode, which the regular class definition lacks. So, we did not have to override the toString method, as was done earlier, and yet both p1 and p2 were printed neatly on the console (Example 1.9).

All arguments in the parameter list of a case class automatically get a val prefix, which makes them parametric fields. A parametric field is a shorthand that defines a parameter and a field with the same name.

To better understand the difference, let’s look at the following example:

scala> case class Point1(x: Int, y: Int) //x and y are parametric fields
defined class Point1
scala> class Point2(x: Int, y: Int) //x and y are regular parameters
defined class Point2
scala> val p1 = Point1(1, 2)
p1: Point1 = Point1(1,2)
scala> val p2 = new Point2(3, 4)
p2: Point2 = Point2@203ced18

Example 1.10

If we now try to access p1.x, it will work because x is a parametric field, whereas trying to access p2.x will result in an error. Example 1.11 illustrates this:

scala> println(p1.x)
1
scala> println(p2.x)
<console>:13: error: value x is not a member of Point2
       println(p2.x)
                  ^

Example 1.11

Trying to access p2.x will result in a compile error, value x is not a member of Point2. Case classes also have excellent support for pattern matching, as we will see in the Understanding pattern matching section.

Scala also provides an abstract class, which, unlike a regular class, can contain abstract methods. For example, we can define the following hierarchy:

abstract class Animal
abstract class Pet extends Animal {
  def name: String
}
class Dog(val name: String) extends Pet {
  override def toString = s"Dog($name)"
}
scala> val pluto = new Dog("Pluto")
pluto: Dog = Dog(Pluto)

Example 1.12

Animal is the base class. Pet extends Animal and declares an abstract method, name. Dog extends Pet and uses a parametric field, name (it is both a parameter as well as a field). Because Scala uses the same namespace for fields and methods, this allows the field name in the Dog class to provide a concrete implementation of the abstract method name in Pet.

Object

Unlike Java, Scala does not support static members in classes; instead, it has singleton objects. A singleton object is defined using the object keyword, as shown here:

class Point(val x: Int, val y: Int) {
  // new keyword is not required to create a Point object
  // apply method from companion object is invoked
  def add(that: Point): Point = Point(x + that.x, y + that.y)
  override def toString: String = s"($x, $y)"
}
object Point {
  def apply(x: Int, y: Int) = new Point(x, y)
}

Example 1.13

In this example, the Point singleton object shares the same name with the class and is called that class’s companion object. The class is called the companion class of the singleton object. For an object to qualify as a companion object of a given class, it needs to be in the same source file as the class itself.

Please note that the add method does not use the new keyword on the right-hand side. Point(x1, y1) is de-sugared into Point.apply(x1, y1), which returns a Point instance.

Singleton objects are also used to write an entrypoint for Scala applications. One option is to provide an explicit main method within the singleton object, as shown here:

object SampleScalaApplication {
  def main(args: Array[String]): Unit = {
    println(s"This is a sample Scala application")
  }
}

Example 1.14

The other option is to extend the App trait, which provides a main method implementation. We will cover traits in the next section. You can also refer to the Further reading section (the third point) for more information:

 object SampleScalaApplication extends App {
  println(s"This is a sample Scala application")
}

Example 1.15

Trait

Scala also has traits, which are used to define rich interfaces as well as stackable modifications. You can read more stackable modifications in the Further reading section (the fourth point) Unlike class inheritance, where each class inherits from just one super class, a class can mix in any number of traits. A trait can have abstract as well as concrete members. Here is a simplified example of the Ordered trait from the Scala standard library:

trait Ordered[T] {
  // compares receiver (this) with argument of the same type
  def compare(that: T): Int
  def <(that: T): Boolean = (this compare that) < 0
  def >(that: T): Boolean = (this compare that) > 0
  def <=(that: T): Boolean = (this compare that) <= 0
  def >=(that: T): Boolean = (this compare that) >= 0
}

Example 1.16

The Ordered trait takes a type parameter, T, and has an abstract method, compare. All of the other methods are defined in terms of that method. A class can add the functionalities defined by <, >, and so on, just by defining the compare method. The compare method should return a negative integer if the receiver is less than the argument, positive if the receiver is greater than the argument, and 0 if both objects are the same.

Going back to our Point example, we can define a rule to say that a point, p1, is greater than p2 if the distance of p1 from the origin is greater than that of p2:

case class Point(x: Int, y: Int) extends Ordered[Point] {
  def add(that: Point): Point = new Point(x + that.x, y + that.y)
  def compare(that: Point) = (x ^ 2 + y ^ 2) ^ 1 / 2 - (that.x ^ 2 + that.y ^ 2) ^ 1 / 2
}

Example 1.17

With the definition of compare now in place, we can perform a comparison between two arbitrary points, as follows:

scala> val p1 = Point(1,1)
p1: Point = Point(1,1)
scala> val p2 = Point(2,2)
p2: Point = Point(2,2)
scala> println(s"p1 is greater than p2: ${p1 > p2}")
p1 is greater than p2: false
example 1.18

In this section, we looked at objects, classes, and traits. In the next section, we are going to look at HOFs.

Previous PageNext Page
You have been reading a chapter from
Data Engineering with Scala and Spark
Published in: Jan 2024Publisher: PacktISBN-13: 9781804612583
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (3)

author image
Eric Tome

Eric Tome has over 25 years of experience working with data. He has contributed to and led teams that ingested, cleansed, standardized, and prepared data used by business intelligence, data science, and operations teams. He has a background in mathematics and currently works as a senior solutions architect at Databricks, helping customers solve their data and AI challenges.
Read more about Eric Tome

author image
Rupam Bhattacharjee

Rupam Bhattacharjee works as a lead data engineer at IBM. He has architected and developed data pipelines, processing massive structured and unstructured data using Spark and Scala for on-premises Hadoop and K8s clusters on the public cloud. He has a degree in electrical engineering.
Read more about Rupam Bhattacharjee

author image
David Radford

David Radford has worked in big data for over 10 years, with a focus on cloud technologies. He led consulting teams for several years, completing a migration from legacy systems to modern data stacks. He holds a master's degree in computer science and works as a senior solutions architect at Databricks.
Read more about David Radford