Understanding pattern matching

Scala has excellent support for pattern matching. The most prominent use is the match expression, which takes the following form:

selector match { alternatives }

Here, selector is the expression against which the alternatives are tried. Each alternative starts with the case keyword and consists of a pattern, the arrow symbol =>, and one or more expressions that are evaluated if the pattern matches (a small illustrative sketch follows the list below). Patterns come in several varieties, such as the following:

  • Wildcard patterns
  • Constant patterns
  • Variable patterns
  • Constructor patterns
  • Sequence patterns
  • Tuple patterns
  • Typed patterns
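
To make the general shape concrete, here is a minimal sketch of our own (the day value and the strings are purely illustrative and not part of the numbered examples); it combines a constant pattern with a wildcard, both of which are covered below:

val day = "Saturday"
val kind = day match {
  case "Saturday" => "weekend" // constant pattern: matches only this literal
  case "Sunday"   => "weekend"
  case _          => "weekday" // wildcard pattern: matches anything else
}
// kind: String = weekend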

Before going through each of these pattern types, let’s define our own custom List:

trait List[+A]
case class Cons[+A](head: A, tail: List[A]) extends List[A]
case object Nil extends List[Nothing]
object List {
  def apply[A](as: A*): List[A] =
    if (as.isEmpty) Nil
    else Cons(as.head, apply(as.tail: _*))
}

Example 1.56
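
As a quick check of how the variadic apply method builds the structure (the value name nums is our own), List(1, 2, 3) recursively expands into nested Cons cells:

val nums = List(1, 2, 3)
// apply peels off one element at a time, producing:
// Cons(1, Cons(2, Cons(3, Nil)))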

Wildcard patterns

The wildcard pattern (_) matches any object and is used as a default, catch-all alternative. Consider the following example:

scala> def emptyList[A](l: List[A]): Boolean = l match {
     |   case Nil => true
     |   case _   => false
     | }
emptyList: [A](l: List[A])Boolean
scala> emptyList(List(1, 2))
res8: Boolean = false

Example 1.57

A wildcard can also be used to ignore parts of an object that we do not care about. Refer to the following code:

scala> def threeElements[A](l: List[A]): Boolean = l match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => true
     |   case _                              => false
     | }
threeElements: [A](l: List[A])Boolean
scala> threeElements(List(true, false))
res11: Boolean = false
scala> threeElements(Nil)
res12: Boolean = false
scala> threeElements(List(1, 2, 3))
res13: Boolean = true
scala> threeElements(List("a", "b", "c", "d"))
res14: Boolean = false

Example 1.58

In the preceding example, the threeElements method checks whether a given list has exactly three elements. The values themselves are not needed and are thus discarded in the pattern match.

Constant patterns

A constant pattern matches only itself. Any literal can be used as a constant: 1, true, and "hi" are all constant patterns. Any val or singleton object can also be used as a constant. The emptyList method from the previous example uses the Nil singleton to check whether the list is empty.
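
As a short illustrative sketch (the describe method is our own and not one of the numbered examples), literals and the Nil singleton from Example 1.56 can appear directly as patterns:

def describe(x: Any): String = x match {
  case 1    => "the number one"   // Int literal as a constant pattern
  case true => "the Boolean true" // Boolean literal
  case "hi" => "a greeting"       // String literal
  case Nil  => "our empty list"   // singleton object as a constant
  case _    => "something else"
}
// describe("hi") evaluates to "a greeting"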

Variable patterns

Like a wildcard, a variable pattern matches any object, but it also binds a variable to that object. We can then use the variable to refer to the matched object:

scala> val ints = List(1, 2, 3, 4)
ints: List[Int] = Cons(1,Cons(2,Cons(3,Cons(4,Nil))))
scala> ints match {
     |   case Cons(_, Cons(_, Cons(_, Nil))) => println("A three element list")
     |   case l => println(s"$l is not a three element list")
     | }
Cons(1,Cons(2,Cons(3,Cons(4,Nil)))) is not a three element list

Example 1.59

In the preceding example, l is bound to the entire list, which is then printed to the console.

Constructor patterns

A constructor pattern looks like Cons(_, Cons(_, Cons(_, Nil))). It consists of the name of a case class (Cons) followed by a number of patterns in parentheses. These inner patterns can themselves be constructor patterns, which lets us check arbitrarily deep into an object; in this case, the match inspects the list four levels deep.
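
Constructor patterns can also bind the pieces they destructure. The following sketch (a secondElement method of our own, written against the List from Example 1.56) matches two levels deep and binds the second element when there is one:

def secondElement[A](l: List[A]): Option[A] = l match {
  case Cons(_, Cons(second, _)) => Some(second) // bind the head of the tail
  case _                        => None         // fewer than two elements
}
// secondElement(List(1, 2, 3)) evaluates to Some(2)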

Sequence patterns

Scala allows us to match against sequence types such as Seq, List, and Array, among others. A sequence pattern looks similar to a constructor pattern. Refer to the following:

scala> def thirdElement[A](s: Seq[A]): Option[A] = s match {
     |   case Seq(_, _, a, _*) => Some(a)
     |   case _                => None
     | }
thirdElement: [A](s: Seq[A])Option[A]
scala> val intSeq = Seq(1, 2, 3, 4)
intSeq: Seq[Int] = List(1, 2, 3, 4)
scala> thirdElement(intSeq)
res16: Option[Int] = Some(3)
scala> thirdElement(Seq.empty[String])
res17: Option[String] = None

Example 1.60

As the example illustrates, thirdElement returns a value of type Option[A]. If a sequence has three or more elements, it returns the third element wrapped in Some; for any sequence with fewer than three elements, it returns None. The pattern Seq(_, _, a, _*) binds a to the third element when one is present, and _* matches any number of remaining elements.

Tuple patterns

We can pattern match against tuples too:

scala> val tuple3 = (1, 2, 3)
tuple3: (Int, Int, Int) = (1,2,3)
scala> def printTuple(a: Any): Unit = a match {
     |   case (a, b, c) => println(s"Tuple has $a, $b, $c")
     |   case _         =>
     | }
printTuple: (a: Any)Unit
scala> printTuple(tuple3)
Tuple has 1, 2, 3

Example 1.61

Running the preceding program will print Tuple has 1, 2, 3 to the console.

Typed patterns

A typed pattern lets us check the type of the selector as part of the match; it performs both a type test and a type cast:

scala> def getLength(a: Any): Int =
     |   a match {
     |     case s: String    => s.length
     |     case l: List[_]   => l.length // the List from the Scala collection library, not our custom List
     |     case m: Map[_, _] => m.size
     |     case _            => -1
     |   }
getLength: (a: Any)Int
scala> getLength("hello, world")
res3: Int = 12
scala> getLength(List(1, 2, 3, 4))
res4: Int = 4
scala> getLength(Map.empty[Int, String])
res5: Int = 0

Example 1.62

Note that the argument a has type Any, which does not itself support methods such as length or size, yet those methods are called in the result expressions. Scala automatically applies a type test and a type cast once a typed pattern matches. For example, matching a against case s: String => s.length is equivalent to the following snippet:

if (a.isInstanceOf[String]) {
  val s = a.asInstanceOf[String]
  s.length
}

Example 1.63

One important thing to note, though, is that Scala does not retain type arguments at runtime; they are erased. So, there is no way to check at runtime whether a list contains only integer elements. For example, the following will print A list of String to the console, and the compiler emits a warning about this runtime behavior. Arrays are the only exception, because the element type is stored with the array value:

scala> List.fill(5)(0) match {
     |   case _: List[String] => println("A list of String")
     |   case _               =>
     | }
<console>:13: warning: fruitless type test: a value of type List[Int] cannot also be a List[String] (the underlying of List[String]) (but still might match its erasure)
         case _: List[String] => println("A list of String")
                 ^
A list of String

Example 1.64
