You're reading from  Scala for Data Science

Product type: Book
Published in: Jan 2016
Reading level: Intermediate
ISBN-13: 9781785281372
Edition: 1st Edition
Author: Pascal Bugnion

Pascal Bugnion is a data engineer at the ASI, a consultancy offering bespoke data science services. Previously, he was the head of data engineering at SCL Elections. He holds a PhD in computational physics from Cambridge University. Besides Scala, Pascal is a keen Python developer. He has contributed to NumPy, matplotlib and IPython. He also maintains scikit-monaco, an open source library for Monte Carlo integration. He currently lives in London, UK.

Appendix A. Pattern Matching and Extractors

Pattern matching is a powerful tool for control flow in Scala. It is often underused and underestimated by people coming to Scala from imperative languages.

Let's start with a few examples of pattern matching before diving into the theory. We start by defining a tuple:

scala> val names = ("Pascal", "Bugnion")
names: (String, String) = (Pascal,Bugnion)

We can use pattern matching to extract the elements of this tuple and bind them to variables:

scala> val (firstName, lastName) = names
firstName: String = Pascal
lastName: String = Bugnion

We just extracted the two elements of the names tuple, binding them to the variables firstName and lastName. Notice how the left-hand side defines a pattern that the right-hand side must match: we are declaring that the variable names must be a two-element tuple. To make the pattern more specific, we could also have specified the expected types of the elements in the tuple:

scala> val (firstName:String, lastName:String) = names
firstName: String = Pascal
lastName: String = Bugnion

What happens if the pattern on the left-hand side does not match the right-hand side?

scala> val (firstName, middleName, lastName) = names
<console>:13: error: constructor cannot be instantiated to expected type;
found   : (T1, T2, T3)
required: (String, String)
   val (firstName, middleName, lastName) = names

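This results in a compile error, because the mismatch is visible in the static types. Other pattern matching failures, such as a value that does not equal a literal in the pattern, are only detected at runtime and raise a scala.MatchError.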

Pattern matching is very expressive. To achieve the same behavior without pattern matching, you would have to do the following explicitly:

  • Verify that the variable names is a two-element tuple

  • Extract the first element and bind it to firstName

  • Extract the second element and bind it to lastName
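
The three steps above can be sketched by hand as follows. This is only an illustration: the wrapping object ManualExtraction and the intermediate value pair are names invented here, and names is typed as Any so that the shape check actually has something to verify at runtime.

```scala
object ManualExtraction {
  // Typed as Any (an assumption made for this sketch) so that the
  // shape of the value has to be checked explicitly at runtime.
  val names: Any = ("Pascal", "Bugnion")

  // Step 1: verify that names is a two-element tuple.
  require(names.isInstanceOf[Tuple2[_, _]], "expected a two-element tuple")
  val pair = names.asInstanceOf[Tuple2[Any, Any]]

  // Step 2: extract the first element and bind it to firstName.
  val firstName = pair._1

  // Step 3: extract the second element and bind it to lastName.
  val lastName = pair._2
}
```

The single line val (firstName, lastName) = names expresses all of this, and additionally keeps the element types.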

If we expect certain elements in the tuple to have specific values, we can verify this as part of the pattern match. For instance, we can verify that the first element of the names tuple matches "Pascal":

scala> val ("Pascal", lastName) = names
lastName: String = Bugnion
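
If the first element did not equal "Pascal", the match would fail at runtime rather than at compile time. The following sketch shows this with a match expression wrapped in scala.util.Try, so that the failure can be inspected instead of aborting the program. The object name RuntimeFailure and the value attempt are made up for this illustration.

```scala
import scala.util.Try

object RuntimeFailure {
  val names = ("Martin", "Odersky")

  // The literal "Pascal" does not equal "Martin", so no case applies
  // and the match throws a scala.MatchError, which Try captures.
  val attempt: Try[String] = Try {
    names match { case ("Pascal", lastName) => lastName }
  }
}
```

Here RuntimeFailure.attempt is a Failure wrapping a MatchError. In practice, adding a catch-all case _ to the match is usually preferable to catching the error.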

Besides tuples, we can also match on Scala collections:

scala> val point = Array(1, 2, 3)
point: Array[Int] = Array(1, 2, 3)

scala> val Array(x, y, z) = point
x: Int = 1
y: Int = 2
z: Int = 3

Notice the similarity between this pattern matching and array construction:

scala> val point = Array(x, y, z)
point: Array[Int] = Array(1, 2, 3)

Syntactically, Scala expresses pattern matching as the reverse process to instance construction. We can think of pattern matching as the deconstruction of an object, binding the object's constituent parts to variables.

When matching against collections, one is sometimes only interested in matching the first element, or the first few elements, and discarding the rest of the collection, whatever its length. The operator _* will match against any number of elements:

scala> val Array(x, _*) = point
x: Int = 1

By default, the part of the pattern matched by the _* operator is not bound to a variable. We can capture it as follows:

scala> val Array(x, xs @ _*) = point
x: Int = 1
xs: Seq[Int] = Vector(2, 3)
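
The same sequence patterns can be used to branch on the shape of a collection. The sketch below is an illustration only: the object SeqPatterns and the function describe are names invented here.

```scala
object SeqPatterns {
  // Branch on the length of a sequence using sequence patterns.
  def describe(xs: Seq[Int]): String = xs match {
    case Seq() => "empty"
    case Seq(x) => s"one element: $x"
    // x binds the first element, rest binds all remaining elements.
    case Seq(x, rest @ _*) => s"starts with $x, then ${rest.length} more"
  }
}
```

For instance, SeqPatterns.describe(Seq(1, 2, 3)) returns "starts with 1, then 2 more".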

Besides tuples and collections, we can also match against case classes. Let's start by defining a case class representing a name:

scala> case class Name(first: String, last: String)
defined class Name

scala> val name = Name("Martin", "Odersky")
name: Name = Name(Martin,Odersky)

We can match against instances of Name in much the same way we matched against tuples:

scala> val Name(firstName, lastName) = name
firstName: String = Martin
lastName: String = Odersky

All these patterns can also be used in match expressions:

scala> def greet(name:Name) = name match {
  case Name("Martin", "Odersky") => "An honor to meet you"
  case Name(first, "Bugnion") => "Wow! A family member!"
  case Name(first, last) => s"Hello, $first"
}
greet: (name: Name)String
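
To see which case fires for which input, we can apply greet to a few names. The sketch below repeats the definitions of Name and greet so that it is self-contained. The wrapping object GreetExample and the value greetings are names introduced for this illustration.

```scala
object GreetExample {
  case class Name(first: String, last: String)

  // Cases are tried from top to bottom; the first match wins.
  def greet(name: Name): String = name match {
    case Name("Martin", "Odersky") => "An honor to meet you"
    case Name(first, "Bugnion") => "Wow! A family member!"
    case Name(first, last) => s"Hello, $first"
  }

  val greetings: List[String] = List(
    Name("Martin", "Odersky"),
    Name("Pascal", "Bugnion"),
    Name("Derek", "Wyatt")
  ).map(greet)
}
```

The most specific pattern comes first. If the general case Name(first, last) were listed first, it would match every name and the other cases would never be reached.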

Pattern matching in for comprehensions


Pattern matching is useful in for comprehensions for extracting items from a collection that match a specific pattern. Let's build a collection of Name instances:

scala> val names = List(Name("Martin", "Odersky"), 
  Name("Derek", "Wyatt"))
names: List[Name] = List(Name(Martin,Odersky), Name(Derek,Wyatt))

We can use pattern matching to extract the internals of the class in a for comprehension:

scala> for { Name(first, last) <- names } yield first
List[String] = List(Martin, Derek)

So far, nothing terribly ground-breaking. But what if we wanted to extract the surname of everyone whose first name is "Martin"?

scala> for { Name("Martin", last) <- names } yield last
List[String] = List(Odersky)

Writing Name("Martin", last) <- names extracts the elements of names that match the pattern. You might think that this is a contrived example, and it is, but the examples in Chapter 7, Web APIs demonstrate the usefulness and versatility of this language pattern, for instance, for extracting specific fields from JSON objects.
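
One way to think about this filtering behavior is through the collect method, which takes a partial function and keeps only the elements for which it is defined. The sketch below gives the same result as the for comprehension above. It is an analogy, not a description of how the compiler translates the comprehension, and the object name FilterByPattern is invented here.

```scala
object FilterByPattern {
  case class Name(first: String, last: String)

  val names = List(Name("Martin", "Odersky"), Name("Derek", "Wyatt"))

  // Keep only the elements matching the pattern, and map them to
  // the bound variable last. Non-matching elements are skipped.
  val martins: List[String] = names.collect {
    case Name("Martin", last) => last
  }
}
```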

Pattern matching internals


If you define a case class, as we saw with Name, you get pattern matching against the constructor for free. You should be using case classes to represent your data as much as possible, thus reducing the need to implement your own pattern matching. It is nevertheless useful to understand how pattern matching works.

When you create a case class, Scala automatically builds a companion object:

scala> case class Name(first: String, last: String)
defined class Name

scala> Name.<tab>
apply   asInstanceOf   curried   isInstanceOf   toString   tupled   unapply

The method used (internally) for pattern matching is unapply. This method takes an object as its argument and returns Option[T], where T is a tuple of the values of the case class:

scala> val name = Name("Martin", "Odersky")
name: Name = Name(Martin,Odersky)

scala> Name.unapply(name)
Option[(String, String)] = Some((Martin,Odersky))

The unapply method is an extractor. It plays the opposite role of the constructor: it takes an object and extracts the list of parameters needed to construct that object. When you write val Name(firstName, lastName) = name, or when you use Name as a case in a match expression, Scala calls Name.unapply on what you are matching against. A return value of Some[(String, String)] means that the pattern matches, while None means that the match fails.
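
To make this concrete, the sketch below writes a plain (non-case) class with a hand-written companion object, approximating what the compiler provides for a case class. It then shows a pattern match next to a roughly equivalent explicit call to unapply. The functions initials and initialsExplicit and the wrapping object HandWrittenExtractor are invented for this illustration, and the exact code that the compiler generates may differ.

```scala
object HandWrittenExtractor {
  // A plain class: no pattern matching support by default.
  class Name(val first: String, val last: String)

  object Name {
    // Constructor counterpart: builds a Name from its parts.
    def apply(first: String, last: String): Name = new Name(first, last)
    // Extractor: recovers the parts from a Name.
    def unapply(name: Name): Option[(String, String)] =
      Some((name.first, name.last))
  }

  // Pattern match that relies on Name.unapply.
  def initials(name: Name): String = name match {
    case Name(first, last) => s"${first.head}.${last.head}."
  }

  // Roughly what the match above does, written out by hand.
  def initialsExplicit(name: Name): String = Name.unapply(name) match {
    case Some((first, last)) => s"${first.head}.${last.head}."
    case None => throw new MatchError(name)
  }
}
```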

To write custom extractors, you just need an object with an unapply method. While unapply normally resides in the companion object of a class that you are deconstructing, this need not be the case. In fact, it does not need to correspond to an existing class at all. For instance, let's define a NonZeroDouble extractor that matches any non-zero double:

scala> object NonZeroDouble { 
  def unapply(d:Double):Option[Double] = {
    if (d == 0.0) { None } else { Some(d) }  
  }
}
defined object NonZeroDouble

scala> val NonZeroDouble(denominator) = 5.5
denominator: Double = 5.5

scala> val NonZeroDouble(denominator) = 0.0
scala.MatchError: 0.0 (of class java.lang.Double)
  ... 43 elided

We defined an extractor for NonZeroDouble, despite the absence of a corresponding NonZeroDouble class.

This NonZeroDouble extractor would be useful in a match expression. For instance, let's define a safeDivision function that returns a default value when the denominator is zero:

scala> def safeDivision(numerator:Double, 
  denominator:Double, fallBack:Double) =
    denominator match {
      case NonZeroDouble(d) => numerator / d
      case _ => fallBack
    }
safeDivision: (numerator: Double, denominator: Double, fallBack: Double)Double

scala> safeDivision(5.0, 2.0, 100.0)
Double = 2.5

scala> safeDivision(5.0, 0.0, 100.0)
Double = 100.0

This is a trivial example because the NonZeroDouble.unapply method is so simple, but the same approach pays off when the test is more complex. Custom extractors let you build powerful control flow constructs on top of match expressions. More importantly, they let the client using the extractors think about control flow declaratively: the client declares that they need a NonZeroDouble, rather than instructing the compiler to check whether the value is zero.
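
Extractors can also be nested, which is where the declarative style starts to pay off. The sketch below combines NonZeroDouble with a second, hypothetical extractor called ParsedDouble, which matches strings that can be parsed as doubles. ParsedDouble, the function reciprocal, and the wrapping object ExtractorComposition are all names invented for this illustration.

```scala
import scala.util.Try

object ExtractorComposition {
  object NonZeroDouble {
    def unapply(d: Double): Option[Double] =
      if (d == 0.0) None else Some(d)
  }

  // Hypothetical extractor: matches strings that parse as a Double.
  object ParsedDouble {
    def unapply(s: String): Option[Double] = Try(s.trim.toDouble).toOption
  }

  // The nested pattern reads as a declaration: we need a string that
  // parses as a double, and that double must be non-zero.
  def reciprocal(input: String): Option[Double] = input match {
    case ParsedDouble(NonZeroDouble(d)) => Some(1.0 / d)
    case _ => None
  }
}
```

For instance, ExtractorComposition.reciprocal("4") returns Some(0.25), while both "0" and "abc" give None.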

Extracting sequences


The previous section explains extraction from case classes, and how to write custom extractors, but it does not explain how extraction works on sequences:

scala> val Array(a, b) = Array(1, 2)
a: Int = 1
b: Int = 2

Rather than relying on an unapply method, sequences rely on an unapplySeq method defined in the companion object. This is expected to return an Option[Seq[A]]:

scala> Array.unapplySeq(Array(1, 2))
Option[IndexedSeq[Int]] = Some(Vector(1, 2))

Let's write an example. We will write an extractor for Breeze vectors (which do not currently support pattern matching). To avoid clashing with the DenseVector companion object, we will write our unapplySeq in a separate object, called DV. All our unapplySeq method needs to do is convert its argument to a Scala Vector instance. To avoid muddying the concepts with generics, we will write this implementation for DenseVector[Double] only:

scala> import breeze.linalg._
import breeze.linalg._

scala> object DV {
  // Just need to convert to a Scala vector.
  def unapplySeq(v:DenseVector[Double]) = Some(v.toScalaVector)
}
defined object DV

Let's try our new extractor implementation:

scala> val vec = DenseVector(1.0, 2.0, 3.0)
vec: breeze.linalg.DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)

scala> val DV(x, y, z) = vec
x: Double = 1.0
y: Double = 2.0
z: Double = 3.0
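
The same mechanism can be tried without Breeze on the classpath. The sketch below defines a hypothetical Words extractor whose unapplySeq splits a string on whitespace, so that strings can be matched with sequence patterns. Words, the function describe, and the wrapping object WordsExtractor are names invented for this illustration.

```scala
object WordsExtractor {
  // Hypothetical extractor: exposes a string as a sequence of words.
  object Words {
    def unapplySeq(s: String): Option[Seq[String]] =
      Some(s.trim.split("\\s+").toSeq)
  }

  def describe(s: String): String = s match {
    // Matches only when there are exactly two words.
    case Words(first, last) => s"first=$first, last=$last"
    // Matches one or more words; rest binds everything after the first.
    case Words(first, rest @ _*) =>
      s"first=$first, plus ${rest.length} more word(s)"
  }
}
```

As with Array, the number of variables in the pattern determines how many elements the sequence must contain, unless _* is used to absorb the remainder.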

Summary


Pattern matching is a powerful tool for control flow. It encourages the programmer to think declaratively: declare that you expect a variable to match a certain pattern, rather than explicitly tell the computer how to check that it matches this pattern. This can save many lines of code and enhance clarity.

Reference


For an overview of pattern matching in Scala, there is no better reference than Programming in Scala, by Martin Odersky, Bill Venners, and Lex Spoon. An online version of the first edition is available at: https://www.artima.com/pins1ed/case-classes-and-pattern-matching.html.

Daniel Westheide's blog covers slightly more advanced Scala constructs, and is a very useful read: http://danielwestheide.com/blog/2012/11/21/the-neophytes-guide-to-scala-part-1-extractors.html.
