Exploring Scala Performance

In this article by Michael Diamant and Vincent Theron, author of the book Scala High Performance Programming, we look at how Scala features get compiled with bytecode.

(For more resources related to this topic, see here.)

Value classes

The domain model of the order book application included two classes, Price and OrderId. We pointed out that we created domain classes for Price and OrderId to provide contextual meanings to the wrapped BigDecimal and Long. While providing us with readable code and compilation time safety, this practice also increases the amount of instances that are created by our application. Allocating memory and generating class instances create more work for the garbage collector by increasing the frequency of collections and by potentially introducing additional long-lived objects. The garbage collector will have to work harder to collect them, and this process may severely impact our latency.

Luckily, as of Scala 2.10, the AnyVal abstract class is available for developers to define their own value classes to solve this problem. The AnyVal class is defined in the Scala doc (http://www.scala-lang.org/api/current/#scala.AnyVal) as, "the root class of all value types, which describe values not implemented as objects in the underlying host system." The AnyVal class can be used to define a value class, which receives special treatment from the compiler. Value classes are optimized at compile time to avoid the allocation of an instance, and instead, they use the wrapped type.

Bytecode representation

As an example, to improve the performance of our order book, we can define Price and OrderId as value classes:

case class Price(value: BigDecimal) extends AnyVal 
case class OrderId(value: Long) extends AnyVal 

To illustrate the special treatment of value classes, we define a dummy function taking a Price value class and an OrderId value class as arguments:

def printInfo(p: Price, oId: OrderId): Unit = 
 println(s"Price: ${p.value}, ID: ${oId.value}") 

From this definition, the compiler produces the following method signature:

public void printInfo(scala.math.BigDecimal, long);

We see that the generated signature takes a BigDecimal object and a long object, even though the Scala code allows us to take advantage of the types defined in our model. This means that we cannot use an instance of BigDecimal or Long when calling printInfo because the compiler will throw an error.

An interesting thing to notice is that the second parameter of printInfo is not compiled as Long (an object), but long (a primitive type, note the lower case 'l'). Long and other objects matching to primitive types, such as Int,Float or Short, are specially handled by the compiler to be represented by their primitive type at runtime.

Value classes can also define methods. Let's enrich our Price class, as follows:

case class Price(value: BigDecimal) extends AnyVal { 
 def lowerThan(p: Price): Boolean = this.value < p.value 
} 
 
// Example usage 
val p1 = Price(BigDecimal(1.23)) 
val p2 = Price(BigDecimal(2.03)) 
p1.lowerThan(p2) // returns true

Our new method allows us to compare two instances of Price. At compile time, a companion object is created for Price. This companion object defines a lowerThan method that takes two BigDecimal objects as parameters. In reality, when we call lowerThan on an instance of Price, the code is transformed by the compiler from an instance method call to a static method call that is defined in the companion object:

public final boolean lowerThan$extension(scala.math.BigDecimal, scala.math.BigDecimal); 
  Code: 
    0: aload_1 
    1: aload_2 
    2: invokevirtual #56 // Method scala/math/BigDecimal.$less:(Lscala/math/BigDecimal;)Z 
    5: ireturn

If we were to write the pseudo-code equivalent to the preceding Scala code, it would look something like the following:

val p1 = BigDecimal(1.23) 
val p2 = BigDecimal(2.03) 
Price.lowerThan(p1, p2) // returns true

 

Performance considerations

Value classes are a great addition to our developer toolbox. They help us reduce the count of instances and spare some work for the garbage collector, while allowing us to rely on meaningful types that reflect our business abstractions. However, extending AnyVal comes with a certain set of conditions that the class must fulfill. For example, a value class may only have one primary constructor that takes one public val as a single parameter. Furthermore, this parameter cannot be a value class. We saw that value classes can define methods via def. Neither val nor var are allowed inside a value class. A nested class or object definitions are also impossible. Another limitation prevents value classes from extending anything other than a universal trait, that is, a trait that extends Any, only has defs as members, and performs no initialization. If any of these conditions is not fulfilled, the compiler generates an error. In addition to the preceding constraints that are listed, there are special cases in which a value class has to be instantiated by the JVM. Such cases include performing a pattern matching or runtime type test, or assigning a value class to an array. An example of the latter looks like the following snippet:

def newPriceArray(count: Int): Array[Price] = { 
 val a = new Array[Price](count) 
 for(i <- 0 until count){ 
  a(i) = Price(BigDecimal(Random.nextInt())) 
 } 
 a 
}

The generated bytecode is as follows:

public highperfscala.anyval.ValueClasses$$anonfun$newPriceArray$1(highperfscala.anyval.ValueClasses$Price[]); 
  Code: 
    0: aload_0 
    1: aload_1 
    2: putfield   #29 // Field a$1:[Lhighperfscala/anyval/ValueClasses$Price; 
    5: aload_0 
    6: invokespecial #80 // Method scala/runtime/AbstractFunction1$mcVI$sp."<init>":()V 
    9: return 
 
public void apply$mcVI$sp(int); 
  Code: 
    0: aload_0 
    1: getfield   #29 // Field a$1:[Lhighperfscala/anyval/ValueClasses$Price; 
    4: iload_1 
    5: new      #31 // class highperfscala/anyval/ValueClasses$Price 
    // omitted for brevity 
   21: invokevirtual #55 // Method scala/math/BigDecimal$.apply:(I)Lscala/math/BigDecimal; 
   24: invokespecial #59 // Method highperfscala/anyval/ValueClasses$Price."<init>":(Lscala/math/BigDecimal;)V 
   27: aastore 
   28: return

Notice how mcVI$sp is invoked from newPriceArray, and this creates a new instance of ValueClasses$Price at the 5 instruction.

As turning a single field case class into a value class is as trivial as extending the AnyVal trait, we recommend that you always use AnyVal wherever possible. The overhead is quite low, and it generate high benefits in terms of garbage collection's performance. To learn more about value classes, their limitations and use cases, you can find detailed descriptions at http://docs.scala-lang.org/overviews/core/value-classes.html.

Tagged types – an alternative to value classes

Value classes are an easy to use tool, and they can yield great improvements in terms of performance. However, they come with a constraining set of conditions, which can make them impossible to use in certain cases. We will conclude this section with a glance at an interesting alternative to leveraging the tagged type feature that is implemented by the Scalaz library.

The Scalaz implementation of tagged types is inspired by another Scala library, named shapeless. The shapeless library provides tools to write type-safe, generic code with minimal boilerplate. While we will not explore shapeless, we encourage you to learn more about the project at https://github.com/milessabin/shapeless.

Tagged types are another way to enforce compile-type checking without incurring the cost of instance instantiation. They rely on the Tagged structural type and the @@ type alias that is defined in the Scalaz library, as follows:

type Tagged[U] = { type Tag = U } 
type @@[T, U] = T with Tagged[U] 
Let's rewrite part of our code to leverage tagged types with our Price object:
object TaggedTypes { 
 
 sealed trait PriceTag 
 type Price = BigDecimal @@ PriceTag 
 
 object Price { 
  def newPrice(p: BigDecimal): Price = 
   Tag[BigDecimal, PriceTag](p) 
 
  def lowerThan(a: Price, b: Price): Boolean = 
   Tag.unwrap(a) < Tag.unwrap(b) 
 } 
}

Let's perform a short walkthrough of the code snippet. We will define a PriceTag sealed trait that we will use to tag our instances, a Price type alias is created and defined as a BigDecimal object tagged with PriceTag. The Price object defines useful functions, including the newPrice factory function that is used to tag a given BigDecimal object and return a Price object (that is, a tagged BigDecimal object). We will also implement an equivalent to the lowerThan method. This function takes two Price objects (that is two tagged BigDecimal objects), extracts the content of the tag that are two BigDecimal objects, and compares them.

Using our new Price type, we rewrite the same newPriceArray function that we previously looked at (the code is omitted for brevity, but you can refer to it in the attached source code), and print the following generated bytecode:

public void apply$mcVI$sp(int); 
  Code: 
    0: aload_0 
    1: getfield   #29 // Field a$1:[Ljava/lang/Object; 
    4: iload_1 
    5: getstatic   #35 // Field highperfscala/anyval/TaggedTypes$Price$.MODULE$:Lhighperfscala/anyval/TaggedTypes$Price$; 
    8: getstatic   #40 // Field scala/package$.MODULE$:Lscala/package$; 
   11: invokevirtual #44 // Method scala/package$.BigDecimal:()Lscala/math/BigDecimal$; 
   14: getstatic   #49 // Field scala/util/Random$.MODULE$:Lscala/util/Random$; 
   17: invokevirtual #53 // Method scala/util/Random$.nextInt:()I 
   20: invokevirtual #58 // Method scala/math/BigDecimal$.apply:(I)Lscala/math/BigDecimal; 
   23: invokevirtual #62 // Method highperfscala/anyval/TaggedTypes$Price$.newPrice:(Lscala/math/BigDecimal;)Ljava/lang/Object; 
   26: aastore 
   27: return

In this version, we no longer see an instantiation of Price, even though we are assigning them to an array. The tagged Price implementation involves a runtime cast, but we anticipate that the cost of this cast will be less than the instance allocations (and garbage collection) that was observed in the previous value class Price strategy.

Specialization

To understand the significance of specialization, it is important to first grasp the concept of object boxing. The JVM defines primitive types (boolean, byte, char, float, int, long, short, and double) that are stack allocated rather than heap allocated. When a generic type is introduced, for example, scala.collection.immutable.List, the JVM references an object equivalent, instead of a primitive type. In this example, an instantiated list of integers would be heap allocated objects rather than integer primitives. The process of converting a primitive to its object equivalent is called boxing, and the reverse process is called unboxing. Boxing is a relevant concern for performance-sensitive programming because boxing involves heap allocation. In performance-sensitive code that performs numerical computations, the cost of boxing and unboxing can create an order of magnitude or larger performance slowdowns. Consider the following example to illustrate boxing overhead:

List.fill(10000)(2).map(_* 2)

Creating the list via fill yields 10,000 heap allocations of the integer object. Performing the multiplication in map requires 10,000 unboxings to perform multiplication and then 10,000 boxings to add the multiplication result into the new list. From this simple example, you can imagine how critical section arithmetic will be slowed down due to boxing or unboxing operations.

As shown in Oracle's tutorial on boxing at https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html, boxing in Java and also in Scala happens transparently. This means that without careful profiling or bytecode analysis, it is difficult to discern where you are paying the cost for object boxing. To ameliorate this problem, Scala provides a feature named specialization. Specialization refers to the compile-time process of generating duplicate versions of a generic trait or class that refer directly to a primitive type instead of the associated object wrapper. At runtime, the compiler-generated version of the generic class, or as it is commonly referred to, the specialized version of the class, is instantiated. This process eliminates the runtime cost of boxing primitives, which means that you can define generic abstractions while retaining the performance of a handwritten, specialized implementation.

Bytecode representation

Let's look at a concrete example to better understand how the specialization process works. Consider a naive, generic representation of the number of shares purchased, as follows:

case class ShareCount[T](value: T)

For this example, let's assume that the intended usage is to swap between an integer or long representation of ShareCount. With this definition, instantiating a long-based ShareCount instance incurs the cost of boxing, as follows:

def newShareCount(l: Long): ShareCount[Long] = ShareCount(l)

This definition translates to the following bytecode:

public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long); 
    Code: 
       0: new           #21  // class orderbook/Specialization$ShareCount 
       3: dup 
       4: lload_1 
       5: invokestatic  #27  // Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long; 
       8: invokespecial #30  // Method orderbook/Specialization$ShareCount."<init>":(Ljava/lang/Object;)V 
      11: areturn

In the preceding bytecode, it is clear in the 5 instruction that the primitive long value is boxed before instantiating the ShareCount instance. By introducing the @specialized annotation, we are able to eliminate the boxing by having the compiler provide an implementation of ShareCount that works with primitive long values. It is possible to specify which types you wish to specialize by supplying a set of types. As defined in the Specializables trait (http://www.scala-lang.org/api/current/index.html#scala.Specializable), you are able to specialize for all JVM primitives, such as Unit and AnyRef. For our example, let's specialize ShareCount for integers and longs, as follows:

case class ShareCount[@specialized(Long, Int) T](value: T)

With this definition, the bytecode now becomes the following:

  public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long); 
    Code: 
       0: new           #21  // class highperfscala.specialization/Specialization$ShareCount$mcJ$sp 
       3: dup 
       4: lload_1 
       5: invokespecial #24  // Method highperfscala.specialization/Specialization$ShareCount$mcJ$sp."<init>":(J)V 
       8: areturn

The boxing disappears and is curiously replaced with a different class name, ShareCount $mcJ$sp. This is because we are invoking the compiler-generated version of ShareCount that is specialized for long values. By inspecting the output of javap, we see that the specialized class generated by the compiler is a subclass of ShareCount:

public class highperfscala.specialization.Specialization$ShareCount$mcI$sp extends highperfscala.specialization.Specialization$ShareCount<java .lang.Object>

Bear this specialization implementation detail in mind as we turn to the Performance considerations section. The use of inheritance forces tradeoffs to be made in more complex use cases.

Performance considerations

At first glance, specialization appears to be a simple panacea for JVM boxing. However, there are several caveats to consider when using specialization. A liberal use of specialization leads to significant increases in compile time and resulting code size. Consider specializing Function3, which accepts three arguments as input and produces one result. To specialize four arguments across all types (that is, Byte, Short, Int, Long, Char, Float, Double, Boolean, Unit, and AnyRef) yields 10^4 or 10,000 possible permutations. For this reason, the standard library conserves application of specialization. In your own use cases, consider carefully which types you wish to specialize. If we specialize Function3 only for Int and Long, the number of generated classes shrinks to 2^4 or 16. Specialization involving inheritance requires extra attention because it is trivial to lose specialization when extending a generic class. Consider the following example:

class ParentFoo[@specialized T](t: T) 
  class ChildFoo[T](t: T) extends ParentFoo[T](t) 
 
  def newChildFoo(i: Int): ChildFoo[Int] = new ChildFoo[Int](i)

In this scenario, you likely expect that ChildFoo is defined with a primitive integer. However, as ChildFoo does not mark its type with the @specialized annotation, zero specialized classes are created. Here is the bytecode to prove it:

public highperfscala.specialization.Inheritance$ChildFoo<java.lang.Object> newChildFoo(int); 
    Code: 
       0: new           #16  // class highperfscala/specialization/Inheritance$ChildFoo 
       3: dup 
       4: iload_1 
       5: invokestatic  #22  // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 
       8: invokespecial #25  // Method highperfscala/specialization/Inheritance$ChildFoo."<init>":(Ljava/lang/Object;)V 
      11: areturn

The next logical step is to add the @specialized annotation to the definition of ChildFoo. In doing so, we stumble across a scenario where the compiler warns about use of specialization, as follows:

class ParentFoo must be a trait. Specialized version of class ChildFoo will inherit generic highperfscala.specialization.Inheritance.ParentFoo[Boolean] 
class ChildFoo[@specialized T](t: T) extends ParentFoo[T](t)

The compiler indicates that you have created a diamond inheritance problem, where the specialized versions of ChildFoo extend both ChildFoo and the associated specialized version of ParentFoo. This issue can be resolved by modeling the problem with a trait, as follows:

trait ParentBar[@specialized T] { 
    def t(): T 
  } 
 
  class ChildBar[@specialized T](val t: T) extends ParentBar[T] 
 
  def newChildBar(i: Int): ChildBar[Int] = new ChildBar(i)

This definition compiles using a specialized version of ChildBar, as we originally were hoping for, as see in the following code:

public highperfscala.specialization.Inheritance$ChildBar<java.lang.Object> newChildBar(int); 
    Code: 
       0: new           #32  // class highperfscala/specialization/Inheritance$ChildBar$mcI$sp 
       3: dup 
       4: iload_1 
       5: invokespecial #35  // Method highperfscala/specialization/Inheritance$ChildBar$mcI$sp."<init>":(I)V 
       8: areturn

An analogous and equally error-prone scenario is when a generic function is defined around a specialized type. Consider the following definition:

class Foo[T](t: T) 
 
  object Foo { 
    def create[T](t: T): Foo[T] = new Foo(t) 
  } 
 
  def boxed: Foo[Int] = Foo.create(1)

Here, the definition of create is analogous to the child class from the inheritance example. Instances of Foo wrapping a primitive that are instantiated from the create method will be boxed. The following bytecode demonstrates how boxed leads to heap allocations:

public highperfscala.specialization.MethodReturnTypes$Foo<java.lang.Object> boxed(); 
    Code: 
       0: getstatic     #19  // Field highperfscala/specialization/MethodReturnTypes$Foo$.MODULE$:Lhighperfscala/specialization/MethodReturnTypes$Foo$; 
       3: iconst_1 
       4: invokestatic  #25  // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 
       7: invokevirtual #29  // Method highperfscala/specialization/MethodReturnTypes$Foo$.create:(Ljava/lang/Object;)Lhighperfscala/specialization/MethodReturnTypes$Foo; 
      10: areturn 
The solution is to apply the @specialized annotation at the call site, as follows:
def createSpecialized[@specialized T](t: T): Foo[T] = new Foo(t)

The solution is to apply the @specialized annotation at the call site, as follows:

def createSpecialized[@specialized T](t: T): Foo[T] = new Foo(t)

One final interesting scenario is when specialization is used with multiple types and one of the types extends AnyRef or is a value class. To illustrate this scenario, consider the following example:

case class ShareCount(value: Int) extends AnyVal 
  case class ExecutionCount(value: Int) 
 
  class Container2[@specialized X, @specialized Y](x: X, y: Y) 
 
  def shareCount = new Container2(ShareCount(1), 1) 
 
  def executionCount = new Container2(ExecutionCount(1), 1) 
 
  def ints = new Container2(1, 1)

In this example, which methods do you expect to box the second argument to Container2? For brevity, we omit the bytecode, but you can easily inspect it yourself. As it turns out, shareCount and executionCount box the integer. The compiler does not generate a specialized version of Container2 that accepts a primitive integer and a value extending AnyVal (for example, ExecutionCount). The shareCount variable also causes boxing due to the order in which the compiler removes the value class type information from the source code. In both scenarios, the workaround is to define a case class that is specific to a set of types (for example, ShareCount and Int). Removing the generics allows the compiler to select the primitive types.

The conclusion to draw from these examples is that specialization requires extra focus to be used throughout an application without boxing. As the compiler is unable to infer scenarios where you accidentally forgot to apply the @specialized annotation, it fails to raise a warning. This places the onus on you to be vigilant about profiling and inspecting bytecode to detect scenarios where specialization is incidentally dropped.

To combat some of the shortcomings that specialization brings, there is a compiler plugin under active development, named miniboxing, at http://scala-miniboxing.org/. This compiler plugin applies a different strategy that involves encoding all primitive types into a long value and carrying metadata to recall the original type. For example, boolean can be represented in long using a single bit to signal true or false. With this approach, performance is qualitatively similar to specialization while producing orders of magnitude for fewer classes for large permutations. Additionally, miniboxing is able to more robustly handle inheritance scenarios and can warn when boxing will occur. While the implementations of specialization and miniboxing differ, the end user usage is quite similar. Like specialization, you must add appropriate annotations to activate the miniboxing plugin. To learn more about the plugin, you can view the tutorials on the miniboxing project site.

The extra focus to ensure specialization produces heap-allocation free code is worthwhile because of the performance wins in performance-sensitive code. To drive home the value of specialization, consider the following microbenchmark that computes the cost of a trade by multiplying share count with execution price. For simplicity, primitive types are used directly instead of value classes. Of course, in production code this would never happen:

@BenchmarkMode(Array(Throughput)) 
@OutputTimeUnit(TimeUnit.SECONDS) 
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS) 
@Measurement(iterations = 30, time = 10, timeUnit = TimeUnit.SECONDS) 
@Fork(value = 1, warmups = 1, jvmArgs = Array("-Xms1G", "-Xmx1G")) 
class SpecializationBenchmark { 
 
  @Benchmark 
  def specialized(): Double = 
    specializedExecution.shareCount.toDouble * specializedExecution.price 
 
  @Benchmark 
  def boxed(): Double = 
    boxedExecution.shareCount.toDouble * boxedExecution.price 
} 
 
object SpecializationBenchmark { 
  class SpecializedExecution[@specialized(Int) T1, @specialized(Double) T2]( 
    val shareCount: Long, val price: Double) 
  class BoxingExecution[T1, T2](val shareCount: T1, val price: T2) 
 
  val specializedExecution: SpecializedExecution[Int, Double] = 
    new SpecializedExecution(10l, 2d) 
  val boxedExecution: BoxingExecution[Long, Double] = new BoxingExecution(10l, 2d) 
}

In this benchmark, two versions of a generic execution class are defined. SpecializedExecution incurs zero boxing when computing the total cost because of specialization, while BoxingExecution requires object boxing and unboxing to perform the arithmetic. The microbenchmark is invoked with the following parameterization:

sbt 'project chapter3' 'jmh:run SpecializationBenchmark -foe true'

We configure this JMH benchmark via annotations that are placed at the class level in the code. Annotations have the advantage of setting proper defaults for your benchmark, and simplifying the command-line invocation. It is still possible to override the values in the annotation with command-line arguments. We use the -foe command-line argument to enable failure on error because there is no annotation to control this behavior. In the rest of this book, we will parameterize JMH with annotations and omit the annotations in the code samples because we always use the same values.

The results are summarized in the following table:

Benchmark

Throughput (ops per second)

Error as percentage of throughput

boxed

251,534,293.11

±2.23

specialized

302,371,879.84

±0.87

This microbenchmark indicates that the specialized implementation yields approximately 17% higher throughput. By eliminating boxing in a critical section of the code, there is an order of magnitude performance improvement available through judicious usage of specialization. For performance-sensitive arithmetic, this benchmark provides justification for the extra effort that is required to ensure that specialization is applied properly.

Summary

This article talk about different Scala constructs and features. It also explained different features and how they get compiled with bytecode.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Scala High Performance Programming

Explore Title
comments powered by Disqus