Specialization

To understand the significance of specialization, it is important to first grasp the concept of object boxing. The JVM defines primitive types (boolean, byte, char, float, int, long, short, and double) that are stack-allocated rather than heap-allocated. When a generic type is introduced, for example, scala.collection.immutable.List, the JVM references an object equivalent, instead of a primitive type. In this example, an instantiated list of integers would be heap-allocated objects rather than integer primitives. The process of converting a primitive to its object equivalent is called boxing, and the reverse process is called unboxing. Boxing is a relevant concern for performance-sensitive programming because boxing involves heap allocation. In performance-sensitive code that performs numerical computations, the cost of boxing and unboxing can cause significant slowdowns. Consider the following example to illustrate boxing overhead:

List.fill(10000)(2).map(_ * 2)

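Creating the list via fill boxes the integer 10,000 times, so the list holds references to heap-allocated Integer objects rather than primitives (the JVM serves small values such as 2 from a cache of preallocated Integer instances, but values outside that cache cost one allocation each). Performing the multiplication in map requires 10,000 unboxings to perform the multiplication and then 10,000 boxings to add the results to the new list. From this simple example, you can imagine how arithmetic in a critical section is slowed down by boxing and unboxing operations.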
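To make the contrast concrete, here is a minimal sketch (not part of the original example) that computes the same result twice: once through the generic List, and once through an Array[Int], which the JVM represents as a primitive int[]. The object and method names are hypothetical and chosen only for illustration.

```scala
object BoxingSketch {
  // Generic collection: each element is held as a reference to a java.lang.Integer.
  def doubledList(n: Int): List[Int] = List.fill(n)(2).map(_ * 2)

  // Array[Int] is a JVM int[]; the loop below reads and writes primitives only.
  def doubledArray(n: Int): Array[Int] = {
    val in = new Array[Int](n)
    java.util.Arrays.fill(in, 2)
    val out = new Array[Int](n)
    var i = 0
    while (i < n) {
      out(i) = in(i) * 2
      i += 1
    }
    out
  }
}
```

Both methods produce the same values; the difference lies only in how the elements are represented in memory, which is exactly the cost that specialization aims to remove from generic code.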

As shown in Oracle's tutorial on boxing at https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html, boxing in Java and also in Scala happens transparently. This means that, without careful profiling or bytecode analysis, it is difficult to discern where you are paying the cost for object boxing. To ameliorate this problem, Scala provides a feature named specialization. Specialization refers to the compile-time process of generating duplicate versions of a generic trait or class that refer directly to a primitive type instead of the associated object wrapper. At runtime, the compiler-generated version of the generic class (or, as it is commonly referred to, the specialized version of the class) is instantiated. This process eliminates the runtime cost of boxing primitives, which means that you can define generic abstractions while retaining the performance of a handwritten, specialized implementation.
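As a rough illustration of what "handwritten, specialized" means here, the following sketch contrasts a generic wrapper with per-type variants written by hand. The names Box, LongBox, and IntBox are hypothetical; specialization can be thought of as asking the compiler to produce analogues of the handwritten variants for you.

```scala
object HandwrittenSketch {
  // Generic: after erasure the field is an Object, so a Long argument is boxed.
  class Box[T](val value: T)

  // Handwritten per-type variants: the field is a primitive, so nothing is boxed.
  class LongBox(val value: Long)
  class IntBox(val value: Int)

  // Arithmetic on the handwritten variant works on primitive longs directly.
  def sum(a: LongBox, b: LongBox): Long = a.value + b.value
}
```

The handwritten approach avoids boxing but duplicates code for every primitive type you care about, which is the maintenance burden specialization is meant to lift.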

Bytecode representation

Let's look at a concrete example to better understand how the specialization process works. Consider a naive, generic representation of the number of shares purchased, as follows:

case class ShareCount[T](value: T) 

For this example, let's assume that the intended usage is to swap between an integer or long representation of ShareCount. With this definition, instantiating a long-based ShareCount instance incurs the cost of boxing, as follows:

def newShareCount(l: Long): ShareCount[Long] = ShareCount(l) 

This definition translates to the following bytecode:

  public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long); 
    Code: 
       0: new           #21  // class highperfscala/specialization/Specialization$ShareCount
       3: dup
       4: lload_1
       5: invokestatic  #27  // Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long;
       8: invokespecial #30  // Method highperfscala/specialization/Specialization$ShareCount."<init>":(Ljava/lang/Object;)V
      11: areturn 

In the preceding bytecode, it is clear at instruction 5 that the primitive long value is boxed before instantiating the ShareCount instance. By introducing the @specialized annotation, we are able to eliminate the boxing by having the compiler provide an implementation of ShareCount that works with primitive long values. It is possible to specify which types you wish to specialize by supplying a set of types. As defined in the Specializable trait (http://www.scala-lang.org/api/current/index.html#scala.Specializable), you are able to specialize for all JVM primitives, as well as Unit and AnyRef. For our example, let's specialize ShareCount for integers and longs, as follows:

case class ShareCount[@specialized(Long, Int) T](value: T) 

With this definition, the bytecode now becomes the following:

  public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long); 
    Code: 
       0: new           #21  // class highperfscala/specialization/Specialization$ShareCount$mcJ$sp
       3: dup
       4: lload_1
       5: invokespecial #24  // Method highperfscala/specialization/Specialization$ShareCount$mcJ$sp."<init>":(J)V
       8: areturn 

The boxing disappears and is curiously replaced with a different class name, ShareCount$mcJ$sp. This is because we are invoking the compiler-generated version of ShareCount that is specialized for long values. By inspecting the output of javap, we see that each specialized class generated by the compiler is a subclass of ShareCount (the Int variant, ShareCount$mcI$sp, is shown here):

 public class highperfscala.specialization.Specialization$ShareCount$mcI$sp extends highperfscala.specialization.Specialization$ShareCount<java.lang.Object>

Bear this specialization implementation detail in mind as we turn to the Performance considerations section. The use of inheritance forces tradeoffs to be made in more complex use cases.
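If you prefer to observe the subclass relationship at runtime rather than through javap, a small sketch along these lines may help. The object name SubclassSketch and the describe helper are hypothetical. Under a Scala 2 compiler, you would expect the Long-based instance to report a class name ending in $mcJ$sp whose superclass is ShareCount; the exact names depend on where the class is defined and on the compiler version, so no output is shown here.

```scala
object SubclassSketch {
  case class ShareCount[@specialized(Long, Int) T](value: T)

  // Reports the runtime class of an instance together with its direct superclass.
  def describe(sc: ShareCount[_]): String = {
    val cls = sc.getClass
    s"${cls.getName} extends ${cls.getSuperclass.getName}"
  }

  def main(args: Array[String]): Unit = {
    println(describe(new ShareCount(1L)))    // a type listed in @specialized
    println(describe(new ShareCount("one"))) // a reference type: generic class
  }
}
```

Whichever class is instantiated, the instance can always be used where a ShareCount is expected, which is what the inheritance-based encoding is designed to guarantee.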

Performance considerations

At first glance, specialization appears to be a simple panacea for JVM boxing. However, there are several caveats to consider when using specialization. Liberal use of specialization leads to significant increases in compile time and in the size of the resulting code. Consider specializing Function3, which accepts three arguments as input and produces one result, giving it four type parameters. Specializing all four type parameters across all types (that is, Byte, Short, Int, Long, Char, Float, Double, Boolean, Unit, and AnyRef) yields 10^4 or 10,000 possible combinations. For this reason, the standard library applies specialization sparingly. In your own use cases, consider carefully which types you wish to specialize. If we specialize Function3 only for Int and Long, the number of generated classes shrinks to 2^4 or 16.

Specialization involving inheritance requires extra attention because it is trivial to lose specialization when extending a generic class. Consider the following example:

  class ParentFoo[@specialized T](t: T) 
  class ChildFoo[T](t: T) extends ParentFoo[T](t) 
 
  def newChildFoo(i: Int): ChildFoo[Int] = new ChildFoo[Int](i) 

In this scenario, you likely expect that ChildFoo is defined with a primitive integer. However, as ChildFoo does not mark its type with the @specialized annotation, zero specialized classes are created. Here is the bytecode to prove it:

  public highperfscala.specialization.Inheritance$ChildFoo<java.lang.Object> newChildFoo(int); 
    Code: 
       0: new           #16  // class highperfscala/specialization/Inheritance$ChildFoo 
       3: dup 
       4: iload_1 
       5: invokestatic  #22  // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 
       8: invokespecial #25  // Method highperfscala/specialization/Inheritance$ChildFoo."<init>":(Ljava/lang/Object;)V 
      11: areturn 

The next logical step is to add the @specialized annotation to the definition of ChildFoo. In doing so, we stumble across a scenario where the compiler warns about the use of specialization, as follows:

class ParentFoo must be a trait. Specialized version of class ChildFoo will inherit generic highperfscala.specialization.Inheritance.ParentFoo[Boolean] 
class ChildFoo[@specialized T](t: T) extends ParentFoo[T](t) 

The compiler indicates that you have created a diamond inheritance problem, where the specialized versions of ChildFoo extend both ChildFoo and the associated specialized version of ParentFoo. This issue can be resolved by modeling the problem with a trait, as follows:

  trait ParentBar[@specialized T] { 
    def t(): T 
  } 
 
  class ChildBar[@specialized T](val t: T) extends ParentBar[T] 
 
  def newChildBar(i: Int): ChildBar[Int] = new ChildBar(i) 

This definition compiles using a specialized version of ChildBar, as we originally were hoping for, as seen in the following bytecode:

  public highperfscala.specialization.Inheritance$ChildBar<java.lang.Object> newChildBar(int); 
    Code: 
       0: new           #32  // class highperfscala/specialization/Inheritance$ChildBar$mcI$sp 
       3: dup 
       4: iload_1 
       5: invokespecial #35  // Method highperfscala/specialization/Inheritance$ChildBar$mcI$sp."<init>":(I)V 
       8: areturn 

An analogous and equally error-prone scenario is when a generic method is defined around a specialized type. Consider the following definition:

  class Foo[@specialized T](t: T)
 
  object Foo { 
    def create[T](t: T): Foo[T] = new Foo(t) 
  } 
 
  def boxed: Foo[Int] = Foo.create(1) 

Here, the definition of create is analogous to the child class from the inheritance example. Instances of Foo wrapping a primitive that are instantiated from the create method will be boxed. The following bytecode demonstrates how boxed leads to heap allocations:

  public highperfscala.specialization.MethodReturnTypes$Foo<java.lang.Object> boxed(); 
    Code: 
       0: getstatic     #19  // Field highperfscala/specialization/MethodReturnTypes$Foo$.MODULE$:Lhighperfscala/specialization/MethodReturnTypes$Foo$; 
       3: iconst_1 
       4: invokestatic  #25  // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 
       7: invokevirtual #29  // Method highperfscala/specialization/MethodReturnTypes$Foo$.create:(Ljava/lang/Object;)Lhighperfscala/specialization/MethodReturnTypes$Foo; 
      10: areturn 

The solution is to apply the @specialized annotation to the type parameter of the method that constructs the instance, as follows:

def createSpecialized[@specialized T](t: T): Foo[T] = new Foo(t) 
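Putting the pieces together, the following sketch places the generic and the specialized factory methods side by side. It assumes that Foo itself is declared with @specialized, since annotating the method alone cannot select a specialized class that does not exist. The enclosing object name and the main method are hypothetical additions; the printed class names depend on the compiler and are therefore not shown.

```scala
object MethodSpecializationSketch {
  class Foo[@specialized T](val t: T)

  object Foo {
    // Generic method: T is erased, so the argument arrives boxed.
    def create[T](t: T): Foo[T] = new Foo(t)

    // Specialized method: the compiler can emit a primitive variant of the method.
    def createSpecialized[@specialized T](t: T): Foo[T] = new Foo(t)
  }

  def main(args: Array[String]): Unit = {
    println(Foo.create(1).getClass.getName)
    println(Foo.createSpecialized(1).getClass.getName)
  }
}
```

Comparing the two printed class names is a quick way to confirm whether the specialized path was taken for a given call.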

One final interesting scenario is when specialization is used with multiple types and one of the types extends AnyRef or is a value class. To illustrate this scenario, consider the following example:

  case class ShareCount(value: Int) extends AnyVal 
  case class ExecutionCount(value: Int) 
 
  class Container2[@specialized X, @specialized Y](x: X, y: Y) 
 
  def shareCount = new Container2(ShareCount(1), 1) 
 
  def executionCount = new Container2(ExecutionCount(1), 1) 
 
  def ints = new Container2(1, 1) 

In this example, which methods do you expect to box the second argument to Container2? For brevity, we omit the bytecode, but you can easily inspect it yourself. As it turns out, shareCount and executionCount box the integer. The compiler does not generate a specialized version of Container2 that accepts a primitive integer and a value extending AnyRef (for example, ExecutionCount). The shareCount method also causes boxing due to the order in which the compiler removes the value class type information from the source code. In both scenarios, the workaround is to define a case class that is specific to a set of types (for example, ShareCount and Int). Removing the generics allows the compiler to select the primitive types.
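One way to read the workaround is sketched below: instead of a generic Container2, each pairing of types gets its own non-generic case class. The names ShareCountWithInt and ExecutionCountWithInt are hypothetical, and this is only one possible shape for such classes.

```scala
object MonomorphicSketch {
  case class ShareCount(value: Int) extends AnyVal
  case class ExecutionCount(value: Int)

  // No type parameters: the Int field is declared as a primitive int.
  final case class ShareCountWithInt(shareCount: ShareCount, n: Int)
  final case class ExecutionCountWithInt(executionCount: ExecutionCount, n: Int)

  def shareCount: ShareCountWithInt =
    ShareCountWithInt(ShareCount(1), 1)

  def executionCount: ExecutionCountWithInt =
    ExecutionCountWithInt(ExecutionCount(1), 1)
}
```

The cost of this approach is one class per combination of types, so it is best reserved for the few pairings that appear on a hot path.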

The conclusion to draw from these examples is that specialization demands sustained attention if it is to be used throughout an application without boxing. The compiler cannot tell when you have accidentally forgotten to apply the @specialized annotation, so it does not raise a warning. This places the onus on you to be vigilant about profiling and inspecting bytecode to detect scenarios where specialization is accidentally dropped.
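Alongside profiling and bytecode inspection, a lightweight runtime check can act as a tripwire in unit tests. The sketch below is a heuristic only: it relies on the $sp suffix seen in the javap output earlier in this section, which is a naming detail of the compiler rather than a documented contract. The names looksSpecialized and Generic are hypothetical.

```scala
object SpecializationGuardSketch {
  // Heuristic: specialized classes shown in this section carry a "$sp" suffix.
  def looksSpecialized(instance: AnyRef): Boolean =
    instance.getClass.getName.endsWith("$sp")

  // A class without @specialized, used here as a negative example.
  class Generic[T](val value: T)
}
```

A test could then assert that looksSpecialized returns true for instances you expect to be specialized, so that a dropped annotation fails the build instead of silently reintroducing boxing.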

Note

To combat some of the shortcomings that specialization brings, there is a compiler plugin under active development, named miniboxing, at http://scala-miniboxing.org/. This compiler plugin applies a different strategy that involves encoding all primitive types into a long value and carrying metadata to recall the original type. For example, boolean can be represented in a long using a single bit to signal true or false. With this approach, performance is qualitatively similar to specialization while producing orders of magnitude fewer classes for large permutations. Additionally, miniboxing is able to more robustly handle inheritance scenarios and can warn when boxing will occur. While the implementations of specialization and miniboxing differ, the end user usage is quite similar. Like specialization, you must add appropriate annotations to activate the miniboxing plugin. To learn more about the plugin, you can view the tutorials on the miniboxing project site.

The extra focus to ensure specialization produces heap allocation free code is worthwhile because of the performance wins in performance-sensitive code. To drive home the value of specialization, consider the following microbenchmark that computes the cost of a trade by multiplying share count with execution price. For simplicity, primitive types are used directly instead of value classes. Of course, in production code this would never happen:

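import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations.Mode.Throughput
import org.openjdk.jmh.annotations._

import SpecializationBenchmark._

@BenchmarkMode(Array(Throughput))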
@OutputTimeUnit(TimeUnit.SECONDS) 
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS) 
@Measurement(iterations = 30, time = 10, timeUnit = TimeUnit.SECONDS) 
@Fork(value = 1, warmups = 1, jvmArgs = Array("-Xms1G", "-Xmx1G")) 
class SpecializationBenchmark { 
 
  @Benchmark 
  def specialized(): Double = 
    specializedExecution.shareCount.toDouble * specializedExecution.price 
 
  @Benchmark 
  def boxed(): Double = 
    boxedExecution.shareCount.toDouble * boxedExecution.price 
} 
 
object SpecializationBenchmark { 
  // The fields use the type parameters so that specialization applies to them.
  class SpecializedExecution[@specialized(Long) T1, @specialized(Double) T2](
    val shareCount: T1, val price: T2)
  class BoxingExecution[T1, T2](val shareCount: T1, val price: T2)

  val specializedExecution: SpecializedExecution[Long, Double] =
    new SpecializedExecution(10L, 2d)
  val boxedExecution: BoxingExecution[Long, Double] = new BoxingExecution(10L, 2d)
} 

In this benchmark, two versions of a generic execution class are defined. SpecializedExecution incurs zero boxing when computing the total cost because of specialization, while BoxingExecution requires object boxing and unboxing to perform the arithmetic. The microbenchmark is invoked with the following parameterization:

sbt 'project chapter3' 'jmh:run SpecializationBenchmark -foe true'

Note

We configure this JMH benchmark via annotations that are placed at the class level in the code. This is different from what we saw in Chapter 2, Measuring Performance on the JVM, where we used command-line arguments. Annotations have the advantage of setting proper defaults for your benchmark, and simplifying the command-line invocation. It is still possible to override the values in the annotation with command-line arguments. We use the -foe command-line argument to enable failure on error because there is no annotation to control this behavior. In the rest of this book, we will parameterize JMH with annotations and omit the annotations in the code samples because we always use the same values.

The results are summarized in the following table:

Benchmark      Throughput (ops per second)   Error as percentage of throughput
boxed          251,534,293.11                ±2.23
specialized    302,371,879.84                ±0.87

This microbenchmark indicates that the specialized implementation yields approximately 20% higher throughput (equivalently, the boxed implementation delivers about 17% lower throughput). By eliminating boxing in a critical section of the code, there is a meaningful performance improvement available through the judicious usage of specialization. For performance-sensitive arithmetic, this benchmark provides justification for the extra effort that is required to ensure that specialization is applied properly.
