For programmers who are new to Scala, pattern matching is often one of the language features that is the simplest to understand, but it also unlocks new ways to think about writing software. This powerful mechanism enables you to match on disparate types with compile-time safety using an elegant syntax. Given how central this technique is to writing Scala in the functional paradigm, it is important to consider its runtime overhead.
Let's consider an example that involves order processing with an algebraic data type representing the possible sides of an order:
sealed trait Side case object Buy extends Side case object Sell extends Side def handleOrder(s: Side): Boolean = s match { case Buy => true case Sell => false }
The terminology algebraic data type (ADT) is a more formal way of referring to a sealed trait and its cases. For example, Side
, Buy
, and Sell
form an ADT. For our purposes, an ADT defines a closed set of cases. For Side
, the enclosed cases are Buy
and Sell
. The sealed modifier provides closed set semantics because it disallows the extension of Side
in separate source files. The closed set semantics implied by an ADT is what allows the compiler to infer whether or not a pattern match statement is exhaustive. If you are interested in studying another example of an ADT, view the order book commands defined in Chapter
2, Measuring Performance on the JVM.
As shown in the following bytecode, pattern matching is desugared into a set of if statements:
public boolean handleOrder(highperfscala.patternmatch.PatternMatching$Side); Code: 0: aload_1 1: astore_2 2: getstatic #148 // Field highperfscala.patternmatch/PatternMatching$Buy$.MODULE$:Lhighperfscala.patternmatch/PatternMatching$Buy$; 5: aload_2 6: invokevirtual #152 // Method java/lang/Object.equals:(Ljava/lang/Object;)Z 9: ifeq 17 12: iconst_1 13: istore_3 14: goto 29 17: getstatic #157 // Field highperfscala.patternmatch/PatternMatching$Sell$.MODULE$:Lhighperfscala.patternmatch/PatternMatching$Sell$; 20: aload_2 21: invokevirtual #152 // Method java/lang/Object.equals:(Ljava/lang/Object;)Z 24: ifeq 31 27: iconst_0 28: istore_3 29: iload_3 30: ireturn 31: new #159 // class scala/MatchError 34: dup 35: aload_2 36: invokespecial #160 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V 39: athrow
Inspecting the bytecode shows how the Scala compiler is able to desugar pattern match expressions into a set of efficient if statements with the ifeq
instructions at the 9
and 24
indexes. This an illustrative example of how Scala is able to provide expressive and elegant first-class language features that retain efficient bytecode equivalents.
Pattern matching against values that contain state (for example, a case class) imposes additional runtime costs that are not immediately clear when looking at the Scala source code. Consider the following extension to the previous example that introduces state:
sealed trait Order case class BuyOrder(price: Double) extends Order case class SellOrder(price: Double) extends Order def handleOrder(o: Order): Boolean = o match { case BuyOrder(price) if price > 2.0 => true case BuyOrder(_) => false case SellOrder(_) => false }
Here, the example is more complicated because the instance type must be identified for all three cases with the added complexity of a predicate on the BuyOrder
price in the first case. In the following, we look at a snippet of the scalac
output with all Scala-specific features removed:
case10(){ if (x1.$isInstanceOf[highperfscala.patternmatch.PatternMatching$BuyOrder]()) { rc8 = true; x2 = (x1.$asInstanceOf[highperfscala.patternmatch.PatternMatching$BuyOrder](): highperfscala.patternmatch.PatternMatching$BuyOrder); { val price: Double = x2.price(); if (price.>(2.0)) matchEnd9(true) else case11() } } else case11() }; case11(){ if (rc8) matchEnd9(false) else case12() };
This desugaring illustrates several interesting points about the Scala compiler. Identifying the type of Order
utilizes isInstanceOf
from java.lang.Object
, which maps to the instanceOf
bytecode instruction. Casting, by way of asInstanceOf
, coerces the Order
into either a BuyOrder
price or a SellOrder
. The first takeaway is that pattern matching types carrying state incurs the runtime cost of type-checking and casting.
A second insight is that the Scala compiler is able to optimize away the instance checking for the second pattern match by creating a Boolean variable named rc8
to determine whether a BuyOrder
was discovered. This neat optimization is simple to handwrite, but it removes the elegance and simplicity of pattern matching. This is another example of how the compiler is able to produce efficient bytecode from expressive, high-level code.
From the preceding examples, it is now clear that pattern matches are compiled to if statements. One performance consideration for critical path code is the ordering of pattern match statements. If your code has five pattern match statements and the fifth pattern is the most frequently accessed, then your code is paying the price of always evaluating four other branches. Let's devise a JMH microbenchmark that estimates the linear access cost of pattern matching. Each benchmark defines ten pattern matches using different values (for example, the value class, the integer literal, the case class, and so on). For each benchmark, the matched index is swept to show the cost of accessing the first, the fifth, and the the tenth pattern match statement. Here is the benchmark definition:
class PatternMatchingBenchmarks { @Benchmark def matchIntLiterals(i: PatternMatchState): Int = i.matchIndex match { case 1 => 1 case 2 => 2 case 3 => 3 case 4 => 4 case 5 => 5 case 6 => 6 case 7 => 7 case 8 => 8 case 9 => 9 case 10 => 10 } @Benchmark def matchIntVariables(ii: PatternMatchState): Int = ii.matchIndex match { case `a` => 1 case `b` => 2 case `c` => 3 case `d` => 4 case `e` => 5 case `f` => 6 case `g` => 7 case `h` => 8 case `i` => 9 case `j` => 10 } @Benchmark def matchAnyVal(i: PatternMatchState): Int = CheapFoo(i.matchIndex) match { case CheapFoo(1) => 1 case CheapFoo(2) => 2 case CheapFoo(3) => 3 case CheapFoo(4) => 4 case CheapFoo(5) => 5 case CheapFoo(6) => 6 case CheapFoo(7) => 7 case CheapFoo(8) => 8 case CheapFoo(9) => 9 case CheapFoo(10) => 10 } @Benchmark def matchCaseClass(i: PatternMatchState): Int = ExpensiveFoo(i.matchIndex) match { case ExpensiveFoo(1) => 1 case ExpensiveFoo(2) => 2 case ExpensiveFoo(3) => 3 case ExpensiveFoo(4) => 4 case ExpensiveFoo(5) => 5 case ExpensiveFoo(6) => 6 case ExpensiveFoo(7) => 7 case ExpensiveFoo(8) => 8 case ExpensiveFoo(9) => 9 case ExpensiveFoo(10) => 10 } } object PatternMatchingBenchmarks { case class CheapFoo(value: Int) extends AnyVal case class ExpensiveFoo(value: Int) private val (a, b, c, d, e, f, g, h, i, j) = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) @State(Scope.Benchmark) class PatternMatchState { @Param(Array("1", "5", "10")) var matchIndex: Int = 0 } }
Performance was evaluated by running 30 trials, each lasting 10 seconds with three warm-up trials, each lasting 5 seconds. Here is the benchmark invocation:
sbt 'project chapter3' 'jmh:run PatternMatchingBenchmarks -foe true'
The results are summarized in the following table:
Benchmark |
Index to match |
Throughput (ops per second) |
Error as percentage of throughput |
Throughput change as percentage of base run |
|
1 |
350,568,900.12 |
±3.02 |
0 |
|
5 |
291,126,287.45 |
±2.63 |
-17 |
|
10 |
238,326,567.59 |
±2.95 |
-32 |
|
1 |
356,567,498.69 |
±3.66 |
0 |
|
5 |
287,597,483.22 |
±3.50 |
-19 |
|
10 |
234,989,504.60 |
±2.60 |
-34 |
|
1 |
304,242,630.15 |
±2.95 |
0 |
|
5 |
314,588,776.07 |
±3.70 |
3 |
|
10 |
285,227,574.79 |
±4.33 |
-6 |
|
1 |
332,377,617.36 |
±3.28 |
0 |
|
5 |
263,835,356.53 |
±6.53 |
-21 |
|
10 |
170,460,049.63 |
±4.20 |
-49 |
The last column takes the first trial of each benchmark when matching the first index as the base case. For trials matching the fifth and tenth indexes, the relative performance drop is displayed. In every case, except matching the fifth index of literal integers, throughput degrades nearly linearly as deeper indexes are matched. The one trial that defies this pattern is the trial matching literal integers. In this trial, performance improves relative to the first index when accessing the fifth index. Upon inspection of the bytecode, we discover that this scenario produces a jump table instead of a set of if statements. Here is a snippet from the generated bytecode:
6: tableswitch { // 1 to 10 1: 113 2: 109 3: 105 4: 101 5: 97 6: 92 7: 87 8: 82 9: 77 10: 72 default: 60 }
This bytecode snippet demonstrates that the JVM converts a pattern match on integer literals to a jump table using the tableswitch
instruction. This is a constant time operation rather than a linear traversal of if statements. Given that the observed error is several percentage points and the observed differences across the three trials are roughly several percentage points, we can deduce that the linear access cost does not apply to this scenario. Instead, matching literal integers at the Nth index has a constant access cost due to the generated jump table. In contrast, matching an integer variable proves to be nearly twice as expensive at the tenth index. The clear takeaway from this experiment is that, for any pattern match that is generating a series of if statements, there is a linear cost to access the Nth pattern match statement. If you pattern match at least three cases in performance-sensitive code, consider reviewing the code to determine whether the statement order matches the access frequency.
Do you have examples of pattern matching containing only two patterns? In scenarios involving only two pattern match statements that directly match a value, the compiler is able to generate an efficient jump table. When matching primitive literals (for example, string literals or integer literals), the compiler is able to generate jump tables for larger pattern matches. Analogous to the @tailrec
annotation, Scala defines a @switch
annotation for you to indicate to the compiler that you expect this pattern match statement to be compiled to a jump table. If the compiler is unable to generate a jump table, and instead it produces a series of if statements, then a warning will be issued. Like the @tailrec
annotation, the compiler will apply the jump table heuristic whether you provide the @switch
annotation or not. In practice, we do not often use this annotation because of its limited applicability, but it is worthwhile to be aware of its existence. The following is an example of an annotated pattern match that compiles to a jump table:
def processShareCount(sc: ShareCount): Boolean = (sc: @switch) match { case ShareCount(1) => true case _ => false }