The Option data type

The Option data type is used pervasively throughout the Scala standard library. Like pattern matching, it is a feature often adopted early by Scala beginners. The Option data type provides an elegant way to transform and handle values that are not required, doing away with the null checks often found in Java code. We assume you understand and appreciate the value that Option brings to writing Scala in the functional paradigm, so we will not reiterate its benefits further. Instead, we focus on analyzing its bytecode representation to drive performance insights.

Bytecode representation

Inspecting the Scala source code, we see that Option is implemented as an abstract class with the possible outcomes, Some and None, extending Option to encode this relationship. The class definitions with implementations removed are shown for convenience in the following code snippet:

sealed abstract class Option[+A] extends Product with Serializable 
final case class Some[+A](x: A) extends Option[A] 
case object None extends Option[Nothing] 

Studying the definitions, we can infer several points about the bytecode representation. Focusing on Some, we note that it does not extend AnyVal. As Option is implemented using inheritance, Some cannot be a value class due to limitations that we covered in the Value class section. This limitation implies that there is an allocation for each value wrapped as a Some instance. Furthermore, we observe that Some is not specialized. From our examination of specialization, we know that primitives wrapped as Some instances will be boxed. Here is a simple example to illustrate both concerns:

def optionalInt(i: Int): Option[Int] = Some(i) 

In this trivial example, an integer is encoded as a Some instance to be used as an Option data type. The following bytecode is produced:

  public scala.Option<java.lang.Object> optionalInt(int); 
    Code: 
       0: new           #16  // class scala/Some 
       3: dup 
       4: iload_1 
       5: invokestatic  #22  // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 
       8: invokespecial #25  // Method scala/Some."<init>":(Ljava/lang/Object;)V 
      11: areturn 

As we expected, there is an object allocation to create a Some instance, followed by the boxing of the provided integer to construct the Some instance.
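The allocation can also be observed from Scala code without reading bytecode. The following is a minimal sketch that reuses optionalInt from above; the enclosing object SomeAllocation and its value names are ours and exist only for illustration. It compares two results by value and by reference:

```scala
object SomeAllocation {
  // Same method as in the text: wraps a primitive Int in a Some.
  def optionalInt(i: Int): Option[Int] = Some(i)

  val first: Option[Int] = optionalInt(1)
  val second: Option[Int] = optionalInt(1)

  // Structural equality: both results wrap the same value.
  val sameValue: Boolean = first == second

  // Reference equality: each call constructed its own Some instance,
  // so the two references are distinct objects.
  val sameInstance: Boolean = first eq second
}
```

Here sameValue is true while sameInstance is false, which is consistent with the new instruction seen in the bytecode for every call.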

The None instance is a simpler case to understand from the bytecode perspective. As None is defined as a Scala object, there is no instantiation cost to create a None instance. This makes sense because None represents a scenario where there is no state to maintain.
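As a small sketch of this point, the following snippet binds None to two differently typed Option values and compares the references. The object name NoneSingleton and the value names are hypothetical and chosen only for this illustration:

```scala
object NoneSingleton {
  // None can stand in for an absent Int and for an absent String.
  val noInt: Option[Int] = None
  val noString: Option[String] = None

  // eq compares references: both values point at the one None object,
  // so no new instance is created for either binding.
  val sameInstance: Boolean = noInt eq noString
}
```

Because both values refer to the same singleton, sameInstance is true.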

Note

Have you ever considered how the single value, None, represents no value for all the types? The answer lies in understanding the Nothing type. The Nothing type is a subtype of every other type, which, together with the covariance of Option[+A], allows None to be used as an Option[A] for any type A. For more insight into the Scala type hierarchy, view this useful Scala language tutorial at http://docs.scala-lang.org/tutorials/tour/unified-types.html.

Performance considerations

In any non-performance-sensitive environment, it is sensible to default to using Option to represent values that are not required. In a performance-sensitive area of the code, the choice becomes more challenging and less clear-cut. Even in performance-sensitive code, you must first optimize for correctness and then for performance. We suggest always implementing the first version of the problem that you are modeling in the most idiomatic style, which is to say, using Option. With the awareness gained from the bytecode representation of Some, the logical next step is to profile in order to determine whether or not Option use is the bottleneck, focusing in particular on memory allocation patterns and garbage collection costs. In our experience, there are often other overhead sources present in the code that are more costly than Option use. Examples include inefficient algorithm implementation, a poorly constructed domain model, or inefficient use of system resources. If, in your case, you have eliminated other sources of inefficiency and are positive that Option is the source of your performance woes, then you need to take further steps.

An incremental step towards improved performance might include removing use of the Option higher-order functions. On the critical path, there can be significant cost savings by replacing higher-order functions with inlined equivalents. Consider the following trivial example that transforms an Option data type into a String data type:

Option(10).fold("no value")(i => s"value is $i") 

On the critical path, the following change may yield substantive improvements:

val o = Option(10) 
if (o.isDefined) s"value is ${o.get}" else "no value" 

Replacing the fold operation with an if statement saves the cost of creating an anonymous function. It bears repeating that this type of change should only ever be considered after extensive profiling reveals Option usage to be the bottleneck. While this type of code change is likely to improve your performance, it is verbose and unsafe due to usage of o.get. When this technique is used judiciously, you may be able to retain use of the Option data type in critical path code.
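Before swapping one form for the other, it is worth confirming that they behave identically for both Some and None. The sketch below wraps the two snippets from the text in methods so that they can be compared. The object and method names are hypothetical. The third variant, a pattern match, is not part of the original discussion; we include it only as a possible way to avoid both the function argument and o.get, and we make no claim about its measured performance:

```scala
object DescribeOption {
  // Idiomatic version from the text: fold takes a default and a function.
  def describeFold(o: Option[Int]): String =
    o.fold("no value")(i => s"value is $i")

  // Inlined version from the text: no function argument, but it relies on
  // o.get, which throws if called on None outside the isDefined guard.
  def describeInline(o: Option[Int]): String =
    if (o.isDefined) s"value is ${o.get}" else "no value"

  // Assumption, not from the text: a pattern match that needs neither a
  // function argument nor o.get. Profile before relying on it.
  def describeMatch(o: Option[Int]): String = o match {
    case Some(i) => s"value is $i"
    case None    => "no value"
  }
}
```

All three methods return "value is 10" for Some(10) and "no value" for None, so the choice between them can be driven by profiling results rather than by behavior.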

If replacing higher-order Option function use with inlined and unsafe equivalents fails to sufficiently improve performance, then you need to consider more drastic measures. At this point, profiling should reveal that Option memory allocation is the bottleneck, preventing you from reaching your performance goals. Faced with this scenario, you have two options (pun intended!) to explore, both of which involve a high cost in terms of time to implement.

One way to proceed is to admit that, for the critical path, Option is unsuitable and must be removed from the type signatures and replaced with null checks. This is the most performant approach, but it brings significant maintenance costs because you and all other team members working on the critical path must be cognizant of this modeling decision. If you choose to proceed this way, define clear boundaries for the critical path to isolate null checks to the smallest possible region of the code. In the next section, we explore a second approach that involves building a new data type that leverages the knowledge that we gained in this chapter.
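To make the idea of a clear boundary concrete, here is a minimal sketch under our own assumptions. The object NullBoundary, the symbols array, and both method names are hypothetical. The critical-path method returns null for a missing value, and a single boundary method converts back to Option for the rest of the code base:

```scala
object NullBoundary {
  // Hypothetical data set standing in for critical-path state.
  private val symbols: Array[String] = Array("AAPL", "MSFT")

  // Critical path: returns null when nothing matches, so no Some
  // instance is allocated for the result. The "OrNull" suffix is a
  // naming convention we assume here to flag the contract to callers.
  def findSymbolOrNull(prefix: String): String = {
    var i = 0
    while (i < symbols.length) {
      if (symbols(i).startsWith(prefix)) return symbols(i)
      i += 1
    }
    null
  }

  // Boundary: Option.apply yields None for null and Some(x) otherwise,
  // so code outside the critical path never sees a null.
  def findSymbol(prefix: String): Option[String] =
    Option(findSymbolOrNull(prefix))
}
```

Callers on the critical path use findSymbolOrNull and check for null explicitly, while all other callers use findSymbol. Keeping the null-returning methods visibly named and few in number is one way to keep the null checks confined to the smallest possible region.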
