Chapter 4. Exploring the Collection API

In this chapter, we return to MVT in order to take on challenges that span multiple MVT teams. The market data team requires improved critical path order book performance to handle increased cancel request volume. The data science team wants better ad hoc data analysis tools to research trading strategies. Everyone has a problem that had to be solved yesterday. That's the start-up lifestyle!

We use the functional paradigm, our existing knowledge, and the Scala collections API to our advantage to solve these challenges. The power of the Scala language and its collections API allow you to approach problems in ways that you may not have thought possible before. As we work through these challenges and encounter new Scala collection usage, we detail collection implementation and tradeoffs to consider. We will consider the following collections in this chapter:

  • List
  • TreeMap
  • Queue
  • Set
  • Vector
  • Array

High-throughput systems – improving the order book

In Chapter 1, The Road to Performance, you met MVT's head trader, Dave, under tense circumstances. The financial markets underwent a period of extreme volatility that exposed a weakness in the order book design. After speaking to Dave, you learned that in volatile markets, order volume is dominated by cancels because traders are reacting to quickly changing market conditions. Through order book benchmarking and profiling, you confirmed the suspicion that under high volume, cancel performance causes high order book response latency.

Although the market volatility that caused trading losses has passed, Dave recognizes the risk that future volatility poses for MVT's returns. Dave wants to invest engineering effort into making the order book more performant when cancelations frequently occur. By working with the data science team, Dave analyzed historical order book activity over a three-month period and discovered interesting market characteristics. He shares with you that in the three months analyzed, on a per trading day basis, cancels comprised, on average, 70% of order book commands. The analysis also revealed that on the most volatile market days, cancel activity represents about 85% of order book activity. Known for his puns, Dave concludes with, "Now, you know everything I know. Like the order book, we are counting on you to execute!"

Understanding historical trade-offs – list implementation

Excited to improve order book performance, your first step is to familiarize yourself with the order book implementation. As you open up the order book repository, you ping Gary, a fellow engineer who has prior order book development experience. As Gary knows the history of order book development, he tells you to check out ListOrderBook. "This was our first attempt at modeling the order book. I think you can learn from our design by seeing its first incarnation," he adds, "Once you understand the implementation, check out QueueOrderBook. That's the next version of the order book. You profiled an older iteration of this implementation when we had the volatility wave. Let me know if you have any questions!" After thanking him, you dig into the repository to find ListOrderBook.

The ListOrderBook class defines the following state to manage buys (bids) and sells (offers):

case class ListOrderBook( 
  bids: TreeMap[Price, List[BuyLimitOrder]], 
  offers: TreeMap[Price, List[SellLimitOrder]]) { 
  def bestBid: Option[BuyLimitOrder] =  
    ??? // hidden for brevity 
  def bestOffer: Option[SellLimitOrder] =  
    ??? // hidden for brevity 
} 

To refresh our memory, here are the definitions of Price, BuyLimitOrder, and SellLimitOrder:

sealed trait LimitOrder { 
  def id: OrderId 
  def price: Price 
} 
case class BuyLimitOrder(id: OrderId, price: Price) extends LimitOrder 
case class SellLimitOrder(id: OrderId, price: Price) extends LimitOrder 
case class Price(value: BigDecimal) 

The LimitOrder is an algebraic data type (ADT) that represents the two possible order sides. The Price class is a strongly-typed wrapper for BigDecimal. Recalling the performance boost that value classes provide, you modify the definition of Price, as follows:

case class Price(value: BigDecimal) extends AnyVal 

The ListOrderBook class uses two Scala collection types to maintain its state: List and TreeMap. Let's have a deeper look at these data structures to understand the tradeoffs that they present.

List

Scala implements List as an immutable singly-linked list. A List is an ordered collection of elements of the same type. A List is a sealed abstract class with two implementations: Nil, which represents the empty list, and :: (often called cons), which is used to represent an element and a tail. To make things more concrete, let's look at some pseudocode, which is close to the actual implementation:

sealed trait List[+A] 
case object Nil extends List[Nothing] 
case class ::[A](head: A, tail: List[A]) extends List[A] 

A List of three integers can be constructed using the following notation:

val list = ::(1, ::(2, ::(3, Nil)))
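The same structure is more commonly written with the right-associative infix form of :: or with the List factory method. The following sketch shows that all three spellings build the same list:

```scala
// Prefix cons, infix cons, and the List factory all build the same
// singly-linked structure.
val viaCons  = ::(1, ::(2, ::(3, Nil)))
val viaInfix = 1 :: 2 :: 3 :: Nil
val viaApply = List(1, 2, 3)
```

The infix form works because operators ending in a colon are right-associative in Scala, so `1 :: 2 :: Nil` parses as `1 :: (2 :: Nil)`.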

Note

Note the plus sign in the definition of the List trait. The plus (+) sign indicates that List is covariant on its type parameter, A. Covariance allows you to express polymorphic constraints with generic types. To make this more concrete, consider the following definitions:

sealed trait Base

case class Impl(value: Int) extends Base

Here, a relationship is expressed between Base and Impl. The Impl class is a subtype of Base. When used with List, covariance allows us to express that List[Impl] is a subtype of List[Base]. Expressed with an example, covariance is what allows the following snippet to compile:

val bases: List[Base] = List[Impl](Impl(1))

Covariance belongs to the broader topic of variances. If you wish to learn more about variances in Scala, refer to this excellent blog post by Andreas Schroeder at https://blog.codecentric.de/en/2015/03/scala-type-system-parameterized-types-variances-part-1/.

Unlike most other Scala collections, List supports pattern matching on its content. This is a powerful way to write expressive code that handles multiple scenarios while retaining compile-time safety that all possible cases are handled. Consider the following snippet:

List(1,2,3,4) match { 
  case 1 :: x :: rest => println(s"second element: $x, rest: $rest") 
} 

In this simple pattern match, we are able to express several concerns. Here, 1 matches the literal value 1, x is bound to 2, and rest is List(3, 4). When compiled, this snippet elicits a compiler warning because the Scala compiler infers that there are possible List patterns left unmatched (for example, the empty List). Compiler-provided warnings minimize the chance of forgetting to handle a valid input.
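One way to address the warning is to enumerate the remaining shapes of input. The following sketch (with an illustrative helper named describe) adds cases for the empty list, the single-element list, and a catch-all, so the match is exhaustive:

```scala
// An exhaustive match over the possible List shapes; the compiler no
// longer warns because every input is covered.
def describe(xs: List[Int]): String = xs match {
  case Nil            => "empty"
  case x :: Nil       => s"one element: $x"
  case 1 :: x :: rest => s"second element: $x, rest: $rest"
  case _              => "something else"
}
```

For example, describe(List(1, 2, 3, 4)) yields the same bindings as the original snippet, while describe(Nil) now handles the previously unmatched case.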

A List is optimized for prepend operations. Adding 0 to the previous list is as easy as doing this:

val list = ::(1, ::(2, ::(3, Nil))) 
val listWithZero = ::(0, list) 

This is a constant-time operation, and it has almost no memory cost, as List implements data sharing. In other words, the new list, listWithZero, is not a deep copy of list. Instead, it re-uses all of its allocated elements and allocates only one new element, the cell containing 0.
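This sharing can be observed directly with reference equality. In the following sketch, eq confirms that the tail of the new list is the original list instance itself, not a copy:

```scala
val list = 1 :: 2 :: 3 :: Nil
val listWithZero = 0 :: list

// Prepending allocates exactly one new cons cell; the tail of the
// result is the original list instance (reference equality holds).
assert(listWithZero.tail eq list)
assert(listWithZero == List(0, 1, 2, 3))
```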

In contrast to prepend operations, append operations (that is, adding an element to the end of the list) are computationally expensive because the entire List must be copied.

Given the poor append performance of List, you may wonder whether it is safe to use a map transform. A map transform occurs by applying a function to successive elements in the List, which can be logically represented by appending transformed values to a new List. To avoid this performance pitfall, List.map overrides the default implementation provided by the trait TraversableOnce to apply the transform using prepend operations. This provides improved List.map performance while retaining the same API. Overriding default behavior to provide a specialized implementation is a common Scala collections pattern. Constant time head operations make List ideal for algorithms involving last-in, first-out (LIFO) operations. For random access and first-in, first-out (FIFO) behaviors, you should employ List selectively.
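To illustrate the LIFO fit, here is a minimal stack sketch built on List; push and pop are illustrative names, not standard library methods. Both operations run in constant time because they only touch the head of the list:

```scala
// A LIFO stack on top of List: push is a prepend, pop reads the head.
// Neither operation copies the remaining elements.
def push[A](stack: List[A], elem: A): List[A] = elem :: stack

def pop[A](stack: List[A]): Option[(A, List[A])] = stack match {
  case head :: tail => Some((head, tail))
  case Nil          => None
}

val s1 = push(push(List.empty[Int], 1), 2) // List(2, 1)
```

Here, pop(s1) returns the most recently pushed element first, which is exactly the LIFO behavior the text describes.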

In the next section, we investigate TreeMap. The TreeMap class is the implementation of the SortedMap trait that is used to maintain bids and offers.

TreeMap

The TreeMap class is a map that orders keys according to a provided ordering strategy. The following snippet of its class definition makes the ordering requirement clear:

class TreeMap[A, +B] private (tree: RB.Tree[A, B])(implicit val ordering: Ordering[A]) 

The Ordering class is a type class that defines a contract for the natural ordering of elements of the A type.

Note

If type classes are a concept that is new to you, we encourage you to read Daniel Westheide's well-written blog post on the topic at http://danielwestheide.com/blog/2013/02/06/the-neophytes-guide-to-scala-part-12-type-classes.html.

In ListOrderBook, we see that Price is the key. Looking at the companion object of Price, we see that the ordering is defined by delegating to the underlying BigDecimal type's ordering definition:

object Price { 
  implicit val ordering: Ordering[Price] = new Ordering[Price] { 
    def compare(x: Price, y: Price): Int = 
      Ordering.BigDecimal.compare(x.value, y.value) 
  } 
} 

The TreeMap class referenced by ListOrderBook, like List, is immutable. Immutability provides strong reasoning guarantees. We can be certain that there are no side effects because the effect of adding or removing a value from the map is always reflected as a new map.

The TreeMap class implementation is a special type of binary search tree, the red-black tree. This tree implementation provides logarithmic operation time for lookups, additions, and removals. You might be surprised to see TreeMap in place of HashMap. As documented in the Scala collections performance overview (http://docs.scala-lang.org/overviews/collections/performance-characteristics.html), HashMap provides constant time lookups, additions, and removals, which is faster than TreeMap. However, TreeMap offers superior performance when performing ordered traversals. For example, finding the largest key in the map can be done in logarithmic time with TreeMap, while this is done in linear time for HashMap. This difference is an indicator that the order book implementation requires efficient ordered Price traversals.
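A small example makes the ordered traversal property concrete. With TreeMap, the smallest key, the largest key, and an ascending walk of all keys fall out directly from the red-black tree's ordering:

```scala
import scala.collection.immutable.TreeMap

// Keys are kept sorted, so min/max lookups and ordered traversal are
// cheap, regardless of insertion order.
val levels = TreeMap(
  BigDecimal(10.05) -> "level A",
  BigDecimal(10.01) -> "level B",
  BigDecimal(10.03) -> "level C")

val lowest  = levels.firstKey // smallest key
val highest = levels.lastKey  // largest key
```

With a HashMap, finding the highest key would require scanning every entry.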

Adding limit orders

Coming back to the ListOrderBook implementation, we see the following partial method definition reflects the heart of the order book:

  def handle( 
    currentTime: () => EventInstant, 
    ob: ListOrderBook, 
    c: Command): (ListOrderBook, Event) = c match { 
    case AddLimitOrder(_, o) => ??? // hidden for brevity 
    case CancelOrder(_, id) => ??? // hidden for brevity 
  } 

Note

It might seem curious that a function is supplied as an argument to retrieve the current time. A potentially simpler way to achieve the same effect is to invoke System.currentTimeMillis(). The shortcoming of this approach is that accessing the system clock is a side-effect, which means that the function is no longer referentially transparent. By providing a function to retrieve the current time, we are able to control how this side-effect happens and produce repeatable test cases.
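The technique can be sketched as follows. The EventInstant definition and the stampEvent helper here are illustrative, not the book's actual definitions; the point is that production code wires in the system clock at the edge, while tests substitute a fixed clock for repeatable results:

```scala
// Illustrative stand-in for the book's EventInstant type.
case class EventInstant(millis: Long)

// A function that needs a timestamp receives the clock as an argument
// instead of reading the system clock directly.
def stampEvent(currentTime: () => EventInstant): EventInstant = currentTime()

// Production wiring: the side effect lives in one place.
val systemClock: () => EventInstant = () => EventInstant(System.currentTimeMillis())

// Test wiring: a fixed clock makes assertions deterministic.
val fixedClock: () => EventInstant = () => EventInstant(42L)
```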

Given a Command, an order book instance, and a way to obtain the current time for event timestamps, an Event and a new state are produced. To refresh our memory, here are the commands the order book can process:

  sealed trait Command 
  case class AddLimitOrder(i: CommandInstant, o: LimitOrder) extends Command 
  case class CancelOrder(i: CommandInstant, id: OrderId) extends Command 

The following are the possible events created by processing commands:

  sealed trait Event 
  case class OrderExecuted(i: EventInstant, buy: Execution,  
    sell: Execution) extends Event 
  case class LimitOrderAdded(i: EventInstant) extends Event 
  case class OrderCancelRejected(i: EventInstant,  
    id: OrderId) extends Event 
  case class OrderCanceled(i: EventInstant,  
    id: OrderId) extends Event 

Let's focus on supporting the AddLimitOrder command to better understand the algorithmic properties of historical design choices. When adding a limit order, one of two outcomes is possible:

  • The incoming order price crosses the book, resulting in OrderExecuted
  • The incoming order rests on the book, resulting in LimitOrderAdded

Deducing whether or not the order crosses the book requires looking at the best price on the opposing side. Returning to the definition of ListOrderBook with the complete implementations of bestBid and bestOffer, we see the following:

case class ListOrderBook( 
  bids: TreeMap[Price, List[BuyLimitOrder]], 
  offers: TreeMap[Price, List[SellLimitOrder]]) { 
  def bestBid: Option[BuyLimitOrder] = 
    bids.lastOption.flatMap(_._2.headOption) 
   
  def bestOffer: Option[SellLimitOrder] = 
    offers.headOption.flatMap(_._2.headOption) 
} 

The implementation shows that we are taking advantage of the logarithmic ordered search property of TreeMap. The best bid is the key with the highest price, which is the last value in the tree because the ordering is ascending. The best offer is the key with the lowest price, which is the first value in the tree.
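These lookups can be sketched with simplified types, using plain BigDecimal keys and string placeholders in place of Price, BuyLimitOrder, and SellLimitOrder:

```scala
import scala.collection.immutable.TreeMap

// Simplified bids and offers: BigDecimal price keys, string order
// placeholders.
val bids = TreeMap(
  BigDecimal(10.01) -> List("bid-a"),
  BigDecimal(10.02) -> List("bid-b"))
val offers = TreeMap(
  BigDecimal(10.03) -> List("offer-c"),
  BigDecimal(10.04) -> List("offer-d"))

// Highest bid = last key in ascending order; lowest offer = first key.
val bestBid   = bids.lastOption.flatMap(_._2.headOption)
val bestOffer = offers.headOption.flatMap(_._2.headOption)
```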

Focusing specifically on the addition of a buy limit order and given the best offer, the following comparison occurs to determine whether the incoming buy order crosses the book or rests on the book:

orderBook.bestOffer.exists(buyOrder.price.value >= _.price.value) match { 
  case true => ??? // cross the book 
  case false => ??? // rest on the book 
} 

Let's first assume that the incoming buy order's price is lower than the best offer, which means the order is added to the book (that is, rests on the book). The question we are trying to answer is, "where in the book should the order be added?" The order book performs a logarithmic search to find the price level associated with the order price. From the definition of ListOrderBook, you know that each value in the map (the price level) is represented as a List of orders. Recalling a discussion with the head trader, Dave, you remember that orders within a price level are executed based on time priority. The first order added to a price level is the first order to be executed. Conceptually, a price level is a first-in, first-out (FIFO) queue. The implication is that adding an order to a price level is a linear time operation because the order is appended to the end. The following snippet confirms your hypothesis:

val orders = orderBook.bids.getOrElse(buyOrder.price, Nil) 
orderBook.copy( 
  bids = orderBook.bids + (buyOrder.price -> orders.:+(buyOrder))) -> 
  LimitOrderAdded(currentTime()) 

The snippet shows that adding a resting limit order to the book involves a linear time append operation to List of BuyLimitOrder. In your mind, you are beginning to wonder how MVT was able to trade profitably at all with this order book. Before leaping to this harsh judgment, you consider how crossing the book is handled.

Assuming that the incoming buy order's price is greater than or equal to the best offer price, then the buy order crosses the book, causing an execution. Time priority dictates that the first sell order received is executed against the incoming buy order, which translates to taking the first sell order in the price level. When generating an execution, you realize that modeling a price level with a List provides constant time performance. The following snippet shows how a price level is modified on a buy execution:

case (priceLevel, sell :: Nil) => 
  (orderBook.copy(offers = orderBook.offers - sell.price), 
    OrderExecuted(currentTime(), 
      Execution(buy.id, sell.price), Execution(sell.id, sell.price))) 
case (_, sell :: remainingSells) => 
  (orderBook.copy(offers = orderBook.offers + (sell.price -> remainingSells)), 
    OrderExecuted(currentTime(), 
      Execution(buy.id, sell.price), Execution(sell.id, sell.price))) 

The ListOrderBook takes advantage of the List pattern matching to handle the two possible cross scenarios:

  • The executed sell order is the only order available in the price level
  • Additional sell orders remain at the price level

In the former scenario, the price level is removed from the book by removing the key from the offers TreeMap. In the latter scenario, the remaining orders form the new price level. Clearly, the order book is optimized for executions over adding resting orders. You wonder why this bias exists in the order book implementation. "Perhaps executions are much more prevalent than resting orders?" you muse. You are unsure and make a mental note to chat with Dave.

Note

Pause for a moment to consider biases in systems that you have designed. Did you optimize operations proportional to usage or latency constraints? Looking back, did your design choices lead you towards the best possible performance for the most important operations? Of course, hindsight makes it easy to call out suboptimal design choices. By reflecting on how you made these choices, you might be better able to avoid similar deficiencies in future systems.

Canceling orders

The ListOrderBook also supports the CancelOrder command to remove an existing order by ID. Cancel requests pose an algorithmic challenge to ListOrderBook. As only the order ID is provided, ListOrderBook cannot efficiently determine which side the order rests on (that is, buy or sell). To determine the side, the buy and sell price levels are swept to find the order ID. This is an operation that is proportional to the number of price levels per side and the length of each price level. The worst case scenario is submitting an order ID that does not exist in the order book. The entire book must be swept to identify the absence of the provided order ID. A malicious trader could slow down MVT order book operations by submitting a constant stream of nonexistent order IDs. You make a note to talk with Dave about malicious trading activities and what MVT can do to defend against them.

Assuming that the order referenced by the cancel request exists in the book and its price level is discovered, the act of removing the cancelled order from the book is also expensive. Canceling is a linear time operation that requires traversing the linked list of orders and removing the node with the matching order ID. The following snippet implements canceling a sell order in ListOrderBook:

orderBook.offers.find { 
  case (price, priceLevel) => priceLevel.exists(_.id == idToCancel) 
}.fold[(ListOrderBook, Event)]( 
  orderBook -> OrderCancelRejected(currentTime(), idToCancel)) { 
  case (price, priceLevel) => 
    val updatedPriceLevel = priceLevel.filter(_.id != idToCancel) 
    orderBook.copy(offers = updatedPriceLevel.nonEmpty match { 
      case true => orderBook.offers + (price -> updatedPriceLevel) 
      case false => orderBook.offers - price 
    }) -> OrderCanceled(currentTime(), idToCancel) 
} 

Studying this snippet, it is unsurprising that cancelation is the least performant order book operation. There are two linear time passes performed over a price level to cancel an order. First, exists traverses the price level's orders to determine whether the ID to be canceled is present. Once the price level containing the ID is found, there is a second traversal via filter to update the state of the order book.

The cancelation implementation in ListOrderBook is an illustration of the double-edged sword of Scala's expressive collection API. By virtue of being expressive, the cancelation logic is simple to understand and to maintain. However, its expressiveness also makes it easy to hide that the runtime performance of removing an order from a price level is 2 * N, where N is the number of orders in a price level. This simple example makes it clear that in a performance-sensitive environment, it is important to take a step back from the code to consider the runtime overhead of the data structure that is being used.

The current order book – queue implementation

You refrain from judging ListOrderBook too harshly because you know from your prior software development experiences that there were likely extenuating circumstances that led to this implementation. You turn your attention to the current order book implementation, which is in QueueOrderBook. Looking over the source code, you are surprised to discover the implementation appears to match ListOrderBook except for the price level data structure:

case class QueueOrderBook( 
  bids: TreeMap[Price, Queue[BuyLimitOrder]], 
  offers: TreeMap[Price, Queue[SellLimitOrder]]) 

The only difference between the two implementations is the use of scala.collection.immutable.Queue in place of List to represent a price level. From a modeling perspective, using a FIFO queue makes sense. As time priority dictates execution order, a FIFO queue is a natural fit to store resting orders. You begin wondering whether switching out List for Queue was done purely for modeling purposes. The question on your mind is, "how does replacing List with Queue improve order book performance?" Understanding this change requires digging deeper into Scala's Queue implementation.

Queue

This snippet of a Queue definition reveals an interesting insight:

class Queue[+A] protected(protected val in: List[A], protected val out: List[A]) 

Without reading deeply into the Queue implementation, we see that it uses two Lists to manage state. Given the usage of List to model a FIFO queue in ListOrderBook, it should not be surprising to see the usage of List to build an immutable FIFO queue data structure. Let's look at the enqueue and dequeue operations to understand how in and out impact Queue performance. The following snippet shows the implementation of enqueue:

def enqueue[B >: A](elem: B) = new Queue(elem :: in, out) 

As the element is prepended to in, enqueueing is a constant time operation. Recall that the analogous ListOrderBook operation is adding a resting order, which has linear runtime performance. This is a clear performance win for QueueOrderBook. Next, we consider dequeue implementation:

def dequeue: (A, Queue[A]) = out match { 
    case Nil if !in.isEmpty => val rev = in.reverse ; (rev.head, new Queue(Nil, rev.tail)) 
    case x :: xs            => (x, new Queue(in, xs)) 
    case _                  => throw new NoSuchElementException("dequeue on empty queue") 
  } 

Note

As the implementation shows, dequeue throws an exception when invoked with an empty Queue. The exception is an unexpected outcome to invoking dequeue and feels out of place in the functional programming paradigm. For this reason, Queue also provides dequeueOption that returns an Option. This makes the handling of an empty Queue explicit and easier to reason about. We recommend using dequeueOption in any situation where you cannot guarantee that dequeue will always be called on a nonempty Queue.
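A quick sketch of the difference in behavior:

```scala
import scala.collection.immutable.Queue

// dequeueOption surfaces the empty case as None instead of throwing
// a NoSuchElementException.
val nonEmpty = Queue(1, 2)
val empty = Queue.empty[Int]

val some = nonEmpty.dequeueOption // Some((1, Queue(2)))
val none = empty.dequeueOption    // None
```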

The dequeue operation is more involved than enqueue due to the interaction between in and out. To understand how the Queue state is managed, review the following table, which walks through a series of enqueue and dequeue invocations, listing the state of in and out at each step. As you review it, consider which of dequeue's pattern match cases is invoked at each step:

Operation    | In              | Out
-------------|-----------------|------------
enqueue(1)   | List(1)         | Nil
enqueue(2)   | List(1, 2)      | Nil
enqueue(3)   | List(1, 2, 3)   | Nil
dequeue      | Nil             | List(2, 3)
dequeue      | Nil             | List(3)
enqueue(4)   | List(4)         | List(3)
dequeue      | List(4)         | Nil
dequeue      | Nil             | Nil

As the enqueue and dequeue invocations are intermingled, both in and out retain state. In the final step displayed, the queue returns to its initial state (that is, both in and out are empty). The key insight from this implementation is that Queue amortizes the cost of dequeue to constant time by deferring transfers from in to out. Each transfer from in to out is a linear time reverse operation that maintains first-in, first-out ordering. Deferring this expensive operation until out is empty is a form of lazy evaluation, and it is an illustrative example of how lazy evaluation can improve runtime performance.
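The sequence from the table can be replayed against scala.collection.immutable.Queue to confirm that FIFO ordering survives the lazy transfers between in and out:

```scala
import scala.collection.immutable.Queue

// Replaying the table's operation sequence; elements come out in the
// order they went in, even though transfers from `in` to `out` are lazy.
val q3 = Queue.empty[Int].enqueue(1).enqueue(2).enqueue(3)

val (first, q2)  = q3.dequeue              // triggers the deferred reverse
val (second, q1) = q2.dequeue
val (third, q0)  = q1.enqueue(4).dequeue   // 3 still precedes 4
```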

Now that you have an understanding of how Queue is implemented, you can reason about the performance improvements delivered by QueueOrderBook. The following table itemizes the runtime performance of each scenario to modify a price level:

Scenario                 | ListOrderBook | QueueOrderBook
-------------------------|---------------|-------------------
Add resting limit order  | Linear        | Constant
Generate execution       | Constant      | Amortized constant
Cancel order             | Linear        | Linear

This table illustrates how understanding the runtime characteristics of the Scala collections API can yield tangible performance wins with small changes to your implementation. Recall that when QueueOrderBook was introduced, it was noted that its implementation is identical to ListOrderBook, modulo changes that replace List operations with analogous Queue operations. This is a comparatively simple change for the performance boost shown previously.

You are excited to see the performance win in handling limit orders with QueueOrderBook, but you are left wondering what can be done about cancelation performance. It remains unsettling that QueueOrderBook retains the same cancelation performance, particularly given the recent market volatility that exposed this weakness and caused MVT to trade unprofitably. Lazy evaluation was a big performance win for handling limit orders. Can this principle also be applied to cancel requests?

Improved cancellation performance through lazy evaluation

Queue provides high-performance enqueue and dequeue operations by using additional state, the second List, to defer and batch expensive operations. This principle can be applied to the order book. When canceling an order, there are two expensive operations:

  • Identifying the price level containing the order-to-be-canceled
  • Traversing a Queue or List to remove the canceled order

Focusing on the second operation, the motivating question is, "how can the order book defer the cost of linear traversal to modify internal state?" To answer this question, it is often helpful to consider the strengths of your implementation. With either order book implementation, we know there is excellent execution performance. One strategy that takes advantage of this insight is to defer cancellation until order execution occurs. The approach is to use additional state to maintain the intent to cancel without removing the order from order book state until it is performant to do so. This approach could look like the following:

case class LazyCancelOrderBook( 
  pendingCancelIds: Set[OrderId], 
  bids: TreeMap[Price, Queue[BuyLimitOrder]], 
  offers: TreeMap[Price, Queue[SellLimitOrder]]) 

The LazyCancelOrderBook class adds additional state in the form of a scala.collection.immutable.Set to manage the IDs of cancel requests that have not yet been reflected in the state of bids and offers. Before diving into how pendingCancelIds is used, let's investigate the Scala implementation of Set.

Set

Scala's implementation of Set is neither an ADT, such as List, nor a concrete implementation, such as TreeMap. Instead, it is a trait, as shown in this snippet of its definition:

trait Set[A] 

The standard library defines Set as a trait in order to support specialized implementations depending on the element count. The Set companion object defines five implementations for sizes zero through four. Each implementation contains a fixed number of elements, as shown for Set3, as follows:

class Set3[A] private[collection] (elem1: A, elem2: A, elem3: A) 

When the number of elements is small, the runtime performance is faster with hand-rolled Set implementations. With this technique, additions and removals point to the next or previous hand-rolled implementation. For example, consider + and - from Set3:

    def + (elem: A): Set[A] = 
      if (contains(elem)) this 
      else new Set4(elem1, elem2, elem3, elem) 
 
    def - (elem: A): Set[A] = 
      if (elem == elem1) new Set2(elem2, elem3) 
      else if (elem == elem2) new Set2(elem1, elem3) 
      else if (elem == elem3) new Set2(elem1, elem2) 
      else this 
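The small-set specializations have an observable consequence: when an addition or removal has no effect, the same instance is returned and nothing is allocated. A quick sketch of this behavior (as observed on recent standard library versions):

```scala
// Set(1, 2, 3) is backed by the hand-rolled Set3 implementation.
val s3 = Set(1, 2, 3)

// No-op addition and removal return `this` unchanged -- no allocation.
val sameAfterAdd    = (s3 + 2) eq s3
val sameAfterRemove = (s3 - 9) eq s3

val grown  = s3 + 4 // transitions to the four-element implementation
val shrunk = s3 - 3 // transitions back down to a two-element set
```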

After Set4, the standard library uses an implementation named HashSet. This is visible when adding an element to Set4:

 def + (elem: A): Set[A] = 
      if (contains(elem)) this 
      else new HashSet[A] + (elem1, elem2, elem3, elem4, elem) 

The HashSet is analogous to TreeMap because it is backed by an efficient data structure to manage internal state. For HashSet, the backing data structure is a hash trie. The hash trie provides amortized constant time performance for additions, removals, and contains operations as per the Scala collections performance overview (http://docs.scala-lang.org/overviews/collections/performance-characteristics.html). If you want to dig deeper into how a hash trie works, the Scala hash trie overview (http://docs.scala-lang.org/overviews/collections/concrete-immutable-collection-classes.html#hash-tries) is a good starting point.

Returning to LazyCancelOrderBook, we now know that the common Set operations on pendingCancelIds (additions, removals, and contains checks) complete in amortized constant time, so there will be minimal overhead as the size of the set increases. We can use pendingCancelIds to represent the intent to remove an order from the order book without paying the cost of performing the removal. This simplifies the handling of a cancel order to a constant time addition to pendingCancelIds:

def handleCancelOrder( 
    currentTime: () => EventInstant, 
    ob: LazyCancelOrderBook, 
    id: OrderId): (LazyCancelOrderBook, Event) = 
    ob.copy(pendingCancelIds = ob.pendingCancelIds + id) -> 
      OrderCanceled(currentTime(), id) 

The implementation of handleCancelOrder becomes trivial because the work to remove the order from the book is deferred. While this is a performance win, the implementation suffers from a serious deficiency: it can no longer identify order IDs that are absent from the order book, which should result in OrderCancelRejected. One way to account for this requirement is to maintain an additional Set containing the order IDs actively resting on the book. Now, the LazyCancelOrderBook state looks like the following:

case class LazyCancelOrderBook( 
  activeIds: Set[OrderId], 
  pendingCancelIds: Set[OrderId], 
  bids: TreeMap[Price, Queue[BuyLimitOrder]], 
  offers: TreeMap[Price, Queue[SellLimitOrder]]) 

With this definition, we can rewrite handleCancelOrder to account for nonexistent order IDs:

def handleCancelOrder( 
    currentTime: () => EventInstant, 
    ob: LazyCancelOrderBook, 
    id: OrderId): (LazyCancelOrderBook, Event) = 
    ob.activeIds.contains(id) match { 
      case true => ob.copy(activeIds = ob.activeIds - id, 
        pendingCancelIds = ob.pendingCancelIds + id) -> 
        OrderCanceled(currentTime(), id) 
      case false => ob -> OrderCancelRejected(currentTime(), id) 
    } 

This implementation involves three amortized constant time operations when the order ID exists in the book. First, a contains check identifies whether the order ID exists in the order book. Then, the provided order ID is removed from the active ID set and added to the pending cancel set. Previously, this scenario required two linear runtime operations. The degenerate scenario of handling a nonexistent order ID now shrinks to a single amortized constant time operation.
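The bookkeeping can be sketched in isolation with simplified types: Int stands in for OrderId, strings stand in for the event ADT, and the Book class below is an illustrative reduction of LazyCancelOrderBook:

```scala
// Stripped-down model of the two-set cancel bookkeeping.
case class Book(activeIds: Set[Int], pendingCancelIds: Set[Int])

def cancel(book: Book, id: Int): (Book, String) =
  if (book.activeIds.contains(id))
    // Amortized constant time: one removal and one addition.
    (Book(book.activeIds - id, book.pendingCancelIds + id), s"OrderCanceled($id)")
  else
    // Nonexistent ID: a single contains check, no state change.
    (book, s"OrderCancelRejected($id)")

val book = Book(activeIds = Set(1, 2), pendingCancelIds = Set.empty)

val (afterCancel, ev1) = cancel(book, 1)
val (unchanged, ev2)   = cancel(afterCancel, 99)
```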

Before celebrating performance wins, bear in mind that we still need to remove canceled orders from the book. To reduce the cost of cancelations, two potentially large sets were added to the order book, which increases the size of the memory footprint and garbage collection pressure. Additionally, benchmarking is needed to prove that theoretical performance improvements translate to real-world performance.

To complete the LazyCancelOrderBook implementation, we need to account for activeIds when handling a limit order and pendingCancelIds when generating an execution. As you may recall, handling a limit order involves two scenarios:

  • Adding a resting limit order
  • Crossing the book to generate an execution

Here is a partially implemented snippet that prepares us to handle these two scenarios for a BuyLimitOrder:

orderBook.bestOffer.exists(_.price.value <= buy.price.value) match {
  case true => ???  // crossing order
  case false => ??? // resting order
}

To support resting buy orders, the provided buy order must be enqueued and, additionally, the buy order ID must be added to the activeIds set:

def restLimitOrder: (LazyCancelOrderBook, Event) = {
  val orders = orderBook.bids.getOrElse(buy.price, Queue.empty)
  orderBook.copy(
    bids = orderBook.bids + (buy.price -> orders.enqueue(buy)),
    activeIds = orderBook.activeIds + buy.id) -> LimitOrderAdded(currentTime())
}

orderBook.bestOffer.exists(_.price.value <= buy.price.value) match {
  case true => ???  // crossing order
  case false => restLimitOrder
}

The logic to add a resting limit order is shown in the preceding code, extracted into a method named restLimitOrder. It resembles the analogous scenario in ListOrderBook, with the addition of an amortized constant time operation to record the active order ID. This change is straightforward and adds little processing time overhead. Finally, we consider the more complicated order crossing scenario. This scenario is analogous to Queue.dequeue in that it pays the cost of the deferred action. The first problem to solve is identifying which order can be executed and which orders must be removed because they have been canceled. findActiveOrder supplies this functionality and is shown with the assumption that orderBook is lexically in scope, as follows:

      @tailrec 
      def findActiveOrder( 
        q: Queue[SellLimitOrder], 
        idsToRemove: Set[OrderId]): (Option[SellLimitOrder], Option[Queue[SellLimitOrder]], Set[OrderId]) = 
        q.dequeueOption match { 
          case Some((o, qq)) => orderBook.pendingCancelIds.contains(o.id) match { 
            case true => 
              findActiveOrder(qq, idsToRemove + o.id) 
            case false => 
              (Some(o), if (qq.nonEmpty) Some(qq) else None, idsToRemove + o.id) 
          } 
          case None => (None, None, idsToRemove) 
        } 

findActiveOrder recursively inspects a sell price level until an executable order is found or the price level is exhausted. In addition to optionally resolving a sell order that can be executed, the method returns the remainder of the price level and the set of order IDs that were dequeued along the way. These dequeued IDs must be purged from the order book's ID sets; in particular, the canceled orders must be removed from pendingCancelIds. Here, we see the bulk of the cancelation work that was deferred when the cancel request was handled. Execution is now an amortized constant time operation when executions occur repeatedly without a cancelation in-between. The worst case is a linear runtime proportional to the number of canceled orders in the price level. Let's look at how findActiveOrder is used to update the state of the order book:

orderBook.offers.headOption.fold(restLimitOrder) {
  case (price, offers) => findActiveOrder(offers, Set.empty) match {
    case (Some(o), Some(qq), rms) => (orderBook.copy(
      offers = orderBook.offers + (o.price -> qq),
      activeIds = orderBook.activeIds -- rms,
      pendingCancelIds = orderBook.pendingCancelIds -- rms),
      OrderExecuted(currentTime(),
        Execution(buy.id, o.price), Execution(o.id, o.price)))
    case (Some(o), None, rms) => (orderBook.copy(
      offers = orderBook.offers - o.price,
      activeIds = orderBook.activeIds -- rms,
      pendingCancelIds = orderBook.pendingCancelIds -- rms),
      OrderExecuted(currentTime(),
        Execution(buy.id, o.price), Execution(o.id, o.price)))
    case (None, _, rms) =>
      val bs = orderBook.bids.getOrElse(buy.price, Queue.empty).enqueue(buy)
      (orderBook.copy(bids = orderBook.bids + (buy.price -> bs),
        offers = orderBook.offers - price,
        activeIds = orderBook.activeIds -- rms + buy.id,
        pendingCancelIds = orderBook.pendingCancelIds -- rms),
        LimitOrderAdded(currentTime()))
  }
}

Order crossing is now arguably more complicated than in ListOrderBook or QueueOrderBook due to the work required to remove canceled orders from the price level and to purge their IDs from the order book's bookkeeping sets. In all three pattern match statements, the set of order IDs returned as the final tuple member is removed from the book's ID sets to indicate that those orders are no longer resting on the book. The first two pattern match statements distinguish between finding an active order with one or more remaining orders in the price level and finding one with zero remaining orders in the price level. In the latter scenario, the price level is removed from the book. The third pattern match statement accounts for the scenario where no active order is found. If no active order is found because all orders were pending cancelation, then, by definition, the entire price level was searched and is, therefore, now empty.
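The interplay between lazy cancelation and crossing can be seen in isolation with a simplified, standalone variant of findActiveOrder. Here, the set of pending cancel IDs is passed in explicitly rather than read from an enclosing orderBook, and Order is a hypothetical stripped-down stand-in for SellLimitOrder:

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

// Hypothetical, simplified stand-ins for the book's types.
final case class OrderId(value: Long)
final case class Order(id: OrderId)

// Dequeue orders from a price level, skipping lazily canceled orders,
// until an executable order is found or the level is exhausted.
@tailrec
def findActive(
    q: Queue[Order],
    pending: Set[OrderId],
    removed: Set[OrderId]): (Option[Order], Option[Queue[Order]], Set[OrderId]) =
  q.dequeueOption match {
    case Some((o, rest)) =>
      if (pending.contains(o.id))
        findActive(rest, pending, removed + o.id)  // canceled: skip and record for purging
      else
        (Some(o), if (rest.nonEmpty) Some(rest) else None, removed + o.id)
    case None => (None, None, removed)             // only canceled orders were present
  }

// Orders 1 and 2 were lazily canceled; order 3 is the first executable order.
val level = Queue(Order(OrderId(1)), Order(OrderId(2)), Order(OrderId(3)))
val pending = Set(OrderId(1), OrderId(2))
val (active, remaining, removed) = findActive(level, pending, Set.empty)
// active:    Some(Order(OrderId(3)))
// remaining: None, because order 3 was the last order in the level
// removed:   the IDs of all three dequeued orders
```

Orders 1 and 2 are skipped and surface in the removed set so that the caller can purge them from the order book's ID sets, which is exactly the deferred cancelation work being paid at execution time.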

Benchmarking LazyCancelOrderBook

As a rigorous performance engineer, you realize that although your code compiles and your tests pass, your work is not yet complete. You begin pondering how to benchmark LazyCancelOrderBook to determine whether or not your changes have improved real-world performance. Your first idea is to test cancelation in isolation to confirm that this operation has indeed been optimized. To do this, you rework CancelBenchmarks, which was introduced in Chapter 2, Measuring Performance on the JVM, to work with QueueOrderBook and LazyCancelOrderBook. This benchmark sweeps different price level sizes canceling the first order, the last order, and a nonexistent order. We omit the source code because it is identical to the previous implementation and instead consider the results. These results were produced by running the following:

sbt 'project chapter4' 'jmh:run CancelBenchmarks -foe true'

The benchmark provides us with the following results:

Benchmark                      Enqueued order count   Throughput (ops per second)   Error as percentage of throughput
eagerCancelFirstOrderInLine    1                      6,912,696.09                  ± 0.44
lazyCancelFirstOrderInLine     1                      25,676,031.5                  ± 0.22
eagerCancelFirstOrderInLine    10                     2,332,046.09                  ± 0.96
lazyCancelFirstOrderInLine     10                     12,656,750.43                 ± 0.31
eagerCancelLastOrderInLine     1                      5,641,784.63                  ± 0.49
lazyCancelLastOrderInLine      1                      25,619,665.34                 ± 0.48
eagerCancelLastOrderInLine     10                     1,788,885.62                  ± 0.39
lazyCancelLastOrderInLine      10                     13,269,215.32                 ± 0.30
eagerCancelNonexistentOrder    1                      9,351,630.96                  ± 0.19
lazyCancelNonexistentOrder     1                      31,742,147.67                 ± 0.65
eagerCancelNonexistentOrder    10                     6,897,164.11                  ± 0.25
lazyCancelNonexistentOrder     10                     24,102,925.78                 ± 0.24

This test demonstrates that LazyCancelOrderBook consistently outperforms QueueOrderBook when canceling the first order, the last order, and a nonexistent order across order queue sizes of one and ten. This is exactly as expected because LazyCancelOrderBook defers the most expensive work until an order is executed. We see constant performance independent of the position of the order-to-be-canceled, which is further proof that the removal work is deferred. Also as expected, we see that canceling a nonexistent order results in improved performance because a linear traversal is no longer required to ascertain the absence of an order. However, we notice the performance hit as the enqueued order count increases from one to ten for LazyCancelOrderBook. We can hypothesize that the nearly 50% throughput reduction is due to the overhead of managing the state of active and pending cancel order IDs.

This result is a promising sign that your changes are indeed improving the real-world performance. As the new implementation passed the initial litmus test, you think about how to representatively simulate a combination of executions and cancelations. You decide to focus on creating a microbenchmark that combines executions and cancelations to exercise LazyCancelOrderBook in scenarios that more closely resemble production. You think back to a recent lunch conversation you had with Dave about market trading flows and recall that he said it is common to see about two cancelations per execution. Running with this idea, you create a benchmark that interleaves trades and cancelations. For both order book implementations, you want to test performance in the following scenarios:

  • Two trades per cancelation
  • One trade per cancelation
  • Two cancelations per trade

These three scenarios will help reveal shortcomings in LazyCancelOrderBook by focusing on production-like order book activities. The benchmark requires initializing each order book with a set of resting orders to be canceled or executed against. The following snippet demonstrates how to initialize the order books in a JMH test:

@State(Scope.Benchmark) 
  class InterleavedOrderState { 
    var lazyBook: LazyCancelOrderBook = LazyCancelOrderBook.empty 
    var eagerBook: QueueOrderBook = QueueOrderBook.empty 
 
    @Setup 
    def setup(): Unit = { 
      lazyBook = (1 to maxOrderCount).foldLeft(LazyCancelOrderBook.empty) { 
        case (b, i) => LazyCancelOrderBook.handle( 
          () => EventInstant.now(), b, AddLimitOrder( 
            CommandInstant.now(), BuyLimitOrder(OrderId(i), bidPrice)))._1 
      } 
      eagerBook = (1 to maxOrderCount).foldLeft(QueueOrderBook.empty) { 
        case (b, i) => QueueOrderBook.handle( 
          () => EventInstant.now(), b, AddLimitOrder( 
            CommandInstant.now(), BuyLimitOrder(OrderId(i), bidPrice)))._1 
      } 
    } 
  } 

Before each trial, both order books will be filled with maxOrderCount (defined to be 30) resting bids. As there are three scenarios to test and two order books, there are six benchmarks defined for this test. Each set of three scenarios is the same per order book implementation. To avoid duplication, the following snippet shows the three benchmarks implemented for LazyCancelOrderBook:

@Benchmark 
  def lazyOneToOneCT(state: InterleavedOrderState): LazyCancelOrderBook = { 
    val b1 = LazyCancelOrderBook.handle(() => EventInstant.now(), 
      state.lazyBook, firstCancel)._1 
    LazyCancelOrderBook.handle(() => EventInstant.now(), 
      b1, firstCrossSell)._1 
  } 
 
@Benchmark 
  def lazyTwoToOneCT(state: InterleavedOrderState): LazyCancelOrderBook = { 
    val b1 = LazyCancelOrderBook.handle(() => EventInstant.now(), 
      state.lazyBook, firstCancel)._1 
    val b2 = LazyCancelOrderBook.handle(() => EventInstant.now(), 
      b1, secondCancel)._1 
    LazyCancelOrderBook.handle(() => EventInstant.now(), 
      b2, firstCrossSell)._1 
  } 
 
@Benchmark 
  def lazyOneToTwoCT(state: InterleavedOrderState): LazyCancelOrderBook = { 
    val b1 = LazyCancelOrderBook.handle(() => EventInstant.now(), 
      state.lazyBook, firstCancel)._1 
    val b2 = LazyCancelOrderBook.handle(() => EventInstant.now(), 
      b1, firstCrossSell)._1 
    LazyCancelOrderBook.handle(() => EventInstant.now(), 
      b2, secondCrossSell)._1 
  } 

These benchmarks follow the convention of denoting the cancelation frequency ("C") first and the trade frequency ("T") second. For example, the final benchmark implements the scenario that represents one cancelation for every two trades. The commands are defined as values outside the benchmark methods to avoid generating garbage during benchmark invocation. The benchmark invocation looks like the following:

sbt 'project chapter4' 'jmh:run InterleavedOrderBenchmarks -foe true'

This invocation produces the following results:

Benchmark         Throughput (ops per second)   Error as percentage of throughput
eagerOneToTwoCT   797,339.08                    ± 2.63
lazyOneToTwoCT    1,123,157.94                  ± 1.26
eagerOneToOneCT   854,635.26                    ± 2.48
lazyOneToOneCT    1,469,338.46                  ± 1.85
eagerTwoToOneCT   497,368.11                    ± 0.72
lazyTwoToOneCT    1,208,671.60                  ± 1.69

Across the board, LazyCancelOrderBook outperforms QueueOrderBook. The relative difference between lazy and eager performance shows an interesting relationship. The following table captures the relative performance difference:

Benchmark    LazyCancelOrderBook throughput as a percentage of QueueOrderBook throughput
OneToTwoCT   141.00%
OneToOneCT   172.00%
TwoToOneCT   243.00%

Studying the preceding table, we observe that LazyCancelOrderBook shows the greatest performance win when there are two cancelations per trade. This result demonstrates the benefit of deferring the cost of processing a cancelation request. The next trend that we see is that as the frequency of trades increases and the frequency of cancelations decreases, QueueOrderBook performance improves relative to LazyCancelOrderBook. This result makes sense because LazyCancelOrderBook incurs extra costs when performing a trade: in addition to searching for canceled orders, it must update activeIds. QueueOrderBook avoids these costs, but the overwhelming cost of eager cancelation processing continues to overshadow its performance. Summarizing these results, we have more confidence that LazyCancelOrderBook is a suitable replacement for QueueOrderBook. In scenarios involving heavy volumes of cancelations, it appears to be a clear winner, and in other scenarios, it appears to maintain parity with or exceed QueueOrderBook.
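As a quick sanity check, the relative figures in the preceding table can be reproduced from the raw throughput numbers in the interleaved benchmark results. The helper below is merely illustrative arithmetic, not part of the benchmark suite:

```scala
// Lazy order book throughput expressed as a percentage of the eager
// (QueueOrderBook) throughput, rounded to the nearest whole percent.
def relativePerformance(lazyOps: Double, eagerOps: Double): Long =
  math.round(lazyOps / eagerOps * 100)

val oneToTwo = relativePerformance(1123157.94, 797339.08) // 141
val oneToOne = relativePerformance(1469338.46, 854635.26) // 172
val twoToOne = relativePerformance(1208671.60, 497368.11) // 243
```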

Lessons learned

In this section, we leveraged Scala collections, in conjunction with the judicious use of lazy evaluation, to improve the performance of a critical component in MVT's infrastructure. By working through several order book implementations, you learned first-hand how a well-suited data structure can improve performance while a less optimal choice can derail performance. This exercise also exposed you to how Scala implements several of its collections, which you can now use to your advantage when working on a performance problem.

LazyCancelOrderBook illustrates how valuable deferred evaluation can be in a performance-sensitive environment. When faced with a performance challenge, ask yourself the following questions to see whether it is possible to defer work (CPU work, not your actual work!). The following table lists each question and how it was answered with the order book:

Question: How can I decompose the work into smaller, discrete chunks?
Application to the order book: The act of canceling was decomposed into identifying the event that is sent to the requester and removing the canceled order from the book state.

Question: Why am I performing all of these steps now?
Application to the order book: Originally, order removal happened eagerly because it was the most logical way to model the process.

Question: Can I change any constraints to allow me to model the problem differently?
Application to the order book: Ideally, we would have liked to remove the constraint requiring rejection of nonexistent orders. Unfortunately, this was out of our control.

Question: What operations in my system are most performant?
Application to the order book: Executing an order and resting an order on the book are the most performant operations. We leveraged fast execution time to perform removals of canceled orders from the book.

Like any approach, deferred evaluation is not a panacea. Diligent benchmarking and profiling are necessary to validate the benefit delivered by the change. Arguably the implementation of LazyCancelOrderBook is more complicated than QueueOrderBook, which will increase the cost to maintain the system. In addition to making implementation more complicated, it is now more difficult to reason about runtime performance due to the variable cost of order execution. For the scenarios that we tested, LazyCancelOrderBook remained at parity with or better than QueueOrderBook. However, we only exercised a few of the many possible scenarios, and we did so with only a single price level in the order book. In a real-world environment, additional benchmarking and profiling are needed to build enough confidence that this new implementation delivers better performance.
