In this chapter, we return to MVT in order to take on challenges that span multiple MVT teams. The market data team requires improved critical path order book performance to handle increased cancel request volume. The data science team wants better ad hoc data analysis tools to research trading strategies. Everyone has a problem that had to be solved yesterday. That's the start-up lifestyle!
We use the functional paradigm, our existing knowledge, and the Scala collections API to our advantage to solve these challenges. The power of the Scala language and its collections API allows you to approach problems in ways that you may not have thought possible before. As we work through these challenges and encounter new Scala collection usage, we detail collection implementations and the tradeoffs to consider. We will consider the following collections in this chapter:

- `List`
- `TreeMap`
- `Queue`
- `Set`
In Chapter 1, The Road to Performance, you met MVT's head trader, Dave, under tense circumstances. The financial markets underwent a period of extreme volatility that exposed a weakness in the order book design. After speaking to Dave, you learned that in volatile markets, order volume is dominated by cancels because traders are reacting to quickly changing market conditions. Through order book benchmarking and profiling, you confirmed the suspicion that under high volume, cancel performance causes high order book response latency.
Although the market volatility that caused trading losses has passed, Dave recognizes the risk that future volatility poses for MVT's returns. Dave wants to invest engineering effort into making the order book more performant when cancelations frequently occur. By working with the data science team, Dave analyzed historical order book activity over a three month period and discovered interesting market characteristics. He shares with you that in the three months analyzed, on a per trading day basis, cancels comprised, on average, 70% of order book commands. The analysis also revealed that on the most volatile market days, cancel activity represents about 85% of order book activity. Known for his puns, Dave concludes with, "Now, you know everything I know. Like the order book, we are counting on you to execute!"
Excited to improve order book performance, your first step is to familiarize yourself with the order book implementation. As you open up the order book repository, you ping Gary, a fellow engineer who has prior order book development experience. As Gary knows the history of order book development, he tells you to check out `ListOrderBook`. "This was our first attempt at modeling the order book. I think you can learn from our design by seeing its first incarnation," he adds. "Once you understand the implementation, check out `QueueOrderBook`. That's the next version of the order book. You profiled an older iteration of this implementation when we had the volatility wave. Let me know if you have any questions!" After thanking him, you dig into the repository to find `ListOrderBook`.
The `ListOrderBook` class defines the following state to manage buys (bids) and sells (offers):

```scala
case class ListOrderBook(
    bids: TreeMap[Price, List[BuyLimitOrder]],
    offers: TreeMap[Price, List[SellLimitOrder]]) {
  def bestBid: Option[BuyLimitOrder] = ???    // hidden for brevity
  def bestOffer: Option[SellLimitOrder] = ??? // hidden for brevity
}
```
To refresh our memory, here are the definitions of `Price`, `BuyLimitOrder`, and `SellLimitOrder`:

```scala
sealed trait LimitOrder {
  def id: OrderId
  def price: Price
}
case class BuyLimitOrder(id: OrderId, price: Price) extends LimitOrder
case class SellLimitOrder(id: OrderId, price: Price) extends LimitOrder
case class Price(value: BigDecimal)
```
The `LimitOrder` trait is an algebraic data type (ADT) that represents the two possible order sides. The `Price` class is a strongly-typed wrapper for `BigDecimal`. Recalling the performance boost that value classes provide, you modify the definition of `Price`, as follows:

```scala
case class Price(value: BigDecimal) extends AnyVal
```
The `ListOrderBook` class uses two Scala collection types to maintain its state: `List` and `TreeMap`. Let's have a deeper look at these data structures to understand the tradeoffs that they present.

Scala implements `List` as an immutable singly-linked list. A `List` is an ordered collection of elements of the same type. `List` is a sealed abstract class with two implementations: `Nil`, which represents the empty list, and `::` (often called cons), which represents an element and a tail. To make things more concrete, let's look at some pseudocode, which is close to the actual implementation:

```scala
sealed trait List[+A]
case object Nil extends List[Nothing]
case class ::[A](head: A, tail: List[A]) extends List[A]
```
A `List` of three integers can be constructed using the following notation:

```scala
val list = ::(1, ::(2, ::(3, Nil)))
```
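The same list is more commonly written with the infix cons operator. As a small sketch (not from the book's codebase), the following confirms that the two notations build the same structure:

```scala
// :: is right-associative, so this parses as 1 :: (2 :: (3 :: Nil)),
// the same structure as the explicit constructor calls above.
val sugared = 1 :: 2 :: 3 :: Nil
val viaConstructor = ::(1, ::(2, ::(3, Nil)))

println(sugared == viaConstructor) // true
println(sugared)                   // List(1, 2, 3)
```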
Note the plus sign in the definition of the `List` trait. The plus (`+`) sign indicates that `List` is covariant in its type parameter, `A`. Covariance allows you to express polymorphic constraints with generic types. To make this more concrete, consider the following definitions:

```scala
sealed trait Base
case class Impl(value: Int) extends Base
```

Here, a relationship is expressed between `Base` and `Impl`: the `Impl` class is a subtype of `Base`. When used with `List`, covariance allows us to express that `List[Impl]` is a subtype of `List[Base]`. Expressed with an example, covariance is what allows the following snippet to compile:

```scala
val bases: List[Base] = List[Impl](Impl(1))
```
Covariance belongs to the broader topic of variances. If you wish to learn more about variances in Scala, refer to this excellent blog post by Andreas Schroeder at https://blog.codecentric.de/en/2015/03/scala-type-system-parameterized-types-variances-part-1/.
Unlike most other Scala collections, `List` supports pattern matching on its content. This is a powerful way to write expressive code that handles multiple scenarios while retaining compile-time checking that all possible cases are handled. Consider the following snippet:

```scala
List(1, 2, 3, 4) match {
  case 1 :: x :: rest => println(s"second element: $x, rest: $rest")
}
```
In this simple pattern match, we are able to express several concerns. Here, the literal `1` matches the first element, `x` binds to `2`, and `rest` binds to `List(3, 4)`. When compiled, this snippet elicits a compiler warning because the Scala compiler infers that there are possible `List` patterns that were unmatched (for example, the empty `List`). Compiler-provided warnings minimize the chance of your forgetting to handle a valid input.
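As a minimal sketch (the `describe` helper is hypothetical, not from the chapter's code), handling both the cons and the empty cases makes the match exhaustive and silences the warning:

```scala
// Handling both List implementations (:: and Nil) covers every case,
// so the compiler emits no non-exhaustive match warning.
def describe(xs: List[Int]): String = xs match {
  case head :: tail => s"head: $head, tail: $tail"
  case Nil          => "empty list"
}

println(describe(List(1, 2, 3))) // head: 1, tail: List(2, 3)
println(describe(Nil))           // empty list
```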
A `List` is optimized for prepend operations. Adding 0 to the front of the previous list is as easy as doing this:

```scala
val list = ::(1, ::(2, ::(3, Nil)))
val listWithZero = ::(0, list)
```
This is a constant-time operation, and it has almost no memory cost because `List` implements data sharing. In other words, the new list, `listWithZero`, is not a deep copy of `list`. Instead, it reuses all of the allocated elements of `list` and allocates only one new element, the cell containing `0`.
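A quick sketch (using reference equality, not from the chapter's code) makes this data sharing visible:

```scala
val list = 1 :: 2 :: 3 :: Nil
val listWithZero = 0 :: list

// The new list's tail is the very same object as the original list:
// only the cell holding 0 was allocated; nothing was copied.
println(listWithZero.tail eq list) // true
```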
In contrast to prepend operations, append operations (that is, adding an element to the end of the list) are computationally expensive because the entire `List` must be copied.
Given the poor append performance of `List`, you may wonder whether it is safe to use a `map` transform. A `map` transform applies a function to successive elements of the `List`, which can be logically represented as appending transformed values to a new `List`. To avoid this performance pitfall, `List.map` overrides the default implementation provided by the `TraversableOnce` trait to apply the transform using prepend operations. This provides improved `List.map` performance while retaining the same API. Overriding default behavior to provide a specialized implementation is a common Scala collections pattern. Constant-time head and prepend operations make `List` ideal for algorithms involving last-in, first-out (LIFO) operations. For random access and first-in, first-out (FIFO) behaviors, you should employ `List` selectively.
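To sketch the LIFO fit (with hypothetical `push`/`pop` helpers, not order book code), a stack built on `List` needs only prepend and head, both constant time:

```scala
// push is a constant-time prepend; pop is a constant-time head/tail split.
def push[A](stack: List[A], a: A): List[A] = a :: stack

def pop[A](stack: List[A]): Option[(A, List[A])] = stack match {
  case head :: tail => Some((head, tail))
  case Nil          => None
}

val stack = push(push(List.empty[Int], 1), 2)
println(pop(stack)) // Some((2, List(1)))
</test>
```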
In the next section, we investigate `TreeMap`. The `TreeMap` class is the implementation of the `SortedMap` trait that is used to maintain bids and offers.

The `TreeMap` class is a map that orders keys according to a provided ordering strategy. The following snippet of its class definition makes the ordering requirement clear:

```scala
class TreeMap[A, +B] private (tree: RB.Tree[A, B])(
    implicit val ordering: Ordering[A])
```
The `Ordering` class is a type class that defines a contract for the natural ordering of elements of type `A`.
If type classes are a concept that is new to you, we encourage you to read Daniel Westheide's well-written blog post on the topic at http://danielwestheide.com/blog/2013/02/06/the-neophytes-guide-to-scala-part-12-type-classes.html.
In `ListOrderBook`, we see that `Price` is the key. Looking at the companion object of `Price`, we see that the ordering is defined by delegating to the ordering of the underlying `BigDecimal` type:

```scala
object Price {
  implicit val ordering: Ordering[Price] = new Ordering[Price] {
    def compare(x: Price, y: Price): Int =
      Ordering.BigDecimal.compare(x.value, y.value)
  }
}
```
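The same contract can be expressed more tersely with `Ordering.by`. The following sketch (using a simplified `Price`, not MVT's code) shows the implicit ordering driving `TreeMap` key order:

```scala
import scala.collection.immutable.TreeMap

case class Price(value: BigDecimal)
object Price {
  // Equivalent to delegating compare to BigDecimal's ordering.
  implicit val ordering: Ordering[Price] = Ordering.by((p: Price) => p.value)
}

// Keys are kept in ascending price order regardless of insertion order.
val levels = TreeMap(
  Price(BigDecimal(10)) -> "a",
  Price(BigDecimal(5)) -> "b",
  Price(BigDecimal(7)) -> "c")
println(levels.keys.map(_.value).toList) // List(5, 7, 10)
```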
The `TreeMap` class referenced by `ListOrderBook`, like `List`, is immutable. Immutability provides strong reasoning guarantees: we can be certain that there are no side effects because the effect of adding or removing a value is always reflected in a new map.
The `TreeMap` class is implemented as a special type of binary search tree, the red-black tree. This tree implementation provides logarithmic time for lookups, additions, and removals. You might be surprised to see `TreeMap` in place of `HashMap`. As documented in the Scala collections performance overview (http://docs.scala-lang.org/overviews/collections/performance-characteristics.html), `HashMap` provides constant-time lookups, additions, and removals, which is faster than `TreeMap`. However, `TreeMap` offers superior performance for ordered traversals. For example, finding the largest key in the map can be done in logarithmic time with `TreeMap`, while it takes linear time with `HashMap`. This difference is an indicator that the order book implementation requires efficient ordered traversals of `Price` keys.
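A small sketch illustrating the ordered-access operations that motivate this choice:

```scala
import scala.collection.immutable.TreeMap

// TreeMap keeps keys sorted, so the extremes are cheap to reach.
val m = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")

println(m.firstKey) // 1, the smallest key
println(m.lastKey)  // 3, the largest key

// Iteration order follows the key ordering, unlike a hash-based map.
println(m.keys.toList) // List(1, 2, 3)
```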
Coming back to the `ListOrderBook` implementation, we see that the following partial method definition reflects the heart of the order book:

```scala
def handle(
    currentTime: () => EventInstant,
    ob: ListOrderBook,
    c: Command): (ListOrderBook, Event) = c match {
  case AddLimitOrder(_, o) => ??? // hidden for brevity
  case CancelOrder(_, id) => ???  // hidden for brevity
}
```
It might seem curious that a function is supplied as an argument to retrieve the current time. A potentially simpler way to achieve the same effect is to invoke `System.currentTimeMillis()`. The shortcoming of this approach is that accessing the system clock is a side effect, which means that the function is no longer referentially transparent. By providing a function to retrieve the current time, we control how this side effect happens and can produce repeatable test cases.
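The following sketch (with types simplified from the chapter's definitions) shows how injecting the clock makes timestamps deterministic in tests:

```scala
// Simplified stand-ins for the chapter's types.
case class EventInstant(millis: Long)
case class OrderCanceled(i: EventInstant)

// Production code would pass () => EventInstant(System.currentTimeMillis()).
val fixedClock: () => EventInstant = () => EventInstant(1000L)

def cancel(currentTime: () => EventInstant): OrderCanceled =
  OrderCanceled(currentTime())

// With a fixed clock, the same input always yields the same event,
// so assertions on the result are repeatable.
println(cancel(fixedClock) == OrderCanceled(EventInstant(1000L))) // true
```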
Given a `Command`, an order book instance, and a way to obtain the current time for event timestamps, an `Event` and a new state are produced. To refresh our memory, here are the commands that the order book can process:

```scala
sealed trait Command
case class AddLimitOrder(i: CommandInstant, o: LimitOrder) extends Command
case class CancelOrder(i: CommandInstant, id: OrderId) extends Command
```

The following are the possible events created by processing commands:

```scala
sealed trait Event
case class OrderExecuted(i: EventInstant, buy: Execution, sell: Execution)
  extends Event
case class LimitOrderAdded(i: EventInstant) extends Event
case class OrderCancelRejected(i: EventInstant, id: OrderId) extends Event
case class OrderCanceled(i: EventInstant, id: OrderId) extends Event
```
Let's focus on supporting the `AddLimitOrder` command to better understand the algorithmic properties of historical design choices. When adding a limit order, one of two outcomes is possible:

- `OrderExecuted`
- `LimitOrderAdded`
Deducing whether or not the order crosses the book requires looking at the best price on the opposing side. Returning to the definition of `ListOrderBook` with the complete implementations of `bestBid` and `bestOffer`, we see the following:

```scala
case class ListOrderBook(
    bids: TreeMap[Price, List[BuyLimitOrder]],
    offers: TreeMap[Price, List[SellLimitOrder]]) {
  def bestBid: Option[BuyLimitOrder] =
    bids.lastOption.flatMap(_._2.headOption)
  def bestOffer: Option[SellLimitOrder] =
    offers.headOption.flatMap(_._2.headOption)
}
```
The implementation shows that we are taking advantage of the logarithmic ordered-search property of `TreeMap`. The best bid is the key with the highest price, which is the last entry in the tree because the ordering is ascending. The best offer is the key with the lowest price, which is the first entry in the tree.
Focusing specifically on the addition of a buy limit order, and given the best offer, the following comparison determines whether the incoming buy order crosses the book or rests on the book:

```scala
orderBook.bestOffer.exists(buyOrder.price.value >= _.price.value) match {
  case true => ???  // cross the book
  case false => ??? // rest on the book
}
```
Let's first assume that the incoming buy order's price is lower than the best offer, which means the order is added to the book (that is, it rests on the book). The question we are trying to answer is, "Where in the book should the order be added?" The order book performs a logarithmic search to find the price level associated with the order price. From the definition of `ListOrderBook`, you know that each value in the map (the price level) is represented as a `List` of orders. Recalling a discussion with the head trader, Dave, you remember that orders within a price level are executed based on time priority: the first order added to a price level is the first order to be executed. Conceptually, a price level is a first-in, first-out (FIFO) queue. The implication is that adding an order to a price level is a linear-time operation because the order is appended to the end. The following snippet confirms your hypothesis:

```scala
val orders = orderBook.bids.getOrElse(buyOrder.price, Nil)
orderBook.copy(bids =
  orderBook.bids + (buyOrder.price -> orders.:+(buyOrder))) ->
  LimitOrderAdded(currentTime())
```
The snippet shows that adding a resting limit order to the book involves a linear-time append to the `List` of `BuyLimitOrder`s. In your mind, you are beginning to wonder how MVT was able to trade profitably at all with this order book. Before leaping to this harsh judgment, you consider how crossing the book is handled.
Assuming instead that the incoming buy order's price is greater than or equal to the best offer price, the buy order crosses the book, causing an execution. Time priority dictates that the first sell order received is executed against the incoming buy order, which translates to taking the first sell order in the price level. When generating an execution, you realize that modeling a price level with a `List` provides constant-time performance. The following snippet shows how a price level is modified on a buy execution:

```scala
case (priceLevel, (sell :: Nil)) =>
  (orderBook.copy(offers = orderBook.offers - sell.price),
    OrderExecuted(currentTime(),
      Execution(buy.id, sell.price), Execution(sell.id, sell.price)))
case (_, (sell :: remainingSells)) =>
  (orderBook.copy(offers = orderBook.offers + (sell.price -> remainingSells)),
    OrderExecuted(currentTime(),
      Execution(buy.id, sell.price), Execution(sell.id, sell.price)))
```
The `ListOrderBook` takes advantage of `List` pattern matching to handle the two possible cross scenarios: a price level containing a single order, and a price level containing multiple orders. In the former scenario, the price level is removed from the book by removing the key from the offers `TreeMap`. In the latter scenario, the remaining orders form the new price level. Clearly, the order book is optimized for executions over adding resting orders. You wonder why this bias exists in the order book implementation: "Perhaps executions are much more prevalent than resting orders?" You are unsure and make a mental note to chat with Dave.
Pause for a moment to consider biases in systems that you have designed. Did you optimize operations proportional to usage or latency constraints? Looking back, did your design choices lead you towards the best possible performance for the most important operations? Of course, hindsight makes it easy to call out suboptimal design choices. By reflecting on how you made these choices, you might be better able to avoid similar deficiencies in future systems.
The `ListOrderBook` also supports the `CancelOrder` command to remove an existing order by ID. Cancel requests pose an algorithmic challenge to `ListOrderBook`. As only the order ID is provided, `ListOrderBook` cannot efficiently determine which side the order rests on (that is, buy or sell). To determine the side, the buy and sell price levels are swept to find the order ID. This operation is proportional to the number of price levels per side and the length of each price level. The worst-case scenario is submitting an order ID that does not exist in the order book: the entire book must be swept to establish the absence of the provided order ID. A malicious trader could slow down MVT order book operations by submitting a constant stream of nonexistent order IDs. You make a note to talk with Dave about malicious trading activities and what MVT can do to defend against them.
Assuming that the order referenced by the cancel request exists in the book and its price level is discovered, the act of removing the canceled order from the book is also expensive. Canceling is a linear-time operation that requires traversing the linked list of orders and removing the node with the matching order ID. The following snippet implements canceling a sell order in `ListOrderBook`:

```scala
orderBook.offers.find {
  case (price, priceLevel) => priceLevel.exists(_.id == idToCancel)
}.fold[(ListOrderBook, Event)](
  orderBook -> OrderCancelRejected(currentTime(), idToCancel)) {
  case (price, priceLevel) =>
    val updatedPriceLevel = priceLevel.filter(_.id != idToCancel)
    orderBook.copy(offers = updatedPriceLevel.nonEmpty match {
      case true => orderBook.offers + (price -> updatedPriceLevel)
      case false => orderBook.offers - price
    }) -> OrderCanceled(currentTime(), idToCancel)
}
```
Studying this snippet, it is unsurprising that cancelation is the least performant order book operation. There are two linear-time passes performed over a price level to cancel an order. First, `exists` traverses the list of price level orders to determine whether the ID to be canceled exists in the price level. Once the price level containing the ID is found, there is a second traversal via `filter` to update the state of the order book.
The cancelation implementation in `ListOrderBook` illustrates the double-edged sword of Scala's expressive collection API. By virtue of being expressive, the cancelation logic is simple to understand and maintain. However, its expressiveness also makes it easy to hide that the runtime cost of removing an order from a price level is 2 * N, where N is the number of orders in the price level. This simple example makes it clear that, in a performance-sensitive environment, it is important to step back from the code to consider the runtime overhead of the data structures in use.
You refrain from judging `ListOrderBook` too harshly because you know from prior software development experience that there were likely extenuating circumstances that led to this implementation. You turn your attention to the current order book implementation, `QueueOrderBook`. Looking over the source code, you are surprised to discover that the implementation appears to match `ListOrderBook` except for the price level data structure:

```scala
case class QueueOrderBook(
    bids: TreeMap[Price, Queue[BuyLimitOrder]],
    offers: TreeMap[Price, Queue[SellLimitOrder]])
```
The only difference between the two implementations is the use of `scala.collection.immutable.Queue` in place of `List` to represent a price level. From a modeling perspective, using a FIFO queue makes sense: as time priority dictates execution order, a FIFO queue is a natural fit to store resting orders. You begin wondering whether switching out `List` for `Queue` was done purely for modeling purposes. The question on your mind is, "How does replacing `List` with `Queue` improve order book performance?" Understanding this change requires digging deeper into Scala's `Queue` implementation.

This snippet of the `Queue` definition reveals an interesting insight:

```scala
class Queue[+A] protected(
    protected val in: List[A],
    protected val out: List[A])
```
Without reading deeply into the `Queue` implementation, we see that it uses two `List`s to manage state. Given the usage of `List` to model a FIFO queue in `ListOrderBook`, it should not be surprising to see `List` used to build an immutable FIFO queue data structure. Let's look at the enqueue and dequeue operations to understand how `in` and `out` impact `Queue` performance. The following snippet shows the implementation of enqueue:

```scala
def enqueue[B >: A](elem: B) = new Queue(elem :: in, out)
```
As the element is prepended to `in`, enqueueing is a constant-time operation. Recall that the analogous `ListOrderBook` operation is adding a resting order, which has linear runtime. This is a clear performance win for `QueueOrderBook`. Next, we consider the dequeue implementation:

```scala
def dequeue: (A, Queue[A]) = out match {
  case Nil if !in.isEmpty =>
    val rev = in.reverse
    (rev.head, new Queue(Nil, rev.tail))
  case x :: xs => (x, new Queue(in, xs))
  case _ => throw new NoSuchElementException("dequeue on empty queue")
}
```
As the implementation shows, dequeue throws an exception when invoked on an empty `Queue`. The exception is an unexpected outcome of invoking `dequeue` and feels out of place in the functional programming paradigm. For this reason, `Queue` also provides `dequeueOption`, which returns an `Option`. This makes the handling of an empty `Queue` explicit and easier to reason about. We recommend using `dequeueOption` in any situation where you cannot guarantee that `dequeue` will always be called on a nonempty `Queue`.
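A brief sketch of the difference in behavior:

```scala
import scala.collection.immutable.Queue

// dequeueOption surfaces the empty case as a value instead of an exception.
val empty = Queue.empty[Int]
println(empty.dequeueOption) // None

val (head, rest) = Queue(1, 2, 3).dequeue
println(head) // 1
println(rest) // Queue(2, 3)
```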
The `dequeue` operation is more involved than `enqueue` due to the interaction between `in` and `out`. To understand how `Queue` state is managed across `dequeue` operations, review the following table. It walks through a series of `enqueue` and `dequeue` operations, listing the state of `in` and `out` at each step. As you review the table, consider which of the `dequeue` match cases is invoked (note that `enqueue` prepends, so the newest element sits at the head of `in`):

| Operation  | In            | Out        |
|------------|---------------|------------|
| enqueue(1) | List(1)       | Nil        |
| enqueue(2) | List(2, 1)    | Nil        |
| enqueue(3) | List(3, 2, 1) | Nil        |
| dequeue    | Nil           | List(2, 3) |
| dequeue    | Nil           | List(3)    |
| enqueue(4) | List(4)       | List(3)    |
| dequeue    | List(4)       | Nil        |
| dequeue    | Nil           | Nil        |
As the `enqueue` and `dequeue` invocations are intermingled, both `in` and `out` retain state. In the final step displayed, the queue returns to its initial state (that is, both `in` and `out` are empty). The key insight from this implementation is that `Queue` amortizes the cost of `dequeue` to constant time by deferring transfers from `in` to `out`. Each transfer from `in` to `out` is a linear-time `reverse` operation that maintains first-in, first-out ordering. Deferring this expensive operation until `out` is empty is a form of lazy evaluation. This is an illustrative example of how lazy evaluation can improve runtime performance.
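The table's sequence can be replayed through the public API as a sketch; despite the lazy transfers between the two internal lists, FIFO order is preserved:

```scala
import scala.collection.immutable.Queue

val q = Queue.empty[Int].enqueue(1).enqueue(2).enqueue(3)
val (a, q1) = q.dequeue  // pays the linear reverse: out becomes List(2, 3)
val (b, q2) = q1.dequeue // constant time: out is already populated
val (c, q3) = q2.enqueue(4).dequeue
val (d, _) = q3.dequeue  // pays another reverse of the singleton in-list

println(List(a, b, c, d)) // List(1, 2, 3, 4)
```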
Now that you have an understanding of how `Queue` is implemented, you can reason about the performance improvements delivered by `QueueOrderBook`. The following table itemizes the runtime performance of each scenario that modifies a price level:

| Scenario                | ListOrderBook | QueueOrderBook     |
|-------------------------|---------------|--------------------|
| Add resting limit order | Linear        | Constant           |
| Generate execution      | Constant      | Amortized constant |
| Cancel order            | Linear        | Linear             |
This table illustrates how understanding the runtime characteristics of the Scala collections API can yield tangible performance wins with small implementation changes. Recall that when `QueueOrderBook` was introduced, we noted that its implementation is identical to `ListOrderBook`, modulo replacing `List` operations with the analogous `Queue` operations. This is a comparatively simple change for the performance boost shown previously.
You are excited to see the performance win in handling limit orders with `QueueOrderBook`, but you are left wondering what can be done about cancelation performance. It remains unsettling that `QueueOrderBook` retains the same cancelation performance, particularly because the recent market volatility exposed exactly this weakness and caused MVT to trade unprofitably. Lazy evaluation was a big performance win for handling limit orders. Can this principle also be applied to cancel requests?
`Queue` provides high-performance `enqueue` and `dequeue` operations by using additional state, the second `List`, to defer and batch expensive operations. This principle can be applied to the order book. When canceling an order, there are two expensive operations:

- Sweeping the price levels to find the order matching the cancel request's order ID
- Rebuilding the `Queue` or `List` to remove the canceled order

Focusing on the second operation, the motivating question is, "How can the order book defer the cost of the linear traversal to modify internal state?" To answer this question, it is often helpful to consider the strengths of your implementation. With either order book implementation, we know there is excellent execution performance. One strategy that takes advantage of this insight is to defer cancelation until order execution occurs. The approach is to use additional state to maintain the intent to cancel, without removing the order from order book state until it is performant to do so. This approach could look like the following:

```scala
case class LazyCancelOrderBook(
    pendingCancelIds: Set[OrderId],
    bids: TreeMap[Price, Queue[BuyLimitOrder]],
    offers: TreeMap[Price, Queue[SellLimitOrder]])
```
The `LazyCancelOrderBook` class adds additional state in the form of a `scala.collection.immutable.Set` to manage the IDs of cancel requests that have not yet been reflected in the state of `bids` and `offers`. Before diving into how `pendingCancelIds` is used, let's investigate the Scala implementation of `Set`.

Scala's implementation of `Set` is neither an ADT, such as `List`, nor a concrete implementation, such as `TreeMap`. Instead, it is a trait, as shown in this snippet of its definition:

```scala
trait Set[A]
```
The reason the standard library defines it as a trait is to support specific implementations depending on the element count. The `Set` companion object defines five implementations for sizes zero through four. Each implementation contains a fixed number of elements, as shown for `Set3`, as follows:

```scala
class Set3[A] private[collection] (elem1: A, elem2: A, elem3: A)
```
When the number of elements is small, runtime performance is faster with these hand-rolled `Set` implementations. With this technique, additions and removals return the next or previous hand-rolled implementation. For example, consider `+` and `-` from `Set3`:

```scala
def + (elem: A): Set[A] =
  if (contains(elem)) this
  else new Set4(elem1, elem2, elem3, elem)

def - (elem: A): Set[A] =
  if (elem == elem1) new Set2(elem2, elem3)
  else if (elem == elem2) new Set2(elem1, elem3)
  else if (elem == elem3) new Set2(elem1, elem2)
  else this
```
After `Set4`, the standard library uses an implementation named `HashSet`. This is visible when adding an element to `Set4`:

```scala
def + (elem: A): Set[A] =
  if (contains(elem)) this
  else new HashSet[A] + (elem1, elem2, elem3, elem4, elem)
```
The `HashSet` is analogous to `TreeMap` in that it is backed by an efficient data structure to manage internal state. For `HashSet`, the backing data structure is a hash trie. The hash trie provides amortized constant-time performance for additions, removals, and contains checks, as per the Scala collections performance overview (http://docs.scala-lang.org/overviews/collections/performance-characteristics.html). If you want to dig deeper into how a hash trie works, the Scala hash trie overview (http://docs.scala-lang.org/overviews/collections/concrete-immutable-collection-classes.html#hash-tries) is a good starting point.
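The size-dependent specialization is observable at runtime. This sketch (inspecting runtime classes, not from the chapter's code) shows the hand-rolled small-set classes giving way to `HashSet`:

```scala
// The concrete runtime class of an immutable Set changes with element
// count: hand-rolled SetN classes up to four elements, then HashSet.
val small = Set(1, 2, 3)
val large = Set(1, 2, 3, 4, 5)

println(small.getClass.getSimpleName) // Set3
println(large.isInstanceOf[scala.collection.immutable.HashSet[_]])
```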
Returning to `LazyCancelOrderBook`, we now know that the common set operations on `pendingCancelIds`, namely additions, removals, and contains checks, complete in amortized constant time, which suggests there will be minimal overhead as the size of the set increases. We can use `pendingCancelIds` to represent the intent to remove an order from the order book without paying the cost of performing the removal. This simplifies the handling of a cancel order to a constant-time addition to `pendingCancelIds`:

```scala
def handleCancelOrder(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    id: OrderId): (LazyCancelOrderBook, Event) =
  ob.copy(pendingCancelIds = ob.pendingCancelIds + id) ->
    OrderCanceled(currentTime(), id)
```
The implementation of `handleCancelOrder` becomes trivial because the work to remove the order from the book is deferred. While this is a performance win, the implementation suffers from a serious deficiency: it is no longer able to identify order IDs that are absent from the order book, which should result in `OrderCancelRejected`. One way to meet this requirement is to maintain an additional `Set` containing the order IDs actively resting on the book. Now, the `LazyCancelOrderBook` state looks like the following:

```scala
case class LazyCancelOrderBook(
    activeIds: Set[OrderId],
    pendingCancelIds: Set[OrderId],
    bids: TreeMap[Price, Queue[BuyLimitOrder]],
    offers: TreeMap[Price, Queue[SellLimitOrder]])
```
With this definition, we can rewrite `handleCancelOrder` to account for nonexistent order IDs:

```scala
def handleCancelOrder(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    id: OrderId): (LazyCancelOrderBook, Event) =
  ob.activeIds.contains(id) match {
    case true =>
      ob.copy(
        activeIds = ob.activeIds - id,
        pendingCancelIds = ob.pendingCancelIds + id) ->
        OrderCanceled(currentTime(), id)
    case false => ob -> OrderCancelRejected(currentTime(), id)
  }
```
This implementation involves three amortized constant-time operations when the order ID exists in the book. First, there is an operation to identify whether or not the order ID exists in the order book. Then, the provided order ID is removed from the active ID set and added to the pending cancel set. Previously, this scenario required two linear runtime operations. The degenerate scenario of handling a nonexistent order ID now shrinks to a single amortized constant-time operation.
Before celebrating performance wins, bear in mind that we still need to remove canceled orders from the book. To reduce the cost of cancelations, two potentially large sets were added to the order book, which increases the size of the memory footprint and garbage collection pressure. Additionally, benchmarking is needed to prove that theoretical performance improvements translate to real-world performance.
To complete the `LazyCancelOrderBook` implementation, we need to account for `activeIds` when handling a limit order and `pendingCancelIds` when generating an execution. As you may recall, handling a limit order involves two scenarios: the order either crosses the book, generating an execution, or rests on the book. Here is a partially implemented snippet that prepares us to handle these two scenarios for a `BuyLimitOrder`:

```scala
orderBook.bestOffer.exists(_.price.value <= buy.price.value) match {
  case true => ???  // crossing order
  case false => ??? // resting order
}
```
To support resting buy orders, the provided buy order must be enqueued and, additionally, the buy order ID must be added to the `activeIds` set:

```scala
def restLimitOrder: (LazyCancelOrderBook, Event) = {
  val orders = orderBook.bids.getOrElse(buy.price, Queue.empty)
  orderBook.copy(
    bids = orderBook.bids + (buy.price -> orders.enqueue(buy)),
    activeIds = orderBook.activeIds + buy.id) ->
    LimitOrderAdded(currentTime())
}

orderBook.bestOffer.exists(_.price.value <= buy.price.value) match {
  case true => ??? // crossing order
  case false => restLimitOrder
}
```
The logic to add a resting limit order, shown in the preceding code, is extracted into a method named restLimitOrder. This logic resembles the analogous scenario for ListOrderBook, with the added amortized constant time operation of recording the active order ID. This change is straightforward and adds little processing time overhead. Finally, we consider the more complicated order crossing scenario. This scenario is analogous to Queue.dequeue in that this implementation pays the cost of the deferred action. The first dilemma to solve is identifying which order can be executed and which orders must be removed because they are canceled. findActiveOrder supplies this functionality and is shown with the assumption that orderBook is lexically in scope, as follows:
```scala
@tailrec
def findActiveOrder(
  q: Queue[SellLimitOrder],
  idsToRemove: Set[OrderId]):
  (Option[SellLimitOrder], Option[Queue[SellLimitOrder]], Set[OrderId]) =
  q.dequeueOption match {
    case Some((o, qq)) => orderBook.pendingCancelIds.contains(o.id) match {
      case true => findActiveOrder(qq, idsToRemove + o.id)
      case false =>
        (Some(o), if (qq.nonEmpty) Some(qq) else None, idsToRemove + o.id)
    }
    case None => (None, None, idsToRemove)
  }
```
findActiveOrder recursively inspects a sell price level until an executable order is found or the price level is exhausted. In addition to optionally resolving a sell order that can be executed, the method returns the remaining price level and the accumulated set of order IDs to remove. The canceled order IDs in this set must be removed from pendingCancelIds. Here, we see the bulk of the cancelation work that was deferred when the cancel request was handled. Execution is now an amortized constant time operation when executions occur repeatedly without a cancelation in-between. The worst case is a linear runtime proportional to the number of canceled orders in the price level. Let's look at how findActiveOrder is used to update the state of the order book:
```scala
orderBook.offers.headOption.fold(restLimitOrder) { case (price, offers) =>
  findActiveOrder(offers, Set.empty) match {
    case (Some(o), Some(qq), rms) =>
      (orderBook.copy(
        offers = orderBook.offers + (o.price -> qq),
        activeIds = orderBook.activeIds -- rms,
        // clear processed cancels so pendingCancelIds does not grow unbounded
        pendingCancelIds = orderBook.pendingCancelIds -- rms),
        OrderExecuted(currentTime(),
          Execution(buy.id, o.price), Execution(o.id, o.price)))
    case (Some(o), None, rms) =>
      (orderBook.copy(
        offers = orderBook.offers - o.price,
        activeIds = orderBook.activeIds -- rms,
        pendingCancelIds = orderBook.pendingCancelIds -- rms),
        OrderExecuted(currentTime(),
          Execution(buy.id, o.price), Execution(o.id, o.price)))
    case (None, _, rms) =>
      val bs = orderBook.bids.getOrElse(buy.price, Queue.empty).enqueue(buy)
      (orderBook.copy(
        bids = orderBook.bids + (buy.price -> bs),
        offers = orderBook.offers - price,
        activeIds = orderBook.activeIds -- rms + buy.id,
        pendingCancelIds = orderBook.pendingCancelIds -- rms),
        LimitOrderAdded(currentTime()))
  }
}
```
Order crossing implementation is now arguably more complicated than in ListOrderBook or QueueOrderBook due to the work of removing canceled orders and clearing their IDs from pendingCancelIds. In all three pattern match statements, the set of order IDs returned as the final tuple member is removed from pendingCancelIds to indicate that those orders are now removed from the book. The first two pattern match statements distinguish between finding an active order with one or more remaining orders in the price level and finding an active order with zero remaining orders in the price level. In the latter scenario, the price level is removed from the book. The third pattern match statement accounts for the scenario where an active order is not found. If an active order is not found because all orders were pending cancelation, then, by definition, the entire price level was searched and is, therefore, now empty.
As a rigorous performance engineer, you realize that although your code compiles and your tests pass, your work is not yet complete. You begin pondering how to benchmark LazyCancelOrderBook to determine whether or not your changes have improved real-world performance. Your first idea is to test cancelation in isolation to confirm that this operation has indeed been optimized. To do this, you rework CancelBenchmarks, which was introduced in Chapter 2, Measuring Performance on the JVM, to work with QueueOrderBook and LazyCancelOrderBook. This benchmark sweeps different price level sizes, canceling the first order, the last order, and a nonexistent order. We omit the source code because it is identical to the previous implementation and instead consider the results. These results were produced by running the following:
```
sbt 'project chapter4' 'jmh:run CancelBenchmarks -foe true'
```
The benchmark provides us with the following results:
| Benchmark | Enqueued order count | Throughput (ops per second) | Error as percentage of throughput |
| --- | --- | --- | --- |
| QueueOrderBook: cancel first order | 1 | 6,912,696.09 | ± 0.44 |
| LazyCancelOrderBook: cancel first order | 1 | 25,676,031.5 | ± 0.22 |
| QueueOrderBook: cancel first order | 10 | 2,332,046.09 | ± 0.96 |
| LazyCancelOrderBook: cancel first order | 10 | 12,656,750.43 | ± 0.31 |
| QueueOrderBook: cancel last order | 1 | 5,641,784.63 | ± 0.49 |
| LazyCancelOrderBook: cancel last order | 1 | 25,619,665.34 | ± 0.48 |
| QueueOrderBook: cancel last order | 10 | 1,788,885.62 | ± 0.39 |
| LazyCancelOrderBook: cancel last order | 10 | 13,269,215.32 | ± 0.30 |
| QueueOrderBook: cancel nonexistent order | 1 | 9,351,630.96 | ± 0.19 |
| LazyCancelOrderBook: cancel nonexistent order | 1 | 31,742,147.67 | ± 0.65 |
| QueueOrderBook: cancel nonexistent order | 10 | 6,897,164.11 | ± 0.25 |
| LazyCancelOrderBook: cancel nonexistent order | 10 | 24,102,925.78 | ± 0.24 |
This test demonstrates that LazyCancelOrderBook consistently outperforms QueueOrderBook when canceling the first order, the last order, and a nonexistent order across order queue sizes of one and ten. This is exactly as expected because LazyCancelOrderBook defers the most expensive work until an order is executed. We see constant performance independent of the position of the order-to-be-canceled, which is further proof that the removal work is deferred. Also as expected, canceling a nonexistent order yields improved performance because a linear traversal is no longer required to ascertain the absence of an order. However, we notice the performance hit as the enqueued order count increases from one to ten for LazyCancelOrderBook. We can hypothesize that the nearly 50% throughput reduction is due to the overhead of managing the state of active and pending cancel order IDs.
This result is a promising sign that your changes are indeed improving real-world performance. As the new implementation passed the initial litmus test, you think about how to representatively simulate a combination of executions and cancelations. You decide to focus on creating a microbenchmark that combines executions and cancelations to exercise LazyCancelOrderBook in scenarios that more closely resemble production. You think back to a recent lunch conversation you had with Dave about market trading flows and recall that he said it is common to see about two cancelations per execution. Running with this idea, you create a benchmark that interleaves trades and cancelations. For both order book implementations, you want to test performance in the following scenarios:

- One cancelation for every trade
- Two cancelations for every trade
- One cancelation for every two trades
These three scenarios will help reveal shortcomings in LazyCancelOrderBook by focusing on production-like order book activities. The benchmark requires initializing each order book with a set of resting orders to be canceled or executed against. The following snippet demonstrates how to initialize the order books in a JMH test:
```scala
@State(Scope.Benchmark)
class InterleavedOrderState {
  var lazyBook: LazyCancelOrderBook = LazyCancelOrderBook.empty
  var eagerBook: QueueOrderBook = QueueOrderBook.empty

  @Setup
  def setup(): Unit = {
    lazyBook = (1 to maxOrderCount).foldLeft(LazyCancelOrderBook.empty) {
      case (b, i) => LazyCancelOrderBook.handle(
        () => EventInstant.now(), b, AddLimitOrder(
          CommandInstant.now(), BuyLimitOrder(OrderId(i), bidPrice)))._1
    }
    eagerBook = (1 to maxOrderCount).foldLeft(QueueOrderBook.empty) {
      case (b, i) => QueueOrderBook.handle(
        () => EventInstant.now(), b, AddLimitOrder(
          CommandInstant.now(), BuyLimitOrder(OrderId(i), bidPrice)))._1
    }
  }
}
```
Before each trial, both order books are filled with maxOrderCount (defined to be 30) resting bids. As there are three scenarios to test and two order books, six benchmarks are defined for this test. The three scenarios are implemented identically for each order book. To avoid duplication, the following snippet shows only the three benchmarks for LazyCancelOrderBook:
```scala
@Benchmark
def lazyOneToOneCT(state: InterleavedOrderState): LazyCancelOrderBook = {
  val b1 = LazyCancelOrderBook.handle(
    () => EventInstant.now(), state.lazyBook, firstCancel)._1
  LazyCancelOrderBook.handle(() => EventInstant.now(), b1, firstCrossSell)._1
}

@Benchmark
def lazyTwoToOneCT(state: InterleavedOrderState): LazyCancelOrderBook = {
  val b1 = LazyCancelOrderBook.handle(
    () => EventInstant.now(), state.lazyBook, firstCancel)._1
  val b2 = LazyCancelOrderBook.handle(
    () => EventInstant.now(), b1, secondCancel)._1
  LazyCancelOrderBook.handle(() => EventInstant.now(), b2, firstCrossSell)._1
}

@Benchmark
def lazyOneToTwoCT(state: InterleavedOrderState): LazyCancelOrderBook = {
  val b1 = LazyCancelOrderBook.handle(
    () => EventInstant.now(), state.lazyBook, firstCancel)._1
  val b2 = LazyCancelOrderBook.handle(
    () => EventInstant.now(), b1, firstCrossSell)._1
  LazyCancelOrderBook.handle(() => EventInstant.now(), b2, secondCrossSell)._1
}
```
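The benchmarks reference command values (firstCancel, secondCancel, firstCrossSell, secondCrossSell) and constants (maxOrderCount, bidPrice) whose definitions are not shown. The following is a hypothetical sketch of their shape using minimal stand-in types, not the chapter's real domain model (which, among other differences, accepts both buy and sell orders in AddLimitOrder):

```scala
// Hypothetical sketch of the values the benchmarks reference. The stub case
// classes are stand-ins for the chapter's domain types; the concrete ids and
// prices are illustrative assumptions, not the book's actual sources.
final case class OrderId(value: Long)
final case class Price(value: BigDecimal)
final case class CommandInstant(value: Long)
object CommandInstant {
  def now(): CommandInstant = CommandInstant(System.currentTimeMillis())
}
final case class SellLimitOrder(id: OrderId, price: Price)
final case class AddLimitOrder(i: CommandInstant, o: SellLimitOrder)
final case class CancelOrder(i: CommandInstant, id: OrderId)

val maxOrderCount: Int = 30                 // resting bids loaded per trial
val bidPrice: Price = Price(BigDecimal(10)) // price level all bids rest at

// Cancels target ids known to be resting in the book (1 to maxOrderCount).
val firstCancel = CancelOrder(CommandInstant.now(), OrderId(1))
val secondCancel = CancelOrder(CommandInstant.now(), OrderId(2))

// Sells priced at the best bid so that they cross and execute immediately.
val firstCrossSell =
  AddLimitOrder(CommandInstant.now(), SellLimitOrder(OrderId(31), bidPrice))
val secondCrossSell =
  AddLimitOrder(CommandInstant.now(), SellLimitOrder(OrderId(32), bidPrice))
```

Defining these as values rather than building them inside the benchmark methods keeps allocation out of the measured path, which is the garbage-avoidance point made below.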
These benchmarks follow a naming convention that states the cancelation frequency ("C") first and the trade frequency ("T") second. For example, the final benchmark, lazyOneToTwoCT, implements the scenario with one cancelation for every two trades. The commands are defined as values outside the benchmark methods to avoid generating garbage during benchmark invocation. The benchmark invocation looks like the following:
```
sbt 'project chapter4' 'jmh:run InterleavedOrderBenchmarks -foe true'
```
This invocation produces the following results:
| Benchmark | Throughput (ops per second) | Error as percentage of throughput |
| --- | --- | --- |
| eagerOneToTwoCT | 797,339.08 | ± 2.63 |
| lazyOneToTwoCT | 1,123,157.94 | ± 1.26 |
| eagerOneToOneCT | 854,635.26 | ± 2.48 |
| lazyOneToOneCT | 1,469,338.46 | ± 1.85 |
| eagerTwoToOneCT | 497,368.11 | ± 0.72 |
| lazyTwoToOneCT | 1,208,671.60 | ± 1.69 |
Across the board, LazyCancelOrderBook outperforms QueueOrderBook. The relative difference between lazy and eager performance reveals an interesting relationship. The following table captures the relative performance difference:
| Benchmark | LazyCancelOrderBook percentage performance improvement |
| --- | --- |
| OneToTwoCT | 141.00% |
| OneToOneCT | 172.00% |
| TwoToOneCT | 243.00% |
Studying the preceding table, we observe that LazyCancelOrderBook shows the greatest performance win when there are two cancelations per trade. This result demonstrates the benefit of deferring the cost of processing a cancelation request. The next trend we see is that as the frequency of trades increases and the frequency of cancelations decreases, QueueOrderBook performance improves relative to LazyCancelOrderBook. This result makes sense because LazyCancelOrderBook incurs extra costs when performing a trade. In addition to searching for canceled orders, LazyCancelOrderBook must update activeIds. QueueOrderBook avoids these costs, but the overwhelming cost of eager cancelation processing continues to overshadow its performance. Summarizing these results, we have more confidence that LazyCancelOrderBook is a drop-in replacement for QueueOrderBook. In scenarios involving heavy volumes of cancelations, it is a clear winner, and in other scenarios, it maintains at least parity with QueueOrderBook.
In this section, we leveraged Scala collections, in conjunction with the judicious use of lazy evaluation, to improve the performance of a critical component in MVT's infrastructure. By working through several order book implementations, you learned first-hand how a well-suited data structure can improve performance while a less optimal choice can derail performance. This exercise also exposed you to how Scala implements several of its collections, which you can now use to your advantage when working on a performance problem.
LazyCancelOrderBook illustrates how valuable deferred evaluation can be in a performance-sensitive environment. When faced with a performance challenge, ask yourself the following questions to see whether it is possible to defer work (CPU work, not your actual work!). The following table lists each question and how it was answered for the order book:
| Question | Application to order book example |
| --- | --- |
| How can I decompose the work into smaller discrete chunks? | The act of canceling was decomposed into identifying the event that was sent to the requester and removing the canceled order from the book state. |
| Why am I performing all of these steps now? | Originally, order removal happened eagerly because it was the most logical way to model the process. |
| Can I change any constraints to allow me to model the problem differently? | Ideally, we would have liked to remove the constraint requiring rejection of nonexistent orders. Unfortunately, this was out of our control. |
| What operations in my system are most performant? | Executing an order and resting an order on the book are the most performant operations. We leveraged fast execution handling to perform removals of canceled orders from the book. |
Like any approach, deferred evaluation is not a panacea. Diligent benchmarking and profiling are necessary to validate the benefit delivered by the change. Arguably, the implementation of LazyCancelOrderBook is more complicated than that of QueueOrderBook, which will increase the cost of maintaining the system. In addition to complicating the implementation, deferred evaluation makes it more difficult to reason about runtime performance due to the variable cost of order execution. For the scenarios that we tested, LazyCancelOrderBook remained at parity with or better than QueueOrderBook. However, we only exercised a few of the many possible scenarios, and we did so with only a single price level in the order book. In a real-world environment, additional benchmarking and profiling are needed to build enough confidence that this new implementation delivers better performance.