Chapter 5. Lazy Collections and Event Sourcing

In the last chapter, we explored a number of Scala collections, all of which perform evaluations eagerly. The Scala standard library provides two collections that operate lazily: views and streams. To motivate an exploration of these collections, we will tackle another performance dilemma at MVT, this time revolving around the performance reports that are generated for clients. In this chapter, we will cover the following topics:

  • Views
  • Stream processing with two real-world applications
  • Event sourcing
  • Markov chain generation

Improving the client report generation speed

Wanting to learn more about the customers of MVT, you decide to attend the weekly client status meeting. As you look around, you see that you are the only engineer there and everyone else is from the sales team. Johnny, the head of the MVT client management team, runs through a list of newly signed-on clients. Each time he reads off a name, a loud bell is rung. It seems like a strange custom to you, but the sales team cheers excitedly each time the bell rings.

After the new client listing ends and the ringing in your ears stops, one of the sales team members asks Johnny, "When will the performance reports be generated faster? Clients are calling me everyday complaining about the inability to see their positions and profits and losses during the trading day. It's embarrassing that we do not have this kind of transparency, and we will lose business because of this." You realize that the report in question is a PDF that can be downloaded via the private web portal that is exposed by MVT to clients. Unless a client is sophisticated enough to set up his or her own reporting using MVT's performance API, then the client is dependent upon the portal to inspect recent trading performance.

Realizing that this is an opportunity to better understand the issue, you ask, "Hi, I'm from the engineering team. I thought I would sit in today to learn more about our clients. Can you share more about the reporting performance problem? I'd like to help address the concern." Through conversation with the sales team, you learn that the PDF report is a first step towards a real-time streaming web app. The PDF report allows MVT to quickly give trading performance insight to clients. Each time the client clicks View Performance, a report is generated that summarizes the performance trend by displaying whether or not the client has realized a profit or a loss in the last hour, day, and seven days. You learn that clients are more likely to generate reports when the market is volatile. The sales team thinks this exacerbates the issue because reports are generated even more slowly when everyone is trying to see recent trading performance. In some of the worst cases, the performance report takes about a dozen minutes to generate, which is totally unacceptable to clients that expect near real-time results.

Diving into the reporting code

Eager to dig into the problem, you find the repository that is responsible for working with reporting data. You explore the domain model to understand the concerns represented in this scope:

case class Ticker(value: String) extends AnyVal 
case class Price(value: BigDecimal) extends AnyVal 
case class OrderId(value: Long) extends AnyVal 
case class CreatedTimestamp(value: Instant) extends AnyVal 
case class ClientId(value: Long) extends AnyVal 
 
sealed trait Order { 
  def created: CreatedTimestamp 
  def id: OrderId 
  def ticker: Ticker 
  def price: Price 
  def clientId: ClientId 
} 
case class BuyOrder(created: CreatedTimestamp, id: OrderId, ticker: Ticker, price: Price, clientId: ClientId) extends Order 
 
case class SellOrder(created: CreatedTimestamp, id: OrderId, ticker: Ticker, price: Price, clientId: ClientId) extends Order 
 
case class Execution(created: CreatedTimestamp, id: OrderId, price: Price) 

In the reporting context, linking orders to executions is important for building the performance trend report because this association allows MVT to identify the profit or loss realized from the trade. ClientId is a concept that you have not encountered before while working on the order book or performing data analysis. The client ID identifies an MVT client's account. As trades are executed on behalf of clients, the client ID allows us to link an executed order to a client account.
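For instance, a hypothetical buy order and its execution share the same OrderId, which is what allows the reporting code to join them (the values below are purely illustrative):

val order = BuyOrder(CreatedTimestamp(Instant.now()), OrderId(1), Ticker("FOO"), 
  Price(BigDecimal("21.07")), ClientId(42)) 
val execution = Execution(CreatedTimestamp(Instant.now()), OrderId(1), Price(BigDecimal("21.05"))) 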

Scanning the code base, you spot the representation of a performance trend report before it is converted into PDF format:

sealed trait LastHourPnL 
case object LastHourPositive extends LastHourPnL 
case object LastHourNegative extends LastHourPnL 
 
sealed trait LastDayPnL 
case object LastDayPositive extends LastDayPnL 
case object LastDayNegative extends LastDayPnL 
 
sealed trait LastSevenDayPnL 
case object LastSevenDayPositive extends LastSevenDayPnL 
case object LastSevenDayNegative extends LastSevenDayPnL 
 
case class TradingPerformanceTrend( 
  ticker: Ticker, 
  lastHour: LastHourPnL, 
  lastDay: LastDayPnL, 
  lastSevenDay: LastSevenDayPnL) 

The profit and loss (PnL) trend is represented by distinct ADTs for each supported time period: the last hour, the last day, and the last seven days. For each stock ticker, these three time periods are included in TradingPerformanceTrend. Across multiple tickers, you infer that a client can identify whether or not MVT is generating a profit or a loss over time. Inspecting the signature of the trend method, which is responsible for computing TradingPerformanceTrend, you confirm your thinking:

def trend( 
  now: () => Instant, 
  findOrders: (Interval, Ticker) => List[Order], 
  findExecutions: (Interval, Ticker) => List[Execution], 
  request: GenerateTradingPerformanceTrend): 
List[TradingPerformanceTrend] 
 
case class GenerateTradingPerformanceTrend( 
  tickers: List[Ticker], clientId: ClientId) 

Computing the performance trend requires a way to determine the current time in order to determine how far to look back to compute each time period's trend. The findOrders and findExecutions arguments are functions that query the reporting data store for orders and executions that were created within a time interval for a particular ticker. The final argument contains the client's ID and the tickers to report on. Each period's trend is computed by a generalized inner-method named periodPnL, which looks like the following:

  def periodPnL( 
    duration: Duration): Map[Ticker, PeriodPnL] = { 
    val currentTime = now() 
    val interval = new Interval(currentTime.minus(duration), currentTime) 
    (for { 
      ticker <- request.tickers 
      orders = findOrders(interval, ticker) 
      executions = findExecutions(interval, ticker) 
      idToExecPrice = executions.groupBy(_.id).mapValues(es => 
        Price.average(es.map(_.price))) 
      signedExecutionPrices = for { 
        o <- orders 
        if o.clientId == request.clientId 
        price <- idToExecPrice.get(o.id).map(p => o match { 
          case _: BuyOrder => Price(p.value * -1) 
          case _: SellOrder => p 
        }).toList 
      } yield price 
      trend = signedExecutionPrices.foldLeft(PnL.zero) { 
        case (pnl, p) => PnL(pnl.value + p.value) 
      } match { 
        case p if p.value >= PnL.zero.value => PeriodPositive 
        case _ => PeriodNegative 
      } 
    } yield ticker -> trend).toMap 
  } 

The periodPnL method is an involved method that contains several logical steps. For each client-provided ticker, the associated orders and executions for the provided time period are retrieved. To correlate orders with executions, groupBy builds a lookup table keyed by OrderId, and mapValues reduces the executions for each order to a single average execution price, which simplifies later calculations.
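Price.average is referenced by this snippet but its implementation is not shown; a minimal sketch of what such a helper might look like is the following:

object Price { 
  // Assumed helper: averages a non-empty list of execution prices 
  def average(prices: List[Price]): Price = 
    Price(prices.map(_.value).sum / prices.size) 
} 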

With the idToExecPrice lookup table built, the next logical step is to filter out orders belonging to other clients. Once only the requesting client's orders remain, idToExecPrice is used to identify the orders that were executed. The final two steps compute the performance trend by tabulating the client's absolute return (that is, profit and loss). These steps involve two additions to the domain model, as follows:

case class PnL(value: BigDecimal) extends AnyVal 
object PnL { 
  val zero: PnL = PnL(BigDecimal(0)) 
} 
 
sealed trait PeriodPnL 
case object PeriodPositive extends PeriodPnL 
case object PeriodNegative extends PeriodPnL 

The PnL value class is used to represent the client's dollar return. PeriodPnL is analogous to the previously introduced trend ADTs, but it can be applied to any time period of data. This allows PeriodPnL to be reused for the last hour, last day, and last seven days trend computations.

When the trade represents a buy, the execution price is negated because the transaction represents cash being exchanged for stock. When the trade represents a sell, the execution price remains positive because the transaction represents exchanging stock for cash. After computing the performance trend for each ticker, the List of the Ticker and PeriodPnL tuples is converted to a Map.
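To make the sign convention concrete, consider a hypothetical round trip: a buy executed at 10.00 contributes -10.00 and a sell executed at 12.50 contributes +12.50, so the fold yields a positive PnL and the period is classified as PeriodPositive:

val signedPrices = List(Price(BigDecimal("-10.00")), Price(BigDecimal("12.50"))) 
val pnl = signedPrices.foldLeft(PnL.zero) { 
  case (acc, p) => PnL(acc.value + p.value) 
} // PnL(2.50), which maps to PeriodPositive 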

Digesting this implementation, you can start to imagine why generating this PDF is time-consuming. There is no sign of caching results, which means that the trend report is recomputed each time a client makes a request. As the number of clients requesting reports increases, there is an increased wait time while reports are computed. Re-architecting the reporting infrastructure to cache reports is too large a near-term change. Instead, you try to identify incremental changes that can improve report generation performance.

Using views to speed up report generation time

When working on the order book, we learned that List eagerly evaluates results. This property means that, in periodPnL, the de-sugared for-comprehension filter and map operations performed on orders produce new lists. That is, each transformation produces a new collection. For customers with large order counts, it can be costly in terms of CPU time to iterate over an order set three times, in addition to incurring garbage collection costs due to repeated List creation. To ameliorate this concern, Scala provides a way to defer transforming elements until an element is needed by a downstream computation.

Conceptually, this is done by adding a view on top of the eagerly evaluated collection that allows transformations to be defined with deferred evaluation semantics. A lazily evaluated view of a collection can be constructed from any Scala collection by invoking view. For example, this snippet creates a view from a List of integers:

val listView: SeqView[Int, List[Int]] = List(1, 2, 3).view 

From this snippet, we learn that Scala represents a view into a collection with a different SeqView type that is parameterized by two types: the collection element, and the collection type. Seeing a view in use makes it easier to understand its runtime differences with an eagerly evaluated collection. Consider the following snippet performing the same operations on a List and a view over a List:

println("List evaluation:") 
val evens = List(0, 1, 2, 3, 4, 5).map(i => { 
  println(s"Adding one to $i") 
  i + 1 
}).filter(i => { 
  println(s"Filtering $i") 
  i % 2 == 0 
}) 
 
println("--- Printing first two even elements ---") 
println(evens.take(2)) 
 
println("View evaluation:") 
val evensView = List(0, 1, 2, 3, 4, 5).view.map(i => { 
  println(s"Adding one to $i") 
  i + 1 
}).filter(i => { 
  println(s"Filtering $i") 
  i % 2 == 0 
}) 
 
println("--- Printing first two even elements ---") 
println(evensView.take(2).toList) 

This snippet performs simple arithmetic and then filters to find the even elements. For the sake of deepening our understanding, the snippet breaks the functional paradigm by adding the println side effect. The output of the list evaluation is as expected:

List evaluation: 
Adding one to 0 
Adding one to 1 
Adding one to 2 
Adding one to 3 
Adding one to 4 
Adding one to 5 
Filtering 1 
Filtering 2 
Filtering 3 
Filtering 4 
Filtering 5 
Filtering 6 
--- Printing first two even elements --- 
List(2, 4) 

With eager evaluation, each transformation is applied to each element before moving to the next transformation. Now, consider the following output from view evaluation:

View evaluation: 
--- Printing first two even elements --- 
Adding one to 0 
Filtering 1 
Adding one to 1 
Filtering 2 
Adding one to 2 
Filtering 3 
Adding one to 3 
Filtering 4 
List(2, 4) 

As we discussed earlier, with lazy evaluation no transformations are applied until an element is needed. In this example, this means that the addition and filtering do not occur until the invocation of toList. The absence of output immediately after "View evaluation:" is evidence that no transformations occurred before the collection was forced. Curiously, we also see that only the first four of six elements are evaluated. When a view applies transformations, it applies all transformations to each element rather than applying each transformation to all elements. By applying all transformations in one step, the view is able to return the first two elements without evaluating the entire collection. Here, we see the potential performance gains from view usage due to lazy evaluation. Before applying the concept of views to the performance trend report, let's take a deeper look at view implementation.

Constructing a custom view

Views are able to defer evaluation by returning a data structure that composes the previous transformation state with the next transformation. The Scala implementation of views is admittedly complicated to digest because it provides a large number of capabilities while retaining support for all Scala collections. To build an intuition for how views are implemented, let's construct our own lazily evaluated view that works only for List and only supports map operations. To begin, we define the operations that are supported by our PseudoView:

sealed trait PseudoView[A] { 
  def map[B](f: A => B): PseudoView[B] 
  def toList: List[A] 
} 

PseudoView is defined as a trait that supports the lazy application of a transformation from A to B and also supports evaluating all transformations to return a List. Next, we define two types of view: one to support the initial case when zero transformations have been applied, and one to support applying a transformation to a previously transformed view. The signatures are shown in the following snippet:

final class InitialView[A](xs: List[A]) extends PseudoView[A] 
final class ComposedView[A, B](xs: List[A], fa: A => B) extends PseudoView[B] 

In both scenarios, the original List must be carried through to support eventually applying the transformations. In the InitialView base case, there are zero transformations, which is why there is no additional state. ComposedView supports chaining computations by carrying the state of the previous fa transformation.

Implementing InitialView is a straightforward delegation to ComposedView:

final class InitialView[A](xs: List[A]) extends PseudoView[A] { 
  def map[B](f: A => B): PseudoView[B] = new ComposedView[A, B](xs, f) 
  def toList: List[A] = xs 
} 

The ComposedView implementation shows how transformations are chained together using function composition:

final class ComposedView[A, B](xs: List[A], fa: A => B) extends PseudoView[B] { 
  def map[C](f: B => C): PseudoView[C] = new ComposedView(xs, f.compose(fa)) 
  def toList: List[B] = xs.map(fa) 
} 

Let's construct a PseudoView companion object that provides view construction, as follows:

object PseudoView {  
  def view[A, B](xs: List[A]): PseudoView[A] = new InitialView(xs) 
} 

We can now exercise PseudoView with a simple program to demonstrate that it defers evaluation:

println("PseudoView evaluation:") 
val listPseudoView = PseudoView.view(List(0, 1, 2)).map(i => { 
  println(s"Adding one to $i") 
  i + 1 
}).map(i => { 
  println(s"Multiplying $i") 
  i * 2 
}) 
 
println("--- Converting PseudoView to List ---") 
println(listPseudoView.toList) 

Running this program, we see output equivalent to usage of Scala's view implementation:

PseudoView evaluation: 
--- Converting PseudoView to List --- 
Adding one to 0 
Multiplying 1 
Adding one to 1 
Multiplying 2 
Adding one to 2 
Multiplying 3 
List(2, 4, 6) 

PseudoView helps build an intuition about how Scala implements views. From here, you can begin considering how to support other operations. For example, how can filter be implemented? filter is interesting to consider because it constrains the original collection. As defined, PseudoView is ill-equipped to support filter, which is one illustration of the complexity that is handled by Scala views. The Scala view implementation tackles this challenge by defining a trait named Transformed. The Transformed trait is the base trait for all view operations. A partial definition is shown, as follows:

trait Transformed[+B] extends GenTraversableView[B, Coll] { 
  def foreach[U](f: B => U): Unit 
  lazy val underlying = self.underlying 
} 

The underlying lazy value is how the originally wrapped collection is accessed. This is analogous to how PseudoView passed the List state into ComposedView. Transformed defines a side-effecting foreach operation to support collection operations in a lazy manner. Using foreach allows implementations of this trait to traverse the underlying collection while applying their transformation lazily. This is how filter is implemented:

trait Filtered extends Transformed[A] { 
  protected[this] val pred: A => Boolean 
  def foreach[U](f: A => U) { 
    for (x <- self) 
      if (pred(x)) f(x) 
  } 
} 

Transformed is used within the view API to maintain the state of necessary operations, while the external API supports interacting with SeqView. Following another pattern that is commonly found in Scala collections, SeqView inherits a number of operations by mixing in other traits. SeqView indirectly mixes in TraversableViewLike, which provides access to the Transformed operations.
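To connect this back to our PseudoView exercise, the following is a minimal sketch (not the standard library's implementation) of how a foreach-based design can lazily support both map and filter:

trait ForeachView[A] { self => 
  def foreach[U](f: A => U): Unit 
 
  // map defers work by wrapping the previous foreach with the new transformation 
  def map[B](g: A => B): ForeachView[B] = new ForeachView[B] { 
    def foreach[U](f: B => U): Unit = self.foreach(a => f(g(a))) 
  } 
 
  // filter defers work by only forwarding elements that satisfy the predicate 
  def filter(p: A => Boolean): ForeachView[A] = new ForeachView[A] { 
    def foreach[U](f: A => U): Unit = self.foreach(a => if (p(a)) f(a)) 
  } 
 
  // toList forces evaluation with a single traversal of the underlying collection 
  def toList: List[A] = { 
    val buffer = scala.collection.mutable.ListBuffer.empty[A] 
    foreach(buffer += _) 
    buffer.toList 
  } 
} 
 
object ForeachView { 
  def view[A](xs: List[A]): ForeachView[A] = new ForeachView[A] { 
    def foreach[U](f: A => U): Unit = xs.foreach(f) 
  } 
} 

For example, ForeachView.view(List(1, 2, 3, 4)).map(_ + 1).filter(_ % 2 == 0).toList produces List(2, 4) while traversing the original list only once.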

Applying views to improve report generation performance

With our newly-developed intuition for views, we may view (no pun intended!) the construction of performance trend reports differently. Scala's implementation of views makes it trivial to switch from eagerly evaluated collections to a lazily evaluated version. If you recall, once the order ID to the average execution price lookup table is constructed, a series of transformations are applied to the orders that are retrieved for the duration and ticker. By converting orders to a view, there is an opportunity to avoid unnecessary transformations and improve the speed of the performance trend report.

While it is trivial to convert to a view, it is less trivial to identify under which conditions lazy evaluation out-performs eager evaluation. As a good performance engineer, you want to benchmark your proposed change, but you do not have access to historical order and execution data to build a benchmark. Instead, you write a microbenchmark that simulates the problem that you are modeling. The question that you are trying to answer is, "For what size collection and what number of operations does it make sense to use a view over a List?" There is a cost to constructing a view because it involves retaining information about the deferred transformation, which implies it will not always be the most performant solution. You come up with the following scenarios to help answer your question:

  @Benchmark 
  def singleTransformList(state: ViewState): List[Int] = 
    state.numbers.map(_ * 2) 
 
  @Benchmark 
  def singleTransformView(state: ViewState): Vector[Int] = 
    state.numbers.view.map(_ * 2).toVector 
 
  @Benchmark 
  def twoTransformsList(state: ViewState): List[Int] = 
    state.numbers.map(_ * 2).filter(_ % 3 == 0) 
 
  @Benchmark 
  def twoTransformsView(state: ViewState): Vector[Int] = 
    state.numbers.view.map(_ * 2).filter(_ % 3 == 0).toVector 
 
  @Benchmark 
  def threeTransformsList(state: ViewState): List[Int] = 
    state.numbers.map(_ * 2).map(_ + 7).filter(_ % 3 == 0) 
 
  @Benchmark 
  def threeTransformsView(state: ViewState): Vector[Int] = 
    state.numbers.view.map(_ * 2).map(_ + 7).filter(_ % 3 == 0).toVector 

For each collection type, a List and a view over a Vector, you define three tests that exercise an increasing number of transformations. Vector is used instead of List because toList on a view is not specialized for List. As we have previously seen, List operations are written to take advantage of constant-time prepend performance; building a List from a view performs linear-time append operations, which gives the false impression that views deliver lower performance. Switching to Vector provides effectively constant-time append operations. The state for this benchmark looks like the following:

@State(Scope.Benchmark) 
class ViewState { 
 
  @Param(Array("10", "1000", "1000000")) 
  var collectionSize: Int = 0 
 
  var numbers: List[Int] = Nil 
 
  @Setup 
  def setup(): Unit = { 
    numbers = (for (i <- 1 to collectionSize) yield i).toList 
  } 
} 

ViewState sweeps different collection sizes to help identify how sensitive view performance is to collection size. The benchmark is invoked via the following:

sbt 'project chapter5' 'jmh:run ViewBenchmarks -foe true'

This invocation produces the following results:

Benchmark             Collection size  Throughput (ops per second)  Error as percentage of throughput
singleTransformList   10               15,171,067.61                ± 2.46
singleTransformView   10               3,175,242.06                 ± 1.37
singleTransformList   1,000            133,818.44                   ± 1.58
singleTransformView   1,000            52,688.80                    ± 1.11
singleTransformList   1,000,000        30.40                        ± 2.72
singleTransformView   1,000,000        86.54                        ± 1.17
twoTransformsList     10               5,008,830.88                 ± 1.12
twoTransformsView     10               4,564,726.04                 ± 1.05
twoTransformsList     1,000            44,252.83                    ± 1.08
twoTransformsView     1,000            80,674.76                    ± 1.12
twoTransformsList     1,000,000        22.85                        ± 3.78
twoTransformsView     1,000,000        77.59                        ± 1.46
threeTransformsList   10               3,360,399.58                 ± 1.11
threeTransformsView   10               3,438,977.91                 ± 1.27
threeTransformsList   1,000            36,226.87                    ± 1.65
threeTransformsView   1,000            58,981.24                    ± 1.80
threeTransformsList   1,000,000        10.33                        ± 3.58
threeTransformsView   1,000,000        49.01                        ± 1.36

The results give us an interesting insight into the cases where using a view yields better performance. For a small collection, such as 10 elements in our benchmark, a List generally performs better, although the gap closes as more transformations are applied; with three transformations the two are roughly equivalent. When transforming a large collection, 1,000,000 elements in our benchmark, a view is more efficient, with the differential increasing as the number of transformations increases. For example, with 1,000,000 elements and two transformations, the view delivers more than triple the throughput of the List. For a medium-sized collection, such as 1,000 elements in this example, the result is not as clear-cut: with a single transformation, the eager List performs better, while the view delivers better throughput when applying more than one transformation.

As the volume of your data and the number of transformations increase, it becomes more likely that a view offers better performance. Here, you see the tangible benefit of avoiding intermediate collections. A second axis of performance to consider is the nature of the transformation: operations that can terminate early (for example, find) benefit strongly from lazy evaluation. This benchmark illustrates that it is important to understand the size of your data and the transformations that you intend to perform.
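To see the early termination benefit concretely, consider this contrived sketch in which println stands in for an expensive transformation:

def expensive(i: Int): Int = { println(s"Transforming $i"); i * 2 } 
 
// Eager: all five elements are transformed before find inspects any result 
List(1, 2, 3, 4, 5).map(expensive).find(_ > 4) 
 
// Lazy: only the first three elements are transformed before a match is found 
List(1, 2, 3, 4, 5).view.map(expensive).find(_ > 4) 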

View caveats

Views offer a simple way to improve performance with minimally invasive changes to your system. The ease of use is part of the allure of views, which may tempt you to use them more frequently than you otherwise would. As our benchmarking in the previous section shows, there is a nontrivial overhead to using views, which means defaulting to views is a suboptimal choice. Looking past the pure performance perspective, there are other reasons to tread carefully when using views.

SeqView extends Seq

As views mirror the collection API, it can be a challenge to identify when transformations are being applied lazily. For this reason, we recommend setting well-defined boundaries for view usage. When working on client reporting, we limited view usage to a single inner function and used the eagerly evaluated List as the return type. Minimizing the area of a system performing lazy evaluation reduces the cognitive load of building a mental model of runtime execution.
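As an illustration of this boundary, a hypothetical helper like the following keeps the view local to the function while exposing an eagerly evaluated List to callers:

def clientPrices(orders: List[Order], clientId: ClientId): List[Price] = 
  orders.view 
    .filter(_.clientId == clientId) 
    .map(_.price) 
    .toList 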

On a related note, we feel that it is important to be cautious about how a view is transformed into an eagerly evaluated collection type. We showed conversion by invoking toList, which makes the intent explicit. SeqView also provides a force method to force evaluation. As a general rule, we avoid using force because it typically returns the general scala.collection.Seq. SeqView retains the collection type as its second type parameter, which allows force to return the original collection type when there is enough evidence. However, certain operations, such as map, cause the view to lose evidence of the original collection type. When this happens, force returns the more general Seq collection type. Seq is a trait that is a super-type of all sequences in the collection library, including views and another lazy data structure that we will discuss later, named scala.collection.immutable.Stream. This inheritance scheme allows the following three statements to compile:

val list: Seq[Int] = List(1, 2, 3) 
val view: Seq[Int] = list.view 
val stream: Seq[Int] = list.toStream 

We believe this is undesirable because the Seq data type hides critical information about the underlying implementation. It represents both lazy and eagerly evaluated collections with the same type. Consider the following snippet example to understand why this is undesirable:

def shouldGenerateOrder(xs: Seq[Execution]): Boolean =  
  xs.size >= 3 

In this manufactured example, imagine that shouldGenerateOrder is invoked with a Vector, but the Vector is later swapped out for a SeqView. With Vector, determining the collection's length is a constant-time operation. With SeqView, you cannot reason with certainty about the runtime of the operation, except to say that it is definitely more expensive than Vector.size. Usage of Seq, and therefore of force, should be avoided because it makes runtime behavior difficult to reason about, and this can lead to unexpected side-effects.
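The following sketch shows how the static type degrades once force can no longer recover the original collection type:

val restored: Vector[Int] = Vector(1, 2, 3).view.force          // evidence of Vector is intact 
val widened: Seq[Int] = Vector(1, 2, 3).view.map(_ + 1).force   // after map, only Seq can be recovered 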

In a typical software system, areas of responsibility are separated into discrete modules. Using the performance trend reporting example, you can imagine a separate module containing the translation from List[TradingPerformanceTrend] to a PDF report. You may be tempted to expose the view to other modules to extend the benefit of lazy transformations. If benchmarks justify making this type of change, then we encourage you to choose one of these options. Our preferred choice in this scenario is to use Stream, which is a lazily evaluated version of List. We explore Stream later in this chapter. Alternatively, if Stream cannot be used, be strict in your use of the SeqView datatype to clearly demarcate that the collection is lazily evaluated.

Views are not memoizers

One additional consideration when using views is to be cognizant of when transformations are repeatedly applied. Consider this manufactured example, in which a view is used as the base for multiple computations:

> val xs = List(1,2,3,4,5).view.map(x => {  println(s"multiply $x"); x * 2 }) 
xs: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...) 
> val evens = xs.filter(_ % 2 == 0).toList 
multiply 1 
multiply 2 
multiply 3 
multiply 4 
multiply 5 
evens: List[Int] = List(2, 4, 6, 8, 10) 
 
> val odds = xs.filter(_ % 2 != 0).toList 
multiply 1 
multiply 2 
multiply 3 
multiply 4 
multiply 5 
odds: List[Int] = List() 

In this example, xs is a view over a list of integers. A map transformation is lazily applied to multiply these integers by 2. The view is then used to create two List instances, one containing the even elements and the other containing the odd elements. We observe that the transformation is applied twice, once each time the view is turned into a list. This shows that the transformation is lazily applied, but the results of the computation are not cached. Keep this characteristic of views in mind, as expensive transformations applied several times can cause significant slowdowns. This is also the reason why side effects should be avoided in transformations applied to views: if referential transparency is not upheld, the combination of side effects and repeated evaluation can make software exceptionally difficult to reason about and maintain.

This example is straightforward, and the misuse of views is easy to spot. However, even methods that are provided by the standard library can lead to undesirable results when used with views. Consider this snippet:

> val (evens, odds) = List(1,2,3,4,5).view.map(x => {  println(s"multiply $x"); x * 2 }).partition(_ % 2 == 0) 
evens: scala.collection.SeqView[Int,Seq[_]] = SeqViewMF(...) 
odds: scala.collection.SeqView[Int,Seq[_]] = SeqViewMF(...) 
 
> println(evens.toList, odds.toList) 
multiply 1 
multiply 2 
multiply 3 
multiply 4 
multiply 5 
multiply 1 
multiply 2 
multiply 3 
multiply 4 
multiply 5 
(List(2, 4, 6, 8, 10),List()) 

This example achieves the same results as the previous sample, but we rely on the built-in partition method to split the view into two distinct views, each of which re-traverses the original collection. Again, we see the map transformation applied twice. This is due to the underlying implementation of partition in TraversableViewLike. The main takeaway is that views and lazy evaluation can help yield better performance, but they should be used carefully. It is a good idea to experiment with your algorithm in the REPL to confirm that you are using views correctly.
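When the same expensive transformation feeds multiple downstream computations, one simple remedy is to materialize it once into an eager collection and derive both results from that, as in this sketch:

val doubled = List(1, 2, 3, 4, 5).view.map(x => { println(s"multiply $x"); x * 2 }).toList 
val evens = doubled.filter(_ % 2 == 0) // no further "multiply" output 
val odds = doubled.filter(_ % 2 != 0) 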

In our running example on reporting on trading performance trends, we saw an easy-to-miss example of lazy evaluation when operating on a Map. Recall that there was a lookup table built using the following code:

executions.groupBy(_.id).mapValues(es =>
Price.average(es.map(_.price)))

The return type of mapValues is Map[A, B], which does not suggest any difference in evaluation strategy. Let's run a simple example in the REPL:

> val m = Map("a" -> 1, "b" -> 2)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2)
> val m_prime = m.mapValues{ v => println(s"Mapping $v"); v * 2}
Mapping 1
Mapping 2
m_prime: scala.collection.immutable.Map[String,Int] = Map(a -> 2, b -> 4)
> m_prime.get("a")
Mapping 1
res0: Option[Int] = Some(2)
> m_prime.get("a")
Mapping 1
res1: Option[Int] = Some(2)

Notice how, each time we call get on m_prime to retrieve a value, we observe the transformation being applied, even for the same key. mapValues is a lazily evaluated transformation of each value in the map, akin to a view over the map's values. The types involved do not provide any insight, and unless you inspect the implementation of Map or carefully read the documentation associated with mapValues, you will likely miss this important detail. Keep the caveats of views in mind when working with mapValues.
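If you need eager, one-time evaluation of the values, one option is to map over the key-value pairs instead, at the cost of building the new Map up front; a sketch using the m defined above:

val eagerPrime = m.map { case (k, v) => k -> { println(s"Mapping $v"); v * 2 } } 
eagerPrime.get("a") // no "Mapping" output; the values were computed when eagerPrime was built 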

Zipping up report generation

While investigating the implementation of TradingPerformanceTrend, we took a deep dive into views and saw how they can improve performance. We now return to the implementation of trend to complete the generation of the List[TradingPerformanceTrend]. The following snippet shows trend with the implementation of periodPnL elided because we have already reviewed it thoroughly:

def trend( 
  now: () => Instant, 
  findOrders: (Interval, Ticker) => List[Order], 
  findExecutions: (Interval, Ticker) => List[Execution], 
  request: GenerateTradingPerformanceTrend): List[TradingPerformanceTrend] = { 
  def periodPnL( 
    duration: Duration): Map[Ticker, PeriodPnL] = { ... } 
 
  val tickerToLastHour = periodPnL(Duration.standardHours(1)).mapValues { 
    case PeriodPositive => LastHourPositive 
    case PeriodNegative => LastHourNegative 
  } 
  val tickerToLastDay = periodPnL(Duration.standardDays(1)).mapValues { 
    case PeriodPositive => LastDayPositive 
    case PeriodNegative => LastDayNegative 
  } 
  val tickerToLastSevenDays = periodPnL(Duration.standardDays(7)).mapValues { 
    case PeriodPositive => LastSevenDayPositive 
    case PeriodNegative => LastSevenDayNegative 
  } 
  tickerToLastHour.zip(tickerToLastDay).zip(tickerToLastSevenDays).map({ 
    case (((t, lastHour), (_, lastDay)), (_, lastSevenDays)) => 
      TradingPerformanceTrend(t, lastHour, lastDay, lastSevenDays) 
  }).toList 
} 

This method focuses on translating the PnL for each time period into the appropriate time period's performance trend. The final expression, involving two invocations of zip, elegantly transforms the three maps, each keyed by Ticker with the corresponding period PnL trend as the value, into a List[TradingPerformanceTrend]. zip iterates over two collections to yield a tuple for each index of both collections. Here is a simple snippet to illustrate zip usage:

println(List(1, 3, 5, 7).zip(List(2, 4, 6))) 

This yields the following:

List((1,2), (3,4), (5,6)) 

The result is that corresponding indexes are "zipped" together. For example, at index one, the first list's value is three and the second list's value is four, yielding the tuple (3, 4). The first list has four elements while the second list has only three; the extra element is silently dropped from the resulting collection. This behavior is well-documented, but it might be unexpected at first glance. In our reporting use case, each key (that is, each Ticker) appears in all three maps, so we are certain that all three maps are of equal length.

However, there is a subtle bug in our usage of zip. zip uses a collection's iterator to traverse elements, which means that the result is sensitive to iteration order. Each of the three maps is constructed by invoking toMap, which indirectly delegates to the scala.collection.immutable.HashMap implementation of Map. Similar to Set, Scala provides several handwritten implementations of Map (for example, Map2) for small collection sizes before constructing a HashMap. By now, you may realize the flaw: HashMap does not guarantee iteration ordering.

To fix this bug and retain the usage of zip, we can leverage our earlier discovery of SortedMap, the Map trait backed by TreeMap that keeps keys sorted. Swapping out Map for SortedMap and defining an appropriate Ordering for Ticker gives us a bug-free, elegant solution for generating trading performance trend reports. With judicious usage of views, we found a way to deliver iterative performance improvements with minimally invasive changes. This will give the sales team something to ring the bell about, and it buys us additional time to consider other approaches to generating reports.
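To make the fix concrete, here is a minimal sketch of the pieces that change, assuming hypothetical tickers and an Ordering that sorts Ticker by its string value:

import scala.collection.immutable.SortedMap 
 
implicit val tickerOrdering: Ordering[Ticker] = Ordering.by[Ticker, String](_.value) 
 
// Building SortedMaps (rather than calling toMap) guarantees that all three 
// period maps iterate over tickers in the same order, making zip safe 
val tickerToLastHour: SortedMap[Ticker, LastHourPnL] = SortedMap( 
  Ticker("BAR") -> LastHourNegative, 
  Ticker("FOO") -> LastHourPositive) 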
