In the last chapter, we explored a number of Scala collections that evaluate eagerly. The Scala standard library also provides two collections that operate lazily: views and streams. To motivate an exploration of these collections, we will tackle another performance dilemma at MVT, this one revolving around the performance reports that are generated for clients. In this chapter, we will cover the following topics:
Wanting to learn more about the customers of MVT, you decide to attend the weekly client status meeting. As you look around, you see that you are the only engineer there; everyone else is from the sales team. Johnny, the head of the MVT client management team, runs through a list of newly signed-on clients. Each time he reads off a name, a loud bell is rung. It seems like a strange custom to you, but the sales team cheers excitedly each time the bell rings.
After the new client listing ends and the ringing in your ears stops, one of the sales team members asks Johnny, "When will the performance reports be generated faster? Clients are calling me every day complaining about the inability to see their positions and profits and losses during the trading day. It's embarrassing that we do not have this kind of transparency, and we will lose business because of this." You realize that the report in question is a PDF that can be downloaded via the private web portal that MVT exposes to clients. Unless a client is sophisticated enough to set up his or her own reporting using MVT's performance API, the client is dependent upon the portal to inspect recent trading performance.
Realizing that this is an opportunity to better understand the issue, you ask, "Hi, I'm from the engineering team. I thought I would sit in today to learn more about our clients. Can you share more about the reporting performance problem? I'd like to help address the concern." Through conversation with the sales team, you learn that the PDF report is a first step towards a real-time streaming web app. The PDF report allows MVT to quickly give trading performance insight to clients. Each time the client clicks View Performance, a report is generated that summarizes the performance trend by displaying whether or not the client has realized a profit or a loss in the last hour, day, and seven days. You learn that clients are more likely to generate reports when the market is volatile. The sales team thinks this exacerbates the issue because reports are generated even more slowly when everyone is trying to see recent trading performance. In some of the worst cases, the performance report takes about a dozen minutes to generate, which is totally unacceptable to clients who expect near real-time results.
Eager to dig into the problem, you find the repository that is responsible for working with reporting data. You explore the domain model to understand the concerns represented in this scope:
case class Ticker(value: String) extends AnyVal
case class Price(value: BigDecimal) extends AnyVal
case class OrderId(value: Long) extends AnyVal
case class CreatedTimestamp(value: Instant) extends AnyVal
case class ClientId(value: Long) extends AnyVal

sealed trait Order {
  def created: CreatedTimestamp
  def id: OrderId
  def ticker: Ticker
  def price: Price
  def clientId: ClientId
}
case class BuyOrder(created: CreatedTimestamp, id: OrderId, ticker: Ticker,
  price: Price, clientId: ClientId) extends Order
case class SellOrder(created: CreatedTimestamp, id: OrderId, ticker: Ticker,
  price: Price, clientId: ClientId) extends Order

case class Execution(created: CreatedTimestamp, id: OrderId, price: Price)
In the reporting context, linking orders to executions is important to build the performance trend report because this association allows MVT to identify the profit or loss realized from the trade. ClientId is a concept that you have not worked with before on the order book or in data analysis. The client ID identifies an MVT client's account. As trades are executed on behalf of clients, the client ID allows us to link an executed order to a client account.
Scanning the code base, you spot the representation of a performance trend report before it is converted into PDF format:
sealed trait LastHourPnL
case object LastHourPositive extends LastHourPnL
case object LastHourNegative extends LastHourPnL

sealed trait LastDayPnL
case object LastDayPositive extends LastDayPnL
case object LastDayNegative extends LastDayPnL

sealed trait LastSevenDayPnL
case object LastSevenDayPositive extends LastSevenDayPnL
case object LastSevenDayNegative extends LastSevenDayPnL

case class TradingPerformanceTrend(
  ticker: Ticker,
  lastHour: LastHourPnL,
  lastDay: LastDayPnL,
  lastSevenDay: LastSevenDayPnL)
The profit and loss (PnL) trend is represented by distinct ADTs for each supported time period: the last hour, last day, and last seven days. For each stock ticker, these three time periods are included in TradingPerformanceTrend. Across multiple tickers, you infer, a client can identify whether or not MVT is generating a profit or a loss over time. Inspecting the signature of the trend method, which is responsible for computing TradingPerformanceTrend, you confirm your thinking:
def trend(
  now: () => Instant,
  findOrders: (Interval, Ticker) => List[Order],
  findExecutions: (Interval, Ticker) => List[Execution],
  request: GenerateTradingPerformanceTrend): List[TradingPerformanceTrend]

case class GenerateTradingPerformanceTrend(
  tickers: List[Ticker],
  clientId: ClientId)
Computing the performance trend requires a way to obtain the current time in order to determine how far to look back when computing each time period's trend. The findOrders and findExecutions arguments are functions that query the reporting data store for orders and executions that were created within a time interval for a particular ticker. The final argument contains the client's ID and the tickers to report on. Each period's trend is computed by a generalized inner method named periodPnL, which looks like the following:
def periodPnL(
  duration: Duration): Map[Ticker, PeriodPnL] = {
  val currentTime = now()
  val interval = new Interval(currentTime.minus(duration), currentTime)
  (for {
    ticker <- request.tickers
    orders = findOrders(interval, ticker)
    executions = findExecutions(interval, ticker)
    idToExecPrice = executions.groupBy(_.id).mapValues(es =>
      Price.average(es.map(_.price)))
    signedExecutionPrices = for {
      o <- orders
      if o.clientId == request.clientId
      price <- idToExecPrice.get(o.id).map(p => o match {
        case _: BuyOrder => Price(p.value * -1)
        case _: SellOrder => p
      }).toList
    } yield price
    trend = signedExecutionPrices.foldLeft(PnL.zero) {
      case (pnl, p) => PnL(pnl.value + p.value)
    } match {
      case p if p.value >= PnL.zero.value => PeriodPositive
      case _ => PeriodNegative
    }
  } yield ticker -> trend).toMap
}
The periodPnL method is an involved method containing several logical steps. For each client-provided ticker, the associated orders and executions for the provided time period are retrieved. To correlate orders with executions, a map of OrderId to Execution is built using groupBy. To simplify later calculations, the average execution price of each executed order is computed, reducing multiple executions for a single order to a single value.
With the idToExecPrice lookup table built, the next logical step is to filter out orders for other clients. Once only the client's orders remain, idToExecPrice is used to identify the orders that executed. The final two steps compute the performance trend by tabulating the client's absolute return (that is, profit and loss). The steps involve two additions to the domain model, as follows:
case class PnL(value: BigDecimal) extends AnyVal
object PnL {
  val zero: PnL = PnL(BigDecimal(0))
}

sealed trait PeriodPnL
case object PeriodPositive extends PeriodPnL
case object PeriodNegative extends PeriodPnL
PnL is a value class that represents the client's dollar return. PeriodPnL is analogous to the previously introduced ADTs, but can be applied to any time period of data. This allows PeriodPnL to be reused for the last hour, last day, and last seven days trend computations.
When the trade represents a buy, the execution price is negated because the transaction represents cash being exchanged for stock. When the trade represents a sell, the execution price remains positive because the transaction represents exchanging stock for cash. After computing the performance trend for each ticker, the List of Ticker and PeriodPnL tuples is converted to a Map.
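This sign convention can be checked with a tiny, self-contained sketch. Note that the Trade hierarchy here is a simplification invented for illustration and is not the book's domain model:

```scala
// Simplified stand-ins for the reporting domain model (hypothetical names)
case class Price(value: BigDecimal) extends AnyVal
sealed trait Trade { def price: Price }
case class Buy(price: Price) extends Trade   // cash out: negative contribution
case class Sell(price: Price) extends Trade  // cash in: positive contribution

// Buying at 10 and selling at 12 should realize a profit of 2
val trades: List[Trade] = List(Buy(Price(10)), Sell(Price(12)))
val pnl: BigDecimal = trades.map {
  case Buy(p)  => -p.value
  case Sell(p) => p.value
}.sum
```

Summing the signed execution prices yields the realized profit and loss, mirroring the foldLeft over signedExecutionPrices in periodPnL.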
Digesting this implementation, you can start to imagine why generating this PDF is time-consuming. There is no sign of caching results, which means that the trend report is recomputed each time a client makes a request. As the number of clients requesting reports increases, there is an increased wait time while reports are computed. Re-architecting the reporting infrastructure to cache reports is too large a near-term change. Instead, you try to identify incremental changes that can improve report generation performance.
When working on the order book, we learned that List eagerly evaluates results. This property means that, in periodPnL, the de-sugared for-comprehension filter and map operations performed on orders produce new lists. That is, each transformation produces a new collection. For clients with large order counts, it is costly in terms of CPU time to iterate over an order set three times, in addition to incurring garbage collection costs due to repeated List creation. To ameliorate this concern, Scala provides a way to defer transforming elements until an element is needed by a downstream computation.
Conceptually, this is done by adding a view on top of the eagerly evaluated collection, which allows transformations to be defined with deferred evaluation semantics. A lazily evaluated view can be constructed from any Scala collection by invoking view. For example, this snippet creates a view from a List of integers:
val listView: SeqView[Int, List[Int]] = List(1, 2, 3).view
From this snippet, we learn that Scala represents a view into a collection with a distinct SeqView type that is parameterized by two types: the collection element type and the collection type. Seeing a view in use makes it easier to understand its runtime differences from an eagerly evaluated collection. Consider the following snippet performing the same operations on a List and on a view over a List:
println("List evaluation:")
val evens = List(0, 1, 2, 3, 4, 5).map(i => {
  println(s"Adding one to $i")
  i + 1
}).filter(i => {
  println(s"Filtering $i")
  i % 2 == 0
})
println("--- Printing first two even elements ---")
println(evens.take(2))

println("View evaluation:")
val evensView = List(0, 1, 2, 3, 4, 5).view.map(i => {
  println(s"Adding one to $i")
  i + 1
}).filter(i => {
  println(s"Filtering $i")
  i % 2 == 0
})
println("--- Printing first two even elements ---")
println(evensView.take(2).toList)
This snippet performs simple arithmetic and then filters to find the even elements. For the sake of deepening our understanding, the snippet breaks the functional paradigm by adding the println side effect. The output of the list evaluation is as expected:
List evaluation:
Adding one to 0
Adding one to 1
Adding one to 2
Adding one to 3
Adding one to 4
Adding one to 5
Filtering 1
Filtering 2
Filtering 3
Filtering 4
Filtering 5
Filtering 6
--- Printing first two even elements ---
List(2, 4)
With eager evaluation, each transformation is applied to each element before moving to the next transformation. Now, consider the following output from view evaluation:
View evaluation:
--- Printing first two even elements ---
Adding one to 0
Filtering 1
Adding one to 1
Filtering 2
Adding one to 2
Filtering 3
Adding one to 3
Filtering 4
List(2, 4)
As we discussed earlier, with lazy evaluation no transformations are applied until an element is needed. In this example, this means that the addition and filtering do not occur until the invocation of toList. The absence of output between "View evaluation:" and the printing banner is evidence that zero transformations occurred when the view was defined. Curiously, we also see that only the first four of the six elements are evaluated. When a view applies transformations, it applies all transformations to each element rather than applying each transformation to all elements. By applying all transformations in one step, the view is able to return the first two even elements without evaluating the entire collection. Here, we see the potential performance gains from view usage due to lazy evaluation. Before applying the concept of views to the performance trend report, let's take a deeper look at how views are implemented.
Views are able to defer evaluation by returning a data structure that composes the previous transformation state with the next transformation. The Scala implementation of views is admittedly complicated to digest because it provides a large number of capabilities while retaining support for all Scala collections. To build an intuition for how views are implemented, let's construct our own lazily evaluated view that works only for List and only supports map operations. To begin, we define the operations that are supported by our PseudoView implementation:
sealed trait PseudoView[A] {
  def map[B](f: A => B): PseudoView[B]
  def toList: List[A]
}
PseudoView is defined as a trait that supports lazy application of a transformation from A to B and also supports evaluating all transformations to return a List. Next, we define two types of view: one to support the initial case, when zero transformations have been applied, and one to support applying a transformation to a previously transformed view. The signatures are shown in the following snippet:
final class InitialView[A](xs: List[A]) extends PseudoView[A]
final class ComposedView[A, B](xs: List[A], fa: A => B) extends PseudoView[B]
In both scenarios, the original List must be carried through to support eventually applying the transformations. In the InitialView base case, there are zero transformations, which is why there is no additional state. ComposedView supports chaining computations by carrying the state of the previous fa transformation.
Implementing InitialView is a straightforward delegation to ComposedView:
final class InitialView[A](xs: List[A]) extends PseudoView[A] {
  def map[B](f: A => B): PseudoView[B] = new ComposedView[A, B](xs, f)
  def toList: List[A] = xs
}
The ComposedView implementation shows how transformations are chained together using function composition:
final class ComposedView[A, B](xs: List[A], fa: A => B) extends PseudoView[B] {
  def map[C](f: B => C): PseudoView[C] = new ComposedView(xs, f.compose(fa))
  def toList: List[B] = xs.map(fa)
}
Let's construct a PseudoView companion object that provides view construction, as follows:
object PseudoView {
  def view[A](xs: List[A]): PseudoView[A] = new InitialView(xs)
}
We can now exercise PseudoView with a simple program to demonstrate that it defers evaluation:
println("PseudoView evaluation:")
val listPseudoView = PseudoView.view(List(0, 1, 2)).map(i => {
  println(s"Adding one to $i")
  i + 1
}).map(i => {
  println(s"Multiplying $i")
  i * 2
})
println("--- Converting PseudoView to List ---")
println(listPseudoView.toList)
Running this program, we see output equivalent to usage of Scala's view implementation:
PseudoView evaluation:
--- Converting PseudoView to List ---
Adding one to 0
Multiplying 1
Adding one to 1
Multiplying 2
Adding one to 2
Multiplying 3
List(2, 4, 6)
PseudoView helps build an intuition about how Scala implements views. From here, you can begin considering how to support other operations. For example, how can filter be implemented? filter is interesting to consider because it constrains the original collection. As defined, PseudoView is ill-equipped to support the filter operation, which is one illustration of the complexity that is handled by Scala views. Scala views tackle this challenge by defining a trait named Transformed, which is the base trait for all view operations. A partial definition is shown, as follows:
trait Transformed[+B] extends GenTraversableView[B, Coll] {
  def foreach[U](f: B => U): Unit
  lazy val underlying = self.underlying
}
The underlying lazy value is how the originally wrapped collection is accessed. This is analogous to how PseudoView passed the List state into ComposedView. Transformed defines a side-effecting foreach operation to support collection operations in a lazy manner. Defining foreach allows implementations of this trait to adapt how the underlying collection is traversed. This is how filter is implemented:
trait Filtered extends Transformed[A] {
  protected[this] val pred: A => Boolean
  def foreach[U](f: A => U) {
    for (x <- self)
      if (pred(x)) f(x)
  }
}
Transformed is used within the view API to maintain the state of pending operations, while the external API supports interacting with SeqView. Following another pattern commonly found in Scala collections, SeqView inherits a number of operations by mixing in other traits. SeqView indirectly mixes in TraversableViewLike, which provides access to the Transformed operations.
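To see one way our toy could grow to support filter, consider representing each deferred step as a function A => Option[B], so that a single composed function can both transform and drop elements. This OptionView is an invented illustration of the idea, not how Scala's actual Transformed-based implementation works:

```scala
// A hypothetical extension of the PseudoView idea: each deferred step is
// A => Option[B], so composing steps expresses both mapping and filtering
// (a None result drops the element).
final class OptionView[A, B](xs: List[A], step: A => Option[B]) {
  def map[C](f: B => C): OptionView[A, C] =
    new OptionView(xs, step.andThen(_.map(f)))
  def filter(p: B => Boolean): OptionView[A, B] =
    new OptionView(xs, step.andThen(_.filter(p)))
  // Evaluation happens only here, in a single pass over the original list
  def toList: List[B] = xs.flatMap(a => step(a).toList)
}
object OptionView {
  def view[A](xs: List[A]): OptionView[A, A] =
    new OptionView[A, A](xs, a => Some(a))
}

val result = OptionView.view(List(1, 2, 3, 4))
  .map(_ + 1)
  .filter(_ % 2 == 0)
  .toList
// result == List(2, 4)
```

Like Scala's views, nothing is evaluated until toList, and each element flows through all composed steps in one pass.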
With our newly-developed intuition for views, we may view (no pun intended!) the construction of performance trend reports differently. Scala's implementation of views makes it trivial to switch from eagerly evaluated collections to a lazily evaluated version. Recall that, once the order ID to average execution price lookup table is constructed, a series of transformations is applied to the orders retrieved for the duration and ticker. By converting orders to a view, there is an opportunity to avoid unnecessary transformations and improve the speed of the performance trend report.
While it is trivial to convert to a view, it is less trivial to identify the conditions under which lazy evaluation out-performs eager evaluation. As a good performance engineer, you want to benchmark your proposed change, but you do not have access to historical order and execution data to build a benchmark. Instead, you write a microbenchmark that simulates the problem that you are modeling. The question you are trying to answer is, "For what size collection and what number of operations does it make sense to use a view over a List?" There is a cost to constructing a view because it involves retaining information about the deferred transformations, which implies that it will not always be the most performant solution. You come up with the following scenarios to help answer your question:
@Benchmark
def singleTransformList(state: ViewState): List[Int] =
  state.numbers.map(_ * 2)

@Benchmark
def singleTransformView(state: ViewState): Vector[Int] =
  state.numbers.view.map(_ * 2).toVector

@Benchmark
def twoTransformsList(state: ViewState): List[Int] =
  state.numbers.map(_ * 2).filter(_ % 3 == 0)

@Benchmark
def twoTransformsView(state: ViewState): Vector[Int] =
  state.numbers.view.map(_ * 2).filter(_ % 3 == 0).toVector

@Benchmark
def threeTransformsList(state: ViewState): List[Int] =
  state.numbers.map(_ * 2).map(_ + 7).filter(_ % 3 == 0)

@Benchmark
def threeTransformsView(state: ViewState): Vector[Int] =
  state.numbers.view.map(_ * 2).map(_ + 7).filter(_ % 3 == 0).toVector
For each scenario, you define a test pair that compares a List against a view materialized into a Vector, exercising an increasing number of transformations. Vector is used instead of List because toList on a view is not specialized for List. As we have previously seen, List operations are written to take advantage of constant-time prepend performance. On a view, toList performs linear-time append operations, which would give the false impression that views deliver lower performance. Switching to Vector provides effectively constant-time append operations. The state for this benchmark looks like the following:
@State(Scope.Benchmark)
class ViewState {
  @Param(Array("10", "1000", "1000000"))
  var collectionSize: Int = 0

  var numbers: List[Int] = Nil

  @Setup
  def setup(): Unit = {
    numbers = (for (i <- 1 to collectionSize) yield i).toList
  }
}
ViewState sweeps different collection sizes to help identify how sensitive view performance is to collection size. The benchmark is invoked via the following:
sbt 'project chapter5' 'jmh:run ViewBenchmarks -foe true'
This invocation produces the following results:
Benchmark | Collection size | Throughput (ops per second) | Error as percentage of throughput
singleTransformList | 10 | 15,171,067.61 | ± 2.46
singleTransformView | 10 | 3,175,242.06 | ± 1.37
singleTransformList | 1,000 | 133,818.44 | ± 1.58
singleTransformView | 1,000 | 52,688.80 | ± 1.11
singleTransformList | 1,000,000 | 30.40 | ± 2.72
singleTransformView | 1,000,000 | 86.54 | ± 1.17
twoTransformsList | 10 | 5,008,830.88 | ± 1.12
twoTransformsView | 10 | 4,564,726.04 | ± 1.05
twoTransformsList | 1,000 | 44,252.83 | ± 1.08
twoTransformsView | 1,000 | 80,674.76 | ± 1.12
twoTransformsList | 1,000,000 | 22.85 | ± 3.78
twoTransformsView | 1,000,000 | 77.59 | ± 1.46
threeTransformsList | 10 | 3,360,399.58 | ± 1.11
threeTransformsView | 10 | 3,438,977.91 | ± 1.27
threeTransformsList | 1,000 | 36,226.87 | ± 1.65
threeTransformsView | 1,000 | 58,981.24 | ± 1.80
threeTransformsList | 1,000,000 | 10.33 | ± 3.58
threeTransformsView | 1,000,000 | 49.01 | ± 1.36
The results give us an interesting insight into the cases where using a view yields better performance. For a small collection, such as 10 elements in our benchmark, a List performs better regardless of the number of operations, although the gap narrows as the number of transformations increases. When transforming a large collection, 1,000,000 elements in our benchmark, a view is more efficient, with an increasing differential as the number of transformations increases. For example, with 1,000,000 elements and two transformations, views deliver approximately triple the throughput of List. For a medium-sized collection, such as 1,000 elements in this example, the outcome is not as clear-cut. When performing a single transformation, an eager List performs better, while a view delivers better throughput when applying more than one transformation.
As the volume of your data and the transformation count increase, it becomes more likely that a view offers better performance. Here, you see the tangible benefit of avoiding intermediate collections. A second axis of performance to consider is the nature of the transformation. Transformations that benefit from early termination (for example, find) benefit strongly from lazy evaluation. This benchmark illustrates that it is important to understand the size of your data and the transformations that you intend to perform.
Views offer a simple way to improve performance with minimally invasive changes to your system. The ease of use is part of the allure of views, which may tempt you to use them more frequently than you otherwise would. As our benchmarking in the previous section shows, there is a nontrivial overhead to using views, which means defaulting to views is a suboptimal choice. Looking past the pure performance perspective, there are other reasons to tread carefully when using views.
As views mirror the collection API, it can be a challenge to identify when transformations are being applied lazily. For this reason, we recommend setting well-defined boundaries for view usage. When working on client reporting, we limited view usage to a single inner function and used an eager collection type, List, as the return type. Minimizing the area of a system performing lazy evaluation reduces the cognitive load of building a mental model of runtime execution.
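A minimal sketch of this boundary-setting (the function name and data here are hypothetical): the view exists only inside the function body, and callers always receive an ordinary, eager List:

```scala
// View usage confined to one function; the lazy collection never escapes
def evenDoubles(xs: List[Int]): List[Int] =
  xs.view
    .map(_ * 2)          // deferred
    .filter(_ % 4 == 0)  // deferred
    .toList              // forced here, at the function boundary

val result = evenDoubles(List(1, 2, 3, 4))
// result == List(4, 8)
```

Because the return type is List, callers can reason about the function eagerly; the laziness is an internal optimization detail.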
On a related note, we feel that it is important to be cautious about how a view is transformed into an eagerly evaluated collection type. We showed conversion by invoking toList, which makes the intent explicit. SeqView also provides a force method to force evaluation. As a general rule, we avoid using force because it typically returns scala.collection.immutable.Seq. SeqView retains the collection type as its second type parameter, which allows force to return the original collection type when there is enough evidence. However, certain operations, such as map, cause the view to lose evidence of the original collection type. When this happens, force returns the more general Seq collection type. Seq is a trait that is a super-type of all sequences in the collection library, including views and another lazy data structure that we will discuss later, named scala.collection.immutable.Stream. This inheritance scheme allows the following three statements to compile:
val list: Seq[Int] = List(1, 2, 3)
val view: Seq[Int] = list.view
val stream: Seq[Int] = list.toStream
We believe this is undesirable because the Seq data type hides critical information about the underlying implementation: it represents both lazy and eagerly evaluated collections with the same type. Consider the following snippet to understand why this is undesirable:
def shouldGenerateOrder(xs: Seq[Execution]): Boolean = xs.size >= 3
In this manufactured example, imagine that shouldGenerateOrder is invoked with a Vector, but then later the Vector is swapped out for a SeqView. With Vector, identifying collection length is a constant-time operation. With SeqView, you cannot reason with certainty about the runtime of the operation, except to say that it is definitely more expensive than Vector.size. Seq usage, and, therefore, the usage of force, should be avoided because it is difficult to reason about runtime behavior, and this can lead to unexpected side-effects.
In a typical software system, areas of responsibility are separated into discrete modules. Using the performance trend reporting example, you can imagine a separate module containing the translation from List[TradingPerformanceTrend] to a PDF report. You may be tempted to expose the view to other modules to extend the benefit of lazy transformations. If benchmarks justify making this type of change, then we encourage you to choose one of these options. Our preferred choice in this scenario is to use Stream, which is a lazily evaluated version of List; we explore Stream later in this chapter. Alternatively, if Stream cannot be used, be strict in your use of the SeqView datatype to clearly demarcate that the collection is lazily evaluated.
One additional consideration when using views is to be cognizant of when transformations are repeatedly applied. Consider this manufactured example, which focuses on a use case where a view is used as a base for multiple computations:
> val xs = List(1,2,3,4,5).view.map(x => { println(s"multiply $x"); x * 2 })
xs: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)

> val evens = xs.filter(_ % 2 == 0).toList
multiply 1
multiply 2
multiply 3
multiply 4
multiply 5
evens: List[Int] = List(2, 4, 6, 8, 10)

> val odds = xs.filter(_ % 2 != 0).toList
multiply 1
multiply 2
multiply 3
multiply 4
multiply 5
odds: List[Int] = List()
In this example, xs is a view on a list of integers. A map transformation is lazily applied to multiply each integer by 2. The view is then used to create two List instances, one containing the even elements and the other containing the odd elements (of which there are none, because every doubled value is even). We observe that the transformation is applied twice, once each time we turn the view into a list. This shows that the transformation is lazily applied, but the results of the computation are not cached. This is a characteristic of views to keep in mind, as expensive transformations applied several times can cause significant slowdowns. This is also the reason why side-effects should be avoided in transformations applied to views. If, for some reason, referential transparency is not upheld, the combination of side-effects and multiple evaluations due to view usage can lead to exceptionally difficult-to-maintain software.
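When the same transformed elements are needed more than once, one remedy (a sketch of one option, not the only one) is to materialize the view a single time and share the eager result:

```scala
// Force the (potentially expensive) map exactly once, then reuse the
// strict List for both downstream computations
val doubled: List[Int] = List(1, 2, 3, 4, 5).view.map(_ * 2).toList
val evens = doubled.filter(_ % 2 == 0) // no re-evaluation of the map
val odds  = doubled.filter(_ % 2 != 0) // ditto
```

This trades memory for CPU: the intermediate collection exists again, but the transformation runs once instead of once per consumer.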
This example is straightforward, and the misuse of views is easy to spot. However, even methods that are provided by the standard library can lead to undesirable results when used with views. Consider this snippet:
> val (evens, odds) = List(1,2,3,4,5).view.map(x => {
    println(s"multiply $x"); x * 2
  }).partition(_ % 2 == 0)
evens: scala.collection.SeqView[Int,Seq[_]] = SeqViewMF(...)
odds: scala.collection.SeqView[Int,Seq[_]] = SeqViewMF(...)

> println(evens.toList, odds.toList)
multiply 1
multiply 2
multiply 3
multiply 4
multiply 5
multiply 1
multiply 2
multiply 3
multiply 4
multiply 5
(List(2, 4, 6, 8, 10),List())
This example achieves the same results as the previous sample, but relies on the built-in partition method to split the original list into two distinct collections, each operating on the original view. Again, we see the map transformation applied twice to the original view. This is due to the underlying implementation of partition in TraversableViewLike. The main takeaway is that views and lazy evaluation can help yield better performance, but they should be used carefully. It is a good idea to experiment and try your algorithm in the REPL to confirm that you are using views correctly.
In our running example on reporting on trading performance trends, we saw an easy-to-miss example of lazy evaluation when operating on a Map. Recall that there was a lookup table built using the following code:
executions.groupBy(_.id).mapValues(es => Price.average(es.map(_.price)))
The return type of mapValues is Map[A, B], which does not suggest any difference in evaluation strategy. Let's run a simple example in the REPL:
> val m = Map("a" -> 1, "b" -> 2)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2)

> val m_prime = m.mapValues{ v => println(s"Mapping $v"); v * 2 }
Mapping 1
Mapping 2
m_prime: scala.collection.immutable.Map[String,Int] = Map(a -> 2, b -> 4)

> m_prime.get("a")
Mapping 1
res0: Option[Int] = Some(2)

> m_prime.get("a")
Mapping 1
res1: Option[Int] = Some(2)
Notice how, each time we call get on m_prime to retrieve a value, we can observe the transformation being applied, even when using the same key. mapValues is a lazily evaluated transformation of each value in the map, akin to a view operating on the values of a map. The types that are involved do not provide any insight, and unless you inspect the implementation of Map or carefully read the documentation associated with mapValues, you will likely miss this important detail. Bear in mind the caveats of views when working with mapValues.
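If eager semantics are desired, a strict equivalent (a sketch of one common workaround) is to map over the key-value pairs directly, which materializes each transformed value exactly once:

```scala
val m = Map("a" -> 1, "b" -> 2)
// Strict: the function runs once per entry at construction time,
// unlike the lazily re-evaluated m.mapValues(_ * 2)
val strict: Map[String, Int] = m.map { case (k, v) => k -> v * 2 }
```

Subsequent lookups on strict retrieve stored values without re-running the transformation.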
While investigating the implementation of TradingPerformanceTrend, we took a deep dive into views and found how they can improve performance. We now return to the implementation of trend to complete the generation of the List[TradingPerformanceTrend]. The following snippet shows trend with the implementation of periodPnL hidden because we have already thoroughly reviewed it:
def trend(
  now: () => Instant,
  findOrders: (Duration, Ticker) => List[Order],
  findExecutions: (Duration, Ticker) => List[Execution],
  request: GenerateTradingPerformanceTrend): List[TradingPerformanceTrend] = {
  def periodPnL(
    start: Instant => Instant): Map[Ticker, PeriodPnL] = { ... }

  val tickerToLastHour = periodPnL(now =>
    now.minus(Period.hours(1).getMillis)).mapValues {
    case PeriodPositive => LastHourPositive
    case PeriodNegative => LastHourNegative
  }
  val tickerToLastDay = periodPnL(now =>
    now.minus(Period.days(1).getMillis)).mapValues {
    case PeriodPositive => LastDayPositive
    case PeriodNegative => LastDayNegative
  }
  val tickerToLastSevenDays = periodPnL(now =>
    now.minus(Period.days(7).getMillis)).mapValues {
    case PeriodPositive => LastSevenDayPositive
    case PeriodNegative => LastSevenDayNegative
  }
  tickerToLastHour.zip(tickerToLastDay).zip(tickerToLastSevenDays).map({
    case (((t, lastHour), (_, lastDay)), (_, lastSevenDays)) =>
      TradingPerformanceTrend(t, lastHour, lastDay, lastSevenDays)
  }).toList
}
This method focuses on marshaling the translation of PnL for a time period into the appropriate time period's performance trend. The final expression, involving two invocations of zip, elegantly transforms three maps, keyed by Ticker with period PnL trend values, into a List[TradingPerformanceTrend]. zip iterates over two collections to yield a tuple for each index present in both collections. Here is a simple snippet to illustrate zip usage:
println(List(1, 3, 5, 7).zip(List(2, 4, 6)))
This yields the following:
List((1,2), (3,4), (5,6))
The result is that corresponding indexes are "zipped" together. For example, at index one, the first list's value is three and the second list's value is four, yielding the tuple (3, 4). The first list has four elements while the second list only has three; the extra element is silently omitted from the resulting collection. This behavior is well-documented, but it might be unexpected at first glance. In our reporting use case, we are certain that each key (that is, each Ticker) appears in all three maps, so all three maps are of equal length.
However, there is a subtle bug in our usage of zip. zip uses a collection's iterator to iterate over elements, which implies that usage of zip is sensitive to ordering. Each of the three maps is constructed by invoking toMap, which indirectly delegates to a scala.collection.immutable.HashMap implementation of Map. Similar to Set, Scala provides several handwritten implementations of Map (for example, Map2) for small collection sizes before constructing a HashMap. By now, you may realize the flaw: HashMap does not guarantee iteration ordering, so the three maps may yield their keys in different orders.
To fix this bug and retain usage of zip, we can leverage our earlier discovery of SortedMap, the trait backed by TreeMap with sorted keys. Swapping out Map for SortedMap and making the appropriate changes to define an Ordering for Ticker, we now have a bug-free, elegant solution for generating trading performance trend reports. With a judicious usage of views, we found a way to deliver iterative performance improvements with minimally invasive changes. This will give the sales team something to ring the bell about! It also gives us additional time to consider other approaches to generating reports.
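A sketch of the fix (Ticker is reconstructed here from the chapter's definitions, and the alphabetical ordering is an assumption about intent): providing an implicit Ordering for Ticker lets SortedMap guarantee a stable, sorted iteration order, which makes the three-way zip safe:

```scala
import scala.collection.immutable.SortedMap

case class Ticker(value: String) extends AnyVal
object Ticker {
  // Order tickers alphabetically by symbol so all maps iterate identically
  implicit val ordering: Ordering[Ticker] = Ordering.by((t: Ticker) => t.value)
}

// Regardless of insertion order, keys are yielded in sorted order
val m = SortedMap(Ticker("MSFT") -> 1, Ticker("AAPL") -> 2)
val keys = m.keys.toList
// keys == List(Ticker("AAPL"), Ticker("MSFT"))
```

Because every SortedMap built with the same Ordering iterates its keys identically, zipping the three per-period maps pairs up matching tickers deterministically.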