Sequences are, without doubt, extremely powerful members of the functional programmer's tool belt. However, they are not free of gotchas that may badly hurt performance; it is best to know and avoid them. A few of them are as follows:
F# language creators were nice enough to provide an out-of-the-box caching tool, namely the Seq.cache library function. It should be used in situations where losing full laziness of materialization is acceptable, element generation is not cheap, and repeated enumerations are really required. Let me demonstrate how easy it is to put caching to work.
To begin with, I need an indicator of enumerator consumption. This is not complicated for those who already have experience working with sequence guts. Let's slightly modify our good old makeSeq function as follows (Ch6_5.fsx):
let makeSeq f =
    { new System.Collections.Generic.IEnumerable<'U> with
        member x.GetEnumerator() =
            printfn "Fresh enumerator given"
            f()
      interface System.Collections.IEnumerable with
        member x.GetEnumerator() =
            (f() :> System.Collections.IEnumerator) }
Now we are ready to see how the caching works, as shown here (Ch6_5.fsx):
// caching
let nums = (seq {1..100}).GetEnumerator |> makeSeq

// non-cached - double enumeration
((nums |> Seq.sum), (nums |> Seq.length))
// Fresh enumerator given
// Fresh enumerator given
// val it : int * int = (5050, 100)

let cache = nums |> Seq.cache

// cached - single enumeration
((cache |> Seq.sum), (cache |> Seq.length))
// Fresh enumerator given
// val it : int * int = (5050, 100)

// just another time - no enumerations at all
((cache |> Seq.sum), (cache |> Seq.length))
// val it : int * int = (5050, 100)
First, in the absence of any caching, Seq.sum and Seq.length each imposed an independent sequence traversal, as the presence of two enumerator alerts confirms.
Then, after wrapping the working sequence with Seq.cache, I repeat the calculation using the wrapped sequence. As expected, we notice only a single enumerator alert, issued while populating the cache; the second traversal left no traces because it went through the cache.
To be sure, just reissue the calculation. Now all the data comes from the cache, and no traversals of the original sequence take place at all.
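As a side note, the same caching behavior can be observed without our custom makeSeq wrapper. The following is a quick sketch of my own (the names noisy and cachedNoisy are made up for illustration), using a side effect inside a sequence expression as the consumption indicator:

```fsharp
// A sketch: the printfn side effect fires only during the first,
// cache-populating traversal of the cached sequence.
let noisy = seq {
    for i in 1 .. 3 do
        printfn "generating %d" i
        yield i }

let cachedNoisy = noisy |> Seq.cache

cachedNoisy |> Seq.sum    // prints "generating 1" .. "generating 3"; returns 6
cachedNoisy |> Seq.length // prints nothing; elements come from the cache
```

Note that Seq.cache populates its cache lazily, item by item, as the wrapped sequence is enumerated, so partial traversals cache only the elements actually consumed.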
I want to wrap up this chapter by demonstrating a pattern known as fusion. It is not conceptually difficult: imagine that you have a composition of functions that collectively transform a data sequence, and at some point your implementation requires multiple traversals of that sequence. The compiler in principle, or a human in practice, may optimize the transformation so that the multiple traversals fuse into a single one.
Let's perform fusion in practice, reusing our makeSeq implementation as the indicator of obtaining enumerators, as shown in the following code (Ch6_5.fsx):
let series = (seq {1..100}).GetEnumerator |> makeSeq

let average dd = (Seq.sum dd) / (Seq.length dd)

average series
// Fresh enumerator given
// Fresh enumerator given
// val it : int = 50
The preceding naive implementation of average traverses the sequence twice, as the two enumeration alerts show.
However, rewriting the implementation of average less naively as averageFused fuses these traversals into one, as shown in the following code (Ch6_5.fsx):
let averageFused dd =
    dd
    |> Seq.fold (fun acc x -> (fst acc + x, snd acc + 1)) (0, 0)
    |> fun x -> fst x / snd x

averageFused series
// Fresh enumerator given
// val it : int = 50
The single enumeration alert fully confirms the fusion.
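The same single-pass technique scales beyond two aggregates. Here is a sketch of my own (the name stats is hypothetical, not from the chapter) that generalizes averageFused by folding sum, count, minimum, and maximum in one traversal:

```fsharp
// A sketch: fusing four aggregations into a single Seq.fold traversal.
// The accumulator is a tuple of (sum, count, min, max).
let stats dd =
    dd
    |> Seq.fold
        (fun (sum, count, mn, mx) x ->
            (sum + x, count + 1, min mn x, max mx x))
        (0, 0, System.Int32.MaxValue, System.Int32.MinValue)

stats series
// Fresh enumerator given
// val it : int * int * int * int = (5050, 100, 1, 100)
```

For plain averaging of floating-point sequences, note that the library's built-in Seq.average already performs a single traversal, so hand-fusing is worthwhile mainly when you combine several distinct aggregates as above.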