The next two relatively advanced topics I will cover resemble the Just in Time approach, applied outside of a compilation context. For Just in Time (https://en.wikipedia.org/wiki/Just_in_Time), Wikipedia's first match is a production strategy in manufacturing, where components are delivered immediately before being used as a way of keeping inventory costs lean.
As a matter of fact, memoization and lazy evaluation complement each other in this lean-calculation sense. While laziness allows you to defer a calculation until its result is absolutely required, memoization makes the results of expensive, resource-hungry calculations that have already been performed reusable, not allowing them to be wasted.
I have already used memoization of a sort when implementing prime number generation earlier in this chapter while covering mutual recursion. There, an expensively generated sequence was cached so that the already generated elements could be used to find the next, not yet generated ones. Now, I want to concentrate on memoization in general, allowing any function to be memoized. Prior to doing this, it is important that you realize the following:
With this in mind, let's mimic implementations presented elsewhere on the Internet (https://blogs.msdn.microsoft.com/dsyme/2007/05/31/a-sample-of-the-memoization-pattern-in-f/ and http://www.fssnip.net/8P) in order to investigate related limitations and gotchas, as shown here (Ch7_4.fsx):
// Memoization (F# 4.0 is required)
let memoize f =
    let mutable cache = Map.empty
    fun x ->
        match cache.TryFind(x) with
        | Some res ->
            printfn "returned memoized"
            res
        | None ->
            let res = f x
            cache <- cache.Add(x, res)
            printfn "memoized, then returned"
            res
The type inferred for memoize by the F# compiler is shown here:
memoize : f:('a -> 'b) -> ('a -> 'b) when 'a : comparison
Here, f represents a function to be memoized, and cache serves as a state repository using the immutable F# Map collection (https://msdn.microsoft.com/en-us/library/ee353880.aspx) under the hood. memoize itself represents a full-fledged higher-order function that takes a function as an argument and also returns a function. The returned function closes over the mutable cache (an F# 4.0 feature) and does the following:

- It takes an argument x, which is used as a key against the closed-over Map.
- If the value for the key x can be found in cache, then it logs the indication that the precached value is to be used and returns this res value.
- Otherwise, it calculates res as f x, mutates cache to a new Map that has, in addition to the existing entries, the entry represented by the newly calculated pair (x, f(x)), then logs the fact that memoization took place and returns f(x).

Let's see for now how this works in FSI, which the following screenshot captures:
First, I memoized the fun x -> x*x function, which is supposed to represent a "fat" resource-hungry calculation, into the fm : (int -> int) function. Then, I used fm a couple of times with different arguments, as shown here:

- fm 10: The result 100 was memoized for the argument 10 and then returned
- fm 42: The result 1764 was also memoized and then returned
- fm 10: As this argument value has already occurred, the result 100 is returned without any recalculation

This pattern seems quite straightforward; however, it carries a few gotchas.
For example, the signature of memoize indicates that 'a is required to support comparison; what gives? Digging into the memoize implementation allows you to conclude that this constraint is a mere corollary of using the F# Map to back the state persistence. As the implementation behind Map is likely to be a balanced tree, it requires its keys to be comparable for rebalancing. Oops! Sounds like a leaky abstraction (https://en.wikipedia.org/wiki/Leaky_abstraction) is taking place here. Also, this may be a limiting factor for applying memoization generically, as comparability is not a universal property of a generic type 'a.
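To see the constraint bite, here is a minimal, self-contained sketch (repeating the memoize definition from the listing above so it runs standalone): memoizing over an int argument compiles fine because int is comparable, whereas a function-typed argument would be rejected at compile time, since function types do not support comparison.

```fsharp
// Map-backed memoize (F# 4.0+ is needed for closing over a mutable local).
let memoize f =
    let mutable cache = Map.empty
    fun x ->
        match cache.TryFind(x) with
        | Some res ->
            printfn "returned memoized"
            res
        | None ->
            let res = f x
            cache <- cache.Add(x, res)
            printfn "memoized, then returned"
            res

// Fine: int supports comparison, so it can serve as a Map key.
let fm = memoize (fun x -> x * x)
fm 10 |> printfn "%d"   // memoized, then returned; prints 100
fm 10 |> printfn "%d"   // returned memoized; prints 100

// The leak in action: function types do not support comparison,
// so the line below would not compile if uncommented:
// let bad = memoize (fun (g: int -> int) -> g 1)
// error FS0001: the type '(int -> int)' does not support the 'comparison' constraint
```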
Let's change the persistence mechanism to a generic Dictionary (https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.100).aspx), as shown in the following code (Ch7_4.fsx):
let memoize' f =
    let cache = System.Collections.Generic.Dictionary()
    fun x ->
        match cache.TryGetValue(x) with
        | true, res ->
            printfn "returned memoized"
            res
        | _ ->
            let res = f x
            cache.Add(x, res)
            printfn "memoized, then returned"
            res
This changes the constraint on the memoized function's argument from comparison to equality, as shown here:
memoize' : f:('a -> 'b) -> ('a -> 'b) when 'a : equality
This can be considered more universal, but only until some innocuous usage like this occurs:
let disaster = memoize' (fun () -> 5)
......
disaster()
Executing this code will end up with the following exception:
System.ArgumentNullException: Value cannot be null
What the heck? It turns out nothing special happened; just another leaky abstraction has taken place, and consequently, a gotcha occurred. This time, the gotcha stems from the underlying persistence mechanism, which does not allow null as a Dictionary key (on the contrary, Map happily allows this)! The sole value () of the unit type is represented as null at runtime, hence the exception.
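One way around this particular leak is to make sure the Dictionary never sees a null key. The sketch below (my own workaround, not from the chapter's code; the names memoizeSafe and MemoKey are hypothetical) wraps every argument in a single-case union, so the key is always a proper object even when the argument itself, such as the unit value (), is represented as null at runtime:

```fsharp
// A wrapper type whose instances are never null, even when 'a is unit.
type MemoKey<'a> = MemoKey of 'a

// Dictionary-backed memoize that shields the key from being null.
let memoizeSafe f =
    let cache = System.Collections.Generic.Dictionary()
    fun x ->
        match cache.TryGetValue(MemoKey x) with
        | true, res -> res
        | _ ->
            let res = f x
            cache.Add(MemoKey x, res)
            res

// The formerly disastrous usage now works:
let noDisaster = memoizeSafe (fun () -> 5)
noDisaster () |> printfn "%d"   // prints 5, no ArgumentNullException
```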
Finally, I want to touch on the matter of combining memoization with recursion, as quite frequently recursion is the tool for solving problems with the divide-and-conquer strategy, where memoization fits naturally and may really shine.
Let's take a use case more appropriate for the purpose, for example, the simple calculation of binomial coefficients with the help of Pascal's triangle (https://en.wikipedia.org/wiki/Pascal%27s_triangle), as shown here (Ch7_4.fsx):
let rec binomial n k =
    if k = 0 || k = n then 1
    else binomial (n - 1) k + binomial (n - 1) (k - 1)
The easily noticeable recurrence relation between elements of Pascal's triangle rows allows you to expect a serious benefit from memoization.
The memoized implementation of binomial is also straightforward; memoize is turned into an internal function in order to strip the logging introduced in its initial version. The only other problem left is that the memoized function must take a single argument, while binomial takes two. However, switching from the curried form to a single tupled argument (uncurrying) resolves this trouble nicely, as shown here (Ch7_4.fsx):
let rec memoizedBinomial =
    let memoize f =
        let cache = System.Collections.Generic.Dictionary()
        fun x ->
            match cache.TryGetValue(x) with
            | true, res -> res
            | _ ->
                let res = f x
                cache.Add(x, res)
                res
    memoize (fun (n, k) ->
        if k = 0 || k = n then 1
        else memoizedBinomial (n - 1, k) + memoizedBinomial (n - 1, k - 1))
Now it's time to measure the gain achieved from memoization. The code in the upcoming figure measures the duration of repeating binomial 500 2 10,000 times compared with the duration of repeating memoizedBinomial (500,2) 10,000 times:
The results of the comparison are absolutely stunning, that is, 23781 / 15 ≈ 1585, which means that memoization has improved the performance roughly 1585 times!
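If you'd like to reproduce a measurement along these lines, a System.Diagnostics.Stopwatch harness can look as follows. This is a sketch that repeats the chapter's two definitions so it runs standalone and, for brevity, uses 1,000 repetitions instead of 10,000; absolute numbers will of course vary by machine.

```fsharp
open System.Diagnostics

// Plain recursive binomial coefficient from Pascal's triangle.
let rec binomial n k =
    if k = 0 || k = n then 1
    else binomial (n - 1) k + binomial (n - 1) (k - 1)

// Memoized variant over a tupled argument (the compiler emits warning
// FS0040 about the recursive value; it is benign in this pattern).
let rec memoizedBinomial =
    let memoize f =
        let cache = System.Collections.Generic.Dictionary()
        fun x ->
            match cache.TryGetValue(x) with
            | true, res -> res
            | _ ->
                let res = f x
                cache.Add(x, res)
                res
    memoize (fun (n, k) ->
        if k = 0 || k = n then 1
        else memoizedBinomial (n - 1, k) + memoizedBinomial (n - 1, k - 1))

// Time 1,000 repetitions of a thunk and report the elapsed milliseconds.
let time label (f: unit -> int) =
    let sw = Stopwatch.StartNew()
    let mutable last = 0
    for _ in 1 .. 1000 do last <- f ()
    printfn "%s: %d ms (result %d)" label sw.ElapsedMilliseconds last

time "plain   " (fun () -> binomial 500 2)
time "memoized" (fun () -> memoizedBinomial (500, 2))
```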