Memoization

The next two relatively advanced topics I will cover both resemble the Just-in-Time approach, taken outside of its usual compilation context. For Just in Time (https://en.wikipedia.org/wiki/Just_in_Time), Wikipedia first offers a production strategy from manufacturing, where components are delivered immediately before being used as a way of keeping inventory costs lean.

As a matter of fact, memoization and lazy evaluation complement each other in this lean sense. While laziness allows you not to perform a calculation until its result is absolutely required, memoization makes the results of already performed, resource-expensive calculations reusable, not letting them be wasted.

I have already used memoization of a sort when implementing prime number generation earlier in this chapter to cover mutual recursion. An expensively generated sequence was cached there so that the already generated elements could be used to find the ones not generated yet. Now, I want to concentrate on memoization in general, allowing any function to be memoized. Before doing this, it is important that you realize the following:

  • Memoization works for pure functions only. This is almost obvious; if a function is not referentially transparent, it cannot be memoized, as memoization captures solely the arguments, not the arguments plus the ambient state
  • Memoization exploits a precalculated state

With this in mind, let's mimic implementations presented elsewhere on the Internet (https://blogs.msdn.microsoft.com/dsyme/2007/05/31/a-sample-of-the-memoization-pattern-in-f/ and http://www.fssnip.net/8P) in order to investigate related limitations and gotchas as shown here (Ch7_4.fsx):

// Memoization (F# 4.0 is required)
let memoize f =
  let mutable cache = Map.empty
  fun x ->
    match cache.TryFind(x) with
    | Some res -> printfn "returned memoized"; res
    | None ->
      let res = f x
      cache <- cache.Add(x, res)
      printfn "memoized, then returned"
      res

The type inferred for memoization by the F# compiler is shown here:

memoize : f:('a -> 'b) -> ('a -> 'b) when 'a : comparison 

Here, f represents the function to be memoized, and cache serves as a state repository using the immutable F# Map collection (https://msdn.microsoft.com/en-us/library/ee353880.aspx) under the hood. memoize itself is a full-fledged higher-order function that takes a function as an argument and also returns a function. The returned function closes over the mutable cache (closing over mutable locals is an F# 4.0 feature) and does the following:

  • If its argument x, used as a key against the enclosed Map cache, can be found, then it logs that the precached value is about to be used and returns this res value.
  • Otherwise, it mutates the enclosed cache into a new Map that has, in addition to the existing entries, an entry for the newly calculated pair (x, f(x)); it then logs that memoization took place and returns f(x).

Let's now see how this works in FSI, which the following screenshot captures:


Memoization with F# Map

First, I memoized the fun x -> x*x function, which is supposed to represent a "fat", resource-hungry calculation, into the fm: (int -> int) function. Then, I used fm a couple of times with different arguments, as shown here:

  • fm 10: The result 100 was memoized for argument 10 and then returned
  • fm 42: The result 1764 was also memoized and then returned
  • fm 10: As this argument value has already occurred, the result 100 is returned without any recalculation
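Because the FSI session above is shown only as a screenshot, here is a plain-code sketch of the same interaction (memoize is repeated so the snippet stands alone; the fm name follows the text):

```fsharp
// Repeated from the listing above: the Map-backed memoize
let memoize f =
  let mutable cache = Map.empty
  fun x ->
    match cache.TryFind(x) with
    | Some res -> printfn "returned memoized"; res
    | None ->
      let res = f x
      cache <- cache.Add(x, res)
      printfn "memoized, then returned"
      res

// The "fat" calculation from the text
let fm : int -> int = memoize (fun x -> x * x)

fm 10 |> printfn "%d"  // memoized, then returned; prints 100
fm 42 |> printfn "%d"  // memoized, then returned; prints 1764
fm 10 |> printfn "%d"  // returned memoized; prints 100
```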

This pattern seems quite straightforward; however, it carries a few gotchas.

For example, the signature of memoize indicates that 'a is required to support comparison; what gives? Digging into the memoize implementation lets you conclude that this constraint is a mere corollary of using F# Map to back the state persistence.

As the implementation behind Map is likely to be a balanced tree, it requires its keys to be comparable for rebalancing. Oops! It sounds like a leaky abstraction (https://en.wikipedia.org/wiki/Leaky_abstraction) is at play here. This may also limit the generic application of memoization, as comparability is not a universal property of a generic type 'a.

Let's change the persistence implementation mechanism to a generic Dictionary (https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.100).aspx), as shown in the following code (Ch7_4.fsx):

let memoize' f = 
  let cache = System.Collections.Generic.Dictionary() 
  fun x -> 
    match cache.TryGetValue(x) with 
    | true, res -> printfn "returned memoized"; res 
    | _ -> 
      let res = f x 
      cache.Add(x, res) 
      printfn "memoized, then returned" 
      res 

This changes the memoized argument constraint from comparison to equality as shown here:

memoize' : f:('a -> 'b) -> ('a -> 'b) when 'a : equality 
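To see the difference between the two constraints concretely, here is a small illustration of my own (not from the chapter): a record marked [<ReferenceEquality>] supports equality but not comparison, so the Dictionary-backed memoize' accepts it, while the Map-backed memoize would reject it at compile time.

```fsharp
// Repeated from above: the Dictionary-backed memoize'
let memoize' f =
  let cache = System.Collections.Generic.Dictionary()
  fun x ->
    match cache.TryGetValue(x) with
    | true, res -> printfn "returned memoized"; res
    | _ ->
      let res = f x
      cache.Add(x, res)
      printfn "memoized, then returned"
      res

// Supports equality (by reference), but not comparison
[<ReferenceEquality>]
type Opaque = { Payload : int }

let g = memoize' (fun o -> o.Payload * 2)
let o1 = { Payload = 21 }
g o1 |> printfn "%d"  // memoized, then returned; prints 42
g o1 |> printfn "%d"  // returned memoized; prints 42

// let bad = memoize (fun (o: Opaque) -> o.Payload)
// would not compile: 'Opaque' does not support the 'comparison' constraint
```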

This can be considered more universal, but only until some innocuous usage like this occurs:

let disaster = memoize' (fun () -> 5) 
...... 
disaster() 

Executing this code will end up with the following exception:

System.ArgumentNullException: Value cannot be null 

What the heck? It turns out nothing special happened; just another leaky abstraction has surfaced, and consequently, the gotcha occurred. This time, the gotcha stems from the underlying persistence mechanism: the unit argument () is represented as null at runtime, and Dictionary does not allow null as a key (on the contrary, Map happily allows it)!
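One possible workaround (my own sketch, not from the chapter; the Key and memoizeSafe names are hypothetical) is to wrap every argument in a single-case union before it reaches the Dictionary, so the key can never be null:

```fsharp
// Wrapping the key in a single-case union guarantees a non-null
// Dictionary key, even for unit (which is represented as null)
type Key<'a> = Key of 'a

let memoizeSafe f =
  let cache = System.Collections.Generic.Dictionary()
  fun x ->
    match cache.TryGetValue(Key x) with
    | true, res -> res
    | _ ->
      let res = f x
      cache.Add(Key x, res)
      res

let noDisaster = memoizeSafe (fun () -> 5)
noDisaster () |> printfn "%d"  // prints 5; no ArgumentNullException
```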

Finally, I want to touch on combining memoization with recursion since, quite frequently, recursion is the tool for solving problems with the divide-and-conquer strategy, where memoization fits naturally and may really shine.

Let's take a use case more appropriate for the purpose, for example, the simple calculation of binomial coefficients with the help of Pascal's triangle (https://en.wikipedia.org/wiki/Pascal%27s_triangle), as shown here (Ch7_4.fsx):

let rec binomial n k =  
  if k = 0 || k = n then 1 
  else 
    binomial (n - 1) k + binomial (n - 1) (k - 1) 

The easily noticeable recurrent relationship between elements of Pascal's triangle rows allows you to expect a serious benefit from memoization.
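As a quick sanity check (my own example; binomial is repeated so the snippet stands alone), the recurrence reproduces row 5 of Pascal's triangle:

```fsharp
// Repeated from the listing above
let rec binomial n k =
  if k = 0 || k = n then 1
  else
    binomial (n - 1) k + binomial (n - 1) (k - 1)

// Row 5 of Pascal's triangle: 1 5 10 10 5 1
[0 .. 5]
|> List.map (binomial 5)
|> printfn "%A"  // prints [1; 5; 10; 10; 5; 1]
```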

The memoized implementation of binomial is also straightforward; memoize is turned into an internal function in order to strip the logging introduced in its initial version. The only other problem left is that a memoized function takes a single argument, while binomial takes two. However, uncurrying the two arguments into a tuple resolves this nicely, as shown here (Ch7_4.fsx):

let rec memoizedBinomial = 
  let memoize f = 
    let cache = System.Collections.Generic.Dictionary() 
    fun x -> 
      match cache.TryGetValue(x) with 
      | true, res -> res 
      | _ -> 
        let res = f x 
        cache.Add(x, res) 
        res 
  memoize 
    (fun (n, k) -> 
      if k = 0 || k = n then 1 
      else 
        memoizedBinomial (n - 1, k) + 
        memoizedBinomial (n - 1, k - 1)) 

Now it's time to measure the gain achieved from memoization. The code in the following figure measures the duration of repeating binomial 500 2 10,000 times compared with the duration of repeating memoizedBinomial (500, 2) the same 10,000 times:


Memoizing of the "divide and conquer" solution

The results of the comparison are absolutely stunning: 23781 / 15 ≈ 1585, which means that memoization has improved the performance by a factor of roughly 1585!
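Since the measurement itself appears only in the screenshot, here is a sketch of how such a timing comparison can be coded with System.Diagnostics.Stopwatch (the time helper and the scaled-down repetition count of 100 are my own choices; the absolute millisecond figures will, of course, vary by machine):

```fsharp
open System.Diagnostics

let rec binomial n k =
  if k = 0 || k = n then 1
  else binomial (n - 1) k + binomial (n - 1) (k - 1)

let rec memoizedBinomial =
  let memoize f =
    let cache = System.Collections.Generic.Dictionary()
    fun x ->
      match cache.TryGetValue(x) with
      | true, res -> res
      | _ ->
        let res = f x
        cache.Add(x, res)
        res
  memoize (fun (n, k) ->
    if k = 0 || k = n then 1
    else memoizedBinomial (n - 1, k) + memoizedBinomial (n - 1, k - 1))

// Runs a thunk, reporting the elapsed milliseconds
let time label f =
  let sw = Stopwatch.StartNew()
  let r = f ()
  sw.Stop()
  printfn "%s: %d ms" label sw.ElapsedMilliseconds
  r

let plain = time "plain" (fun () ->
  let mutable last = 0
  for _ in 1 .. 100 do last <- binomial 500 2
  last)
let memo = time "memoized" (fun () ->
  let mutable last = 0
  for _ in 1 .. 100 do last <- memoizedBinomial (500, 2)
  last)
printfn "both equal C(500, 2): %b" (plain = memo)  // true; both are 124750
```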
