In the previous recipe, Partitioning Monte Carlo simulations for better pmap performance, we more or less guessed at what would make a good partition size. We tried a few different values and saw which gave us the best results. However, it was still largely guesswork, since simply making the partitions larger or smaller doesn't give consistently better or worse results.
This is the type of task that computers are good at. Namely, searching a complex space to find the function parameters that result in an optimal output value. For this recipe, we'll use a fairly simple optimization algorithm called simulated annealing. Similar to many optimization algorithms, this is based on a natural process: the way molecules settle into low-energy configurations as the temperature drops to freezing. This is what allows water to form efficient crystal lattices as it freezes.
In simulated annealing, we feed a state to a cost function. At each point, we evaluate a random neighboring state and possibly move to it. As the energy in the system (the temperature) goes down, we are less likely to jump to a new state, especially if that state is worse than the current one according to the cost function. Finally, after either reaching a target output or iterating through a set number of steps, we take the best match found. Similar to many optimization algorithms, this doesn't guarantee that the result will be the absolute best match, but it should be a good one.
For this recipe, we'll use the Monte Carlo pi approximation function from the Partitioning Monte Carlo simulations for better pmap performance recipe, and we'll use simulated annealing to find a better partition size.
We need to use the same dependencies, uses, imports, and functions as we did in the Partitioning Monte Carlo simulations for better pmap performance recipe. In addition, we'll also need the mc-pi-part function from that recipe.
First, we'll write the annealing function, which takes all of the function parameters for the process as arguments (we'll discuss them in more detail in a minute):

(defn annealing
  [initial max-iter max-cost neighbor-fn cost-fn p-fn temp-fn]
  (let [get-cost (memoize cost-fn)
        cost (get-cost initial)]
    (loop [state initial, cost cost, k 1,
           best-seq [{:state state, :cost cost}]]
      (println '>>> 'sa k \. state \$ cost)
      (if (and (< k max-iter)
               (or (nil? max-cost) (> cost max-cost)))
        (let [t (temp-fn (/ k max-iter))
              next-state (neighbor-fn state)
              next-cost (get-cost next-state)
              next-place {:state next-state, :cost next-cost}]
          (if (> (p-fn cost next-cost t) (rand))
            (recur next-state next-cost (inc k)
                   (conj best-seq next-place))
            (recur state cost (inc k) best-seq)))
        best-seq))))
annealing takes an initial state, a limit on the number of iterations, a target output value, and a series of functions. The first function takes the current state and returns a new neighboring state. To write this function, we have to decide how best to represent the state for this problem. Generally, since the function being optimized takes multiple parameters, the state would be a vector of values, and we'd randomly slide one of them around. However, for this problem, we only have one input value: the partition size.
So, for this problem, we'll use an integer between 0 and 20 instead. The actual partition size will be 2 raised to that power. To find a neighbor, we randomly slide the state's value up or down by at most five units, clamped to the range of 0 to 20:

(defn get-neighbor [state]
  (max 0 (min 20 (+ state (- (rand-int 11) 5)))))
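Since get-neighbor is random, it's worth convincing ourselves that it never wanders out of bounds. A quick check (a sketch; the sampled starting states are arbitrary) should always come back true:

```clojure
;; Every neighbor of every starting state should stay within [0, 20].
(every? #(<= 0 (get-neighbor %) 20)
        (repeatedly 1000 #(rand-int 21)))
;; => true
```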
The next function that we pass to annealing is the cost function. This will take the state and return the value that we're trying to minimize. In this case, we benchmark mc-pi-part with the given partition size (2 raised to the state's power) and return the average time:

(defn get-pi-cost [n state]
  (let [chunk-size (long (Math/pow 2 state))]
    (first (:mean (quick-benchmark (mc-pi-part chunk-size n) {})))))
The next function takes the current state's cost, a neighboring state's cost, and the current temperature, and returns the probability of moving to the neighbor. Here, the probability falls along with the temperature, and moves to a worse (higher-cost) state are penalized:

(defn should-move [c0 c1 t]
  (* t (if (< c0 c1) 0.25 1.0)))
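Because should-move is deterministic, we can work through a few calls by hand to see how the temperature and the cost comparison interact (the costs here are made-up values for illustration):

```clojure
(should-move 0.5 0.4 0.8) ;; => 0.8   (better move, high temperature)
(should-move 0.4 0.5 0.8) ;; => 0.2   (worse move, high temperature)
(should-move 0.4 0.5 0.1) ;; => 0.025 (worse move, low temperature)
```

Each of these results is compared against (rand), so a worse move late in the run is only rarely accepted.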
Finally, the temperature function decreases linearly from 1.0 to 0.0 over the course of the run; its argument is the fraction of the iterations completed so far:

(defn get-temp [r]
  (- 1.0 (float r)))
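Since the loop passes in k/max-iter, we can see the schedule directly. Quarters are exact in floating point, so these outputs are dependable:

```clojure
;; Temperature at 0%, 25%, 50%, and 75% of the way through a run.
(map get-temp [0 1/4 1/2 3/4])
;; => (1.0 0.75 0.5 0.25)
```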
That's it. We can let this find a good partition size. We'll start with the value that we used in the Partitioning Monte Carlo simulations for better pmap performance recipe. We'll only allow 10 iterations since the search space is relatively small:
user=> (annealing 12 10 nil get-neighbor
                  (partial get-pi-cost 1000000)
                  should-move get-temp)
>>> sa 1 . 12 $ 0.5805938333333334
>>> sa 2 . 8 $ 0.38975950000000004
>>> sa 3 . 8 $ 0.38975950000000004
>>> sa 4 . 8 $ 0.38975950000000004
>>> sa 5 . 8 $ 0.38975950000000004
>>> sa 6 . 8 $ 0.38975950000000004
>>> sa 7 . 6 $ 0.357514
>>> sa 8 . 6 $ 0.357514
>>> sa 9 . 6 $ 0.357514
>>> sa 10 . 6 $ 0.357514
[{:state 12, :cost 0.5805938333333334}
 {:state 8, :cost 0.38975950000000004}
 {:state 6, :cost 0.357514}]
We can see that a partition size of 64 (2^6) gives the best time, and rerunning the benchmarks verifies this.
In practice, this algorithm won't help if we run it over the full input data. However, if we can get a large enough sample, this can help us process the full dataset more efficiently by taking a lot of the guesswork out of picking the right partition size for the full evaluation.
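annealing only interacts with the problem through the functions we pass it, so we can swap in a cheap cost function to watch the search behave without waiting on benchmarks. This sketch uses a hypothetical toy-cost that is minimized at 13 (any target in the 0 to 20 range would do):

```clojure
;; A toy cost function: zero at state 13, growing quadratically away from it.
(defn toy-cost [state]
  (Math/pow (- state 13) 2))

;; Reuse the neighbor, probability, and temperature functions from above.
;; The printed trace and the result will vary from run to run.
(annealing 0 100 0.0 get-neighbor toy-cost should-move get-temp)
```

The run stops early if it lands exactly on 13 (cost 0.0, the max-cost target), and otherwise stops after 100 iterations.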
Let's take the annealing function apart to see how it works. The process is handled by a loop inside the annealing function. Its parameters are a snapshot of the state of the annealing process:

(loop [state initial, cost cost, k 1,
       best-seq [{:state state, :cost cost}]]
On each pass through the loop, we first check whether to keep going: we continue as long as we haven't used up all of the iterations and, if a target cost was given, we haven't reached it yet:

(if (and (< k max-iter)
         (or (nil? max-cost) (> cost max-cost)))
If we do continue, we compute the current temperature, generate a neighboring state, and look up that state's cost using the memoized get-cost function, so repeated states aren't re-benchmarked:

(let [t (temp-fn (/ k max-iter))
      next-state (neighbor-fn state)
      next-cost (get-cost next-state)
      next-place {:state next-state, :cost next-cost}]
If the probability function (should-move) indicates so, we move to the next state and loop. Otherwise, we stay at the current state and loop:

(if (> (p-fn cost next-cost t) (rand))
  (recur next-state next-cost (inc k)
         (conj best-seq next-place))
  (recur state cost (inc k) best-seq))
Once the loop is done, we return the accumulated sequence of accepted states:

best-seq)))))
This provides a systematic way to explore the problem space. In this case, we used it to find a better partition size.
Simulated annealing belongs to a class of algorithms known as metaheuristics. These algorithms are widely applicable for searching a result space, but they do not guarantee a globally optimal solution. Instead, they often approach the problem stochastically, randomly searching for the best solution that they can find within a reasonable amount of time. All of them take a function (the cost-fn function mentioned previously) and try to find the largest or smallest value for it.
Other metaheuristics include genetic algorithms, ant colony optimization, and particle swarm optimization, among many others. This is a broad and interesting field, and being familiar with these algorithms can be helpful for anyone who does data analysis.