Improving Performance with Parallelization

One of the strengths of genetic algorithms is their ability to be parallelized. Parallelism means that processes execute simultaneously, whether on multiple cores, multiple machines, or both. The tasks that make up an evolution can be performed in parallel to yield significant performance gains. The BEAM is optimized for orchestrating computations in parallel. You can take advantage of this fact to improve the performance of your genetic algorithms.

The benefit of writing genetic algorithms in Elixir is the availability of a rich set of parallelization features. Parallelizing algorithms in Elixir is straightforward thanks to modules like Task, Agent, and GenServer. You can parallelize your program in just a few lines of code, and let the Erlang Scheduler handle the orchestration of computation for you. The Erlang Scheduler is responsible for orchestrating the processes you spawn during the life cycle of your Elixir applications. You can think of a scheduler like a parent who forces siblings to share toys. Each sibling (process) gets some time (runtime) to play with the toy (computing power). To learn more about how the Erlang Scheduler works, check out The BEAM Book.[19]

You can achieve parallelization in your genetic algorithms on the BEAM in many ways. In this section, you’ll explore two: using the Task module and using Agents. However, before you begin, it’s important to understand the difference between concurrency and parallelism, and how parallelism in Elixir works.

Concurrency Versus Parallelism

The distinction between concurrency and parallelism is important when determining how best to speed up your genetic algorithms. Erlang is built to be a concurrent language. Processes are concurrent when they appear to be executing simultaneously but they’re actually executing on the same thread. The scheduler achieves concurrency by alternating between processes until all processes are complete.

You can achieve true parallelism on the BEAM if you have a multi-core or distributed system. The BEAM will automatically parallelize your processes using symmetric multiprocessing (SMP). You don’t need to understand what SMP is or how it works. If your machine supports SMP and has multiple cores, the scheduler handles the parallelization for you.

To check if you have SMP enabled, open iex and type:

 iex(1)> :erlang.system_info :smp_support
 true

If it returns true, your system is capable of parallel execution. To see how many schedulers your system will use, type the following:

 iex(2)> :erlang.system_info :schedulers_online
 12

The number of schedulers online tells you how many processes can execute in parallel at any given time on your machine. Most modern machines have at least four cores and thus at least four schedulers. Just because you have four cores, however, doesn’t mean that the BEAM will bind one scheduler per core or thread. Instead, the BEAM leaves it up to your operating system to decide how to allocate scheduler threads.
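If you want to experiment, you can start iex with a different number of schedulers online by passing the +S emulator flag through --erl. This is only an illustration; the exact output depends on your machine:

 $ iex --erl "+S 2:2"
 iex(1)> :erlang.system_info :schedulers_online
 2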

One thing to remember is that parallelism doesn’t always lead to better performance. Some code just isn’t parallelizable. Sometimes the overhead associated with parallelism is too expensive for simple tasks. You need to consider these things when determining which parts of your code will benefit the most from parallelism.

An additional consideration is that there’s a point of diminishing returns when it comes to parallelism. In most languages, creating threads is expensive and carries a certain amount of overhead. That overhead means creating too many threads will decrease performance rather than increase it. Likewise, for small tasks, parallelism is unnecessary because the overhead outweighs the work being done.

On the BEAM, spawning a process is cheap; however, you still need to be conscious of how attempts at parallelism will impact the performance of your algorithms. If you’re processing relatively small amounts of data, it usually doesn’t make sense to parallelize because the associated overhead will decrease your performance. Turn to parallelism when you’re dealing with larger amounts of data that take a long time to process.

Using the Task Module

The easiest way to parallelize in Elixir is using the Task module. The Task module offers convenience functions for working with tasks. In Elixir, a task is a process meant to perform a single action throughout its lifetime. You can use the Task module to execute code concurrently or in parallel.
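Before wiring tasks into the framework, it helps to see the basic pattern in isolation. Here’s a minimal sketch: Task.async/1 starts a process running the given function, and Task.await/2 blocks until that process sends back its result:

 # Start a computation in a separate process.
 task =
   Task.async(fn ->
     :timer.sleep(500)
     21 * 2
   end)
 
 # ...do other work here while the task runs...
 
 # Block until the result is ready.
 Task.await(task)
 #=> 42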

In genetic.ex, create a new function pmap/2 like this:

 def pmap(collection, func) do
   collection
   |> Enum.map(&Task.async(fn -> func.(&1) end))
   |> Enum.map(&Task.await(&1))
 end

This function comes from Programming Elixir [Tho14]. It implements a parallel map by spawning a process for every element in a collection. The processes then all execute in parallel, and Task.await/2 collects the result of each one.
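As a quick sanity check, assuming you start iex -S mix in your project so Genetic is compiled, you can try pmap/2 with a throwaway squaring function:

 iex(1)> Genetic.pmap(1..4, fn x -> x * x end)
 [1, 4, 9, 16]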

You can use pmap/2 to replace the traditional Enum.map/2 calls in your code; however, it doesn’t always make sense to do this. To understand why, you’ll look at some basic examples.

Create a new file pmap.exs in scripts. At the top of the file, create two anonymous functions expensive and inexpensive, like this:

 expensive = fn x ->
   x = x * x
   :timer.sleep(500)
   x
 end
 
 inexpensive = fn x -> x * x end

expensive simulates a computationally expensive function using :timer.sleep/1. inexpensive is a computationally inexpensive function.

Next, create some dummy data with 100 elements, like this:

 data = for x <- 1..100, do: x

Finally, benchmark pmap/2 and Enum.map/2 with both inexpensive and expensive functions:

 Benchee.run(%{
   "pmap, expensive" => fn -> Genetic.pmap(data, &(expensive.(&1))) end,
   "pmap, inexpensive" => fn -> Genetic.pmap(data, &(inexpensive.(&1))) end,
   "map, expensive" => fn -> Enum.map(data, &(expensive.(&1))) end,
   "map, inexpensive" => fn -> Enum.map(data, &(inexpensive.(&1))) end
 }, memory_time: 7)

You’ll benchmark both performance and memory usage to get a clear picture of what’s happening. Now, run pmap.exs:

 $ mix run scripts/pmap.exs
 ...
 
 Comparison:
 map, inexpensive 423.09 K
 pmap, inexpensive 3.13 K - 134.98x slower +316.68 μs
 pmap, expensive 0.00200 K - 212074.08x slower +501245.72 μs
 map, expensive 0.00002 K - 21196897.26x slower +50099963.86 μs
 
 ...
 
 Comparison:
 map, inexpensive 1.62 KB
 pmap, inexpensive 51.78 KB - 32.02x memory usage +50.17 KB
 pmap, expensive 52.46 KB - 32.44x memory usage +50.85 KB
 map, expensive 1.62 KB - 1.00x memory usage +0 KB

What you’ll notice here is that pmap/2 runs significantly faster on the computationally expensive function and significantly slower on the inexpensive function. Why is this? It’s all about overhead.

Notice the memory usage of pmap/2 is much higher than map/2. That’s because you create a new process for every element in your list. Every process on the BEAM comes with its own stack, heap, message area, and process control block. You don’t need to know what each of these areas is for, but you should understand that process creation comes with some associated memory overhead.

With the inexpensive function, the creation of new processes is more expensive than the work being parallelized. Using pmap/2 on an inexpensive function is overkill. With the expensive function, pmap/2 is significantly faster because it can execute all of the expensive work in parallel.

In your genetic algorithm framework, it makes the most sense to replace Enum.map/2 in areas where computation is most expensive. For example, you’ll probably see speedups using pmap/2 over Enum.map/2 in evaluate, crossover, and mutation if you’re using expensive fitness functions and crossover or mutation strategies. If you aren’t using expensive fitness functions or strategies, you’ll likely end up hurting rather than helping performance.
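For example, here’s a hedged sketch of what swapping pmap/2 into evaluate might look like, assuming your evaluate/3 resembles the struct-based version built earlier in the book (adjust the field names to match your Chromosome struct):

 def evaluate(population, fitness_function, _opts \\ []) do
   population
   |> pmap(fn chromosome ->
     fitness = fitness_function.(chromosome)
     age = chromosome.age + 1
     %Chromosome{chromosome | fitness: fitness, age: age}
   end)
   |> Enum.sort_by(& &1.fitness, &>=/2)
 end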

Treating Chromosomes as Processes

The strongest aspect of the BEAM is its ability to create and work with processes. You saw how easy it was to spin up new processes using the Task module in the last section. In this section, you’ll use the Agent module to accomplish parallelization in a different manner.

An Agent is an abstraction around state. Agents allow you to keep track of state between entities. In your original genetic algorithm framework, you perform transformations on Chromosome structs. Using an Agent, you can replace these transformations with interactions between processes. Because each chromosome lives in its own process, those interactions run in parallel, and your algorithm achieves parallelism naturally.
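If you haven’t used Agents before, here’s a minimal, standalone example of the idea (not part of the framework): an Agent wraps a piece of state in a process, and you read or change that state by handing it functions:

 # Start an agent holding the number 0 as its state.
 {:ok, counter} = Agent.start_link(fn -> 0 end)
 
 # Update the state by applying a function to it.
 Agent.update(counter, fn n -> n + 1 end)
 
 # Read the state back.
 Agent.get(counter, fn n -> n end)
 #=> 1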

Note that you probably won’t see significant performance gains from this approach on ordinary machines; in practice, it needs a massively parallel system to be viable. It also adds some complexity to implement correctly. In this section, you’ll implement a basic agent and see how you can evaluate a population of agents.

Start by opening chromosome.ex in types. At the top of the file, add the following line:

 use Agent

This line tells Elixir that this module will be used as an Agent.

Next, you need to implement some functions for interacting with the Agent. Add the following to chromosome.ex:

 def start_link(genes) do
   Agent.start_link(fn -> %Chromosome{genes: genes, size: Enum.count(genes)} end)
 end
 
 def get_fitness(pid) do
   Agent.get(pid, & &1.fitness)
 end
 
 def eval(pid, fitness) do
   c = Agent.get(pid, & &1)
   Agent.update(pid, fn _ -> %Chromosome{c | fitness: fitness.(c)} end)
   # Return the pid so callers can keep piping the chromosome process around.
   pid
 end

Here, you implement three functions: start_link/1, get_fitness/1, and eval/2. start_link/1 creates a new chromosome process given some genes. get_fitness/1 returns a chromosome’s fitness for use in evaluate. eval/2 evaluates a chromosome given a fitness function and returns the process’s pid. You’ll use this function to evaluate chromosomes in parallel.
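As a rough illustration, with made-up genes and a made-up fitness function, interacting with a single chromosome process looks like this (assuming the Chromosome module is in scope):

 {:ok, pid} = Chromosome.start_link([0, 1, 1, 0])
 Chromosome.eval(pid, fn c -> Enum.sum(c.genes) end)
 Chromosome.get_fitness(pid)
 #=> 2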

Now you need to tweak how you implement initialize/2 and evaluate/3. In genetic.ex, change both functions to match this:

 def initialize(genotype, opts \\ []) do
   population_size = Keyword.get(opts, :population_size, 100)
 
   for _ <- 1..population_size do
     {:ok, pid} = Chromosome.start_link(genotype.())
     pid
   end
 end
 
 def evaluate(population, fitness_function, _opts \\ []) do
   population
   |> Enum.map(&Task.async(fn -> Chromosome.eval(&1, fitness_function) end))
   |> Enum.map(&Task.await(&1))
   |> Enum.sort_by(fn c -> Chromosome.get_fitness(c) end, &>=/2)
 end

You’ll notice a few changes here. First, in initialize/2 you replace the declaration of a new Chromosome struct with a call to Chromosome.start_link/1, so the population becomes a list of pids. Next, in evaluate/3 you use both eval/2 and get_fitness/1 to first evaluate every chromosome process and then sort the processes by fitness. Note that, for simplicity, this version of evaluate doesn’t update the age of a chromosome.

Now, when you run evaluate with an expensive fitness function, the evaluations execute in parallel, each one carried out as a message handled by its own Agent process.
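For example, a hypothetical end-to-end run might look like the following, where the genotype and fitness function are made-up placeholders (and the Chromosome module is assumed to be in scope):

 # A made-up genotype: 32 random bits.
 genotype = fn -> for _ <- 1..32, do: Enum.random(0..1) end
 
 # A made-up fitness function; it receives a Chromosome struct.
 fitness = fn chromosome -> Enum.sum(chromosome.genes) end
 
 population = Genetic.initialize(genotype)
 sorted = Genetic.evaluate(population, fitness)
 
 best = hd(sorted)
 IO.inspect(Chromosome.get_fitness(best), label: "Best fitness")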

This approach mirrors ones taken in evolutionary algorithms written in Erlang, but it’s not necessary for the situations you’ll encounter.
