12. Evolutionary Strategies for RL

Overview

In this chapter, we will be identifying the limitations of gradient-based methods and the motivation for evolutionary strategies. We will break down the components of genetic algorithms and implement them in Reinforcement Learning (RL). By the end of this chapter, you will be able to combine evolutionary strategies with traditional machine learning methods, specifically in the selection of neural network hyperparameters, and also identify the limitations of these evolutionary methods.

Introduction

In the previous chapter, we looked at various policy-based methods and their advantages. In this chapter, we are going to learn about gradient-free methods, namely genetic algorithms; develop these algorithms step by step; and use them to optimize neural networks and RL-based algorithms. This chapter discusses the limitations of gradient-based methods, such as getting stuck at local optima and slower convergence when dealing with noisy input. It presents an alternative optimization solution to gradient methods through genetic algorithms, as they ensure global optimum convergence. You will examine the structure of genetic algorithms and implement them for hyperparameter selection for neural networks and for evolving network topologies, as well as using them in combination with RL for a cart-pole balancing activity. Hybrid neural networks that use genetic algorithms are used to solve complex problems, such as modeling plasma chemical reactors, designing fractal frequency selective surfaces, or optimizing production processes. In the following section, you will be examining the problems posed by gradient-based methods.

Problems with Gradient-Based Methods

In this section, you will learn about the differences between value-based and policy-based methods and the use of gradient-based methods in policy search algorithms. You will then examine the advantages and disadvantages of using gradient-based methods in policy-based approaches and implement stochastic gradient descent using TensorFlow to minimize a quadratic function with two unknowns.

There are two approaches to RL: value-based and policy-based. These approaches are used to solve complex decision problems related to Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs). Value-based approaches rely on identifying and deriving the optimal policy from the optimal value function. Algorithms such as Q-learning or SARSA(λ) fall within this category, and for tasks involving lookup tables, their implementation converges to a globally optimal return. However, as these value search methods rely on a known model of the environment, there are no guarantees of convergence to an optimal solution for partially observable or continuous spaces.

Conversely, policy-based approaches, instead of relying on the value function to maximize the return, use gradient methods (stochastic optimization) to explore the policy space. Gradient-based methods or policy gradient methods map the parametrized space (environment) to the policy space using loss functions, thus enabling the RL agent to explore directly the entirety, or a portion, of the policy space. One of the most widely used methods (which is going to be implemented in this section) is gradient descent.

Note

For further reading on gradient descent, please refer to the technical paper by Marbach, 2001, at the following link: https://link.springer.com/article/10.1023/A:1022145020786.

The advantages of the gradient approaches (stochastic gradient descent or ascent) are that they are suitable for POMDPs or non-MDPs, especially for solving robotics problems with multiple constraints. However, there are several disadvantages to employing gradient-based methods. The most notable one is that algorithms such as REINFORCE and DPG find a local optimum of the expected reward. Once a local optimum is found, the RL agent does not expand its search globally. For example, a robot solving a maze problem will get stuck in a corner and will continuously try to move to the same location. Additionally, when dealing with high return variance or noisy input data, performance is affected because the algorithms converge more slowly. This happens when, for instance, a robotic arm is programmed to pick up and place a blue component in a tray, but the table has blue hues to its color, which interferes with the detection of the component through the sensors (such as a camera).

Note

For further reading on the REINFORCE algorithm, please refer to the technical paper by Williams, 1992, at the following link: https://link.springer.com/article/10.1007/BF00992696.

Similarly, please read about the DPG algorithm by Silver et al., 2014, at the following link: http://proceedings.mlr.press/v32/silver14.pdf.
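To see how the local-optimum problem arises, consider a minimal sketch (the curve, starting point, and learning rate here are illustrative and not drawn from an RL task): plain gradient descent settles in whichever basin of a non-convex function it starts in.

    # Gradient descent on f(x) = x**4 - 3*x**2 + x, which has a local
    # minimum near x = 1.1 and a lower, global minimum near x = -1.3
    x = 1.5                            # start in the basin of the local minimum
    for _ in range(100):
        grad = 4 * x**3 - 6 * x + 1    # f'(x)
        x -= 0.01 * grad
    print(x)    # converges to ~1.1; the global minimum is never reached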

An alternative to gradient-based methods is the use of gradient-free methods, which rely on evolutionary algorithms to achieve a global optimum for the return.

The following exercise will enable you to understand the potential of gradient methods for converging on optimal solutions and the lengthy, step-by-step process the method follows as it searches for the optimal solution. You will be presented with a mathematical function (a loss function) that maps the input values, x and y, to an output value, z. The goal is to identify the optimal values of the inputs that lead to the lowest value of the output; however, the process is step-dependent and is at risk of staying at a local optimum. We will be using the GradientTape() function to calculate the gradients, which are simply the derivatives of the loss with respect to the inputs. This will help you understand the limitations of such optimization strategies.

Exercise 12.01: Optimization Using Stochastic Gradient Descent

This exercise aims to enable you to apply gradient methods, most notably Stochastic Gradient Descent (SGD), available in TensorFlow by following the steps required to converge on an optimal solution.

The following loss function has two unknowns, x and y:

z = x² - 8x + y² + 3y

Figure 12.1: Sample loss function

Find the optimum values for x and y within 100 steps with a learning rate of 0.1.
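As a quick sanity check (not part of the original exercise steps), the minimum of this convex function can be derived analytically by setting the partial derivatives to zero:

    ∂z/∂x = 2x - 8 = 0  →  x = 4
    ∂z/∂y = 2y + 3 = 0  →  y = -1.5
    z = 4² - 8(4) + (-1.5)² + 3(-1.5) = -18.25

SGD should therefore approach x = 4, y = -1.5, and z ≈ -18.2.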

Perform the following steps to complete the exercise:

  1. Create a new Jupyter Notebook.
  2. Import the tensorflow package as tf:

    import tensorflow as tf

  3. Define a function that outputs z = x² - 8x + y² + 3y:

    def funct(x,y):

        return x**2-8*x+y**2+3*y

  4. Define a function for initializing the x and y variables and initialize them with the values 5 and 10:

    def initialize():

        x = tf.Variable(5.0)

        y = tf.Variable(10.0)

        return x, y

    x, y= initialize()

  5. In the preceding code snippet, we have used decimal format for the values assigned to x and y to start the optimization process, as the Variable() constructor needs to have a tensor type of float32.
  6. Instantiate the optimizer by selecting SGD from keras in TensorFlow and input the learning rate of 0.1:

    optimizer = tf.keras.optimizers.SGD(learning_rate = 0.1)

  7. Set a loop of 100 steps, where you calculate the loss, use the GradientTape() function for automatic differentiations, and process the gradients:

    for i in range(100):

        with tf.GradientTape() as tape:

            # Calculate loss function using x and y values

            loss= funct(x,y)

            # Get gradient values

            gradients = tape.gradient(loss, [x, y])

            # Save gradients in array without altering them

            p_gradients = [grad for grad in gradients]

    In the preceding code, we have used GradientTape() from TensorFlow to calculate the gradients (which are simply derivatives). We created a loss variable that stores the z value returned when calling the function. GradientTape() is activated when the gradient() method is called, which computes the gradients with respect to both inputs in a single pass. The gradients are stored in the p_gradients array.

  8. Use the zip() function to aggregate the gradients to the values:

            ag = zip(p_gradients, [x,y])

  9. Print the step and the values of z, x, and y:

            print('Step={:.1f} , z ={:.1f},x={:.1f},y={:.1f}'

                  .format(i, loss.numpy(), x.numpy(), y.numpy()))

  10. Apply the optimizer using the gradients that were processed:

            optimizer.apply_gradients(ag)

  11. Run the application.

    You will get the following output:

Figure 12.2: Step-by-step optimization using SGD

You can observe in the output that from Step=25 onward, the z, x, and y values do not change; therefore, they are considered to be the optimum values for the respective loss function.

By printing the steps and the values of the inputs and outputs, you can observe that the algorithm converges to the optimal values of x = 4 and y = -1.5 before the termination of the 100 steps. However, you can also observe that the problem is step-dependent: if the optimization were stopped before global optimum convergence, the solution would be sub-optimal.

Note

To access the source code for this specific section, please refer to https://packt.live/2C10rXD.

You can also run this example online at https://packt.live/2DIWSqc.

This exercise helped your understanding and application of SGD when solving a loss function, developing your analysis skills as well as your skills in programming using TensorFlow. This will help you in your choice of optimization algorithm, giving you an understanding of the limitations of gradient-based methods.

In this section, we have explored the benefits and disadvantages of gradient methods with respect to RL algorithms, identifying the types of problems that they are suitable for within the context of decision-making processes. The example offered a simple application of gradient descent, where the optimal solution for two unknowns was identified using SGD optimization in TensorFlow. In the next section, we will be exploring an optimization alternative that is gradient-free: genetic algorithms.

Introduction to Genetic Algorithms

As the problem with gradient methods is that the solution can get stuck at a single local optimum, other methods, such as gradient-free algorithms, can be considered as alternatives. In this section, you will learn about gradient-free methods, specifically evolutionary algorithms (for example, genetic algorithms). This section provides an overview of the steps taken for the implementation of genetic algorithms and exercises on how to implement an evolutionary algorithm to solve the loss function given in the previous section.

When multiple local optima exist or function optimization is required, gradient-free methods are recommended. These methods include evolutionary algorithms and particle swarm optimizations. A characteristic of these methods is that they rely on sets of optimization solutions that are commonly referred to as populations. The methods rely on iteratively searching for a good solution or a distribution that can solve a problem or a mathematical function. The search pattern for the optimal solution is modeled based on Darwin's natural selection paradigm and the biological phenomenon of genetic evolution. Evolutionary algorithms draw inspiration from biological evolution patterns such as mutation, reproduction, recombination, and selection. Particle swarm algorithms are inspired by group social behavior, such as a beehive organization or ant farms, where single solutions are termed as particles that can evolve over time.

Natural selection stems from the premise that genetic material (the chromosome) encodes the survival of a species, in a certain way. The evolution of the species relies on how well it adapts to its external environment and the information passed from parents to children. In genetic material, there are variations (mutations) between generations that can lead to successful or unsuccessful adaptation to the environment (especially in dire conditions). Therefore, there are three steps to genetic algorithms: selection, reproduction (crossover), and mutation.

Evolutionary algorithms go about things by creating an original population of solutions, selecting a sub-set, and using recombination or mutation to obtain different solutions. This new set of solutions can replace, partly or fully, the original set. For the replacement to take place, the solutions go through a selection process that relies on analyzing their fitness. This increases the chances of solutions that are more suited to being utilized to develop a new set of solutions.

Other than the development of solutions, evolutionary algorithms can be used for parameter adaptation, using probability distributions. A population is still generated; however, a fitness method is used to select the parameters of the distribution instead of the actual solutions. After the new parameters are identified, the new distribution is used to generate a new set of solutions. Some strategies of parameter selection include the following:

  • Using natural gradient ascent after the gradients of the parameters are estimated from the original population, also known as Natural Evolutionary Strategies (NESes)
  • Selecting solutions with a specific parameter and using the mean of this sub-set to find a new distribution mean, known as Cross-Entropy Optimization (CEO)
  • Attributing a weight to each solution based on its fitness, using the weighted average as a new distribution mean – Covariance Matrix Adaptation Evolution Strategies (CMAESes)

One of the major problems identified with evolutionary strategies is that achieving solution fitness can be computationally expensive and noisy.
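To make the parameter-adaptation idea concrete, the following is a minimal sketch of cross-entropy-style optimization applied to the loss function used in the exercises of this chapter (the population size, elite fraction, and iteration count are illustrative choices, not prescribed values):

    import numpy as np

    def funct(x, y):
        return x**2 - 8*x + y**2 + 3*y

    mean = np.array([5.0, 10.0])    # initial distribution mean
    std = np.ones(2) * 2.0          # initial spread
    for generation in range(30):
        # sample a population of candidate (x, y) solutions
        pop = mean + std * np.random.randn(50, 2)
        scores = funct(pop[:, 0], pop[:, 1])    # lower is better
        elite = pop[np.argsort(scores)[:10]]    # keep the 10 best
        # refit the distribution parameters to the elite set
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    print(mean)    # approaches [4.0, -1.5]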

Genetic Algorithms (GAs) keep the solution population and conduct searches in multiple directions (through the chromosomes), furthering the exchange of information in these directions. The algorithms are most notably implemented on strings, which are either binary or character-based. The two main operations performed are mutation and crossover. The selection of the progenies is based on how close the solution is to the target (objective function), which denotes their fitness.

As an overview, GAs have the following steps:

  1. Population creation.
  2. Fitness score creation and assignment to each solution of the population.
  3. The selection of two parents to reproduce based on the fitness scores (potentially the solutions with the best performance).
  4. The creation of the two child solutions by combining and re-organizing the code of the two parents.
  5. The application of a random mutation.
  6. Child generation is repeated until the new population size is achieved and weights (fitness scores) for the population are assigned.
  7. The process is repeated until the maximum number of generations is reached or the target performance is achieved.

We will be looking at each of these steps in detail further in this chapter.

Among the many differences between gradient-based algorithms and GAs, one difference is the process of development. Gradient-based algorithms rely on differentiation, whereas GAs use the genetic processes of selection, reproduction, and mutation. The following exercise will enable you to implement an evolutionary algorithm and evaluate its performance. You will minimize the same loss function as with the SGD method, this time using an evolutionary algorithm called Differential Evolution (DE), available in the tensorflow_probability package.

Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs

In this exercise, you will still need to solve the following function, as in the previous exercise:

z = x² - 8x + y² + 3y

Figure 12.3: Sample loss function

Find the optimum values for x and y for a population size of 100, starting from x and y initialized to 5 and 10, and then extending to random samples from a distribution, similar to the gradient-based method.

The goal of this exercise is to enable you to analyze the differences between applying GAs and gradient-descent methods, by starting from a single pair of variables and a variety of potential solutions. The algorithm aids in optimization problems by applying selection, crossover, and mutation to reach an optimal or nearly optimal solution. Additionally, you will sample the values of x and y from a uniform distribution for a population of 100. By the end of this exercise, you will have evaluated the differences between starting from a fixed variable and sampling from a distribution:

  1. Create a new Jupyter Notebook.
  2. Import the tensorflow package and download and import tensorflow_probability:

    import tensorflow as tf

    import tensorflow_probability as tfp

  3. Define a function that outputs z = x² - 8x + y² + 3y:

    def funct(x,y):

        return x**2-8*x+y**2+3*y

  4. Identify the initial step by defining the x and y variables with values of 5 and 10:

    initial_position = (tf.Variable(5.0), tf.Variable(10.0))

  5. Instantiate the optimizer by selecting the tensorflow_probability optimizer named differential_evolution_minimize:

    optimizer1 = tfp.optimizer.differential_evolution_minimize(
                     funct, initial_position=initial_position,
                     population_size=100,
                     population_stddev=1.5, seed=879879)

  6. Print the final values of z, x, and y, by using the objective_value and position functions:

    print('Final solution: z={:.1f}, x={:.1f}, y={:.1f}'

          .format(optimizer1.objective_value.numpy(),

          optimizer1.position[0].numpy(),

          optimizer1.position[1].numpy()))

  7. Run the application. You will get the following output. You can observe that the final values are identical to the Step=25.0 value in Figure 12.2:

    Final solution: z=-18.2, x=4.0, y=-1.5

    In this exercise, the final optimal solution will be displayed. There are no additional optimization steps needed to reach the same solution as the gradient-based method. You can see that you are using fewer lines of code and that the time taken for the algorithm to converge is shorter.

    For uniform optimization, the steps to modify the code are as follows:

  8. Import the random package:

    import random

  9. Initialize the population size and create the initial population by sampling the x and y variables from a random uniform distribution of the population size:

    size = 100

    initial_population = (tf.random.uniform([size]),

                          tf.random.uniform([size]))

  10. Use the same optimizer, change the initial_position parameter to initial_population; use the same seed:

    optimizer2 = tfp.optimizer.differential_evolution_minimize(
                     funct, initial_population=initial_population,
                     seed=879879)

  11. Print the final values of z, x, and y, by using the objective_value and position functions:

    print('Final solution: z={:.1f}, x={:.1f}, y={:.1f}'

          .format(optimizer2.objective_value.numpy(),

          optimizer2.position[0].numpy(),

          optimizer2.position[1].numpy()))

    The output will be as follows:

    Final solution: z=-18.2, x=4.0, y=-1.5

You will get the same result despite the variation in initial values. This means that we can either randomly sample or choose a specific set of initial values, and the GA will still converge to the optimal solution quickly, in fewer lines of code than a gradient-based method requires.

Note

To access the source code for this specific section, please refer to https://packt.live/2MQmlPr.

You can also run this example online at https://packt.live/2zpH6hJ.

The solution will converge to the optimal values irrespective of the initial starting point, whether using a fixed value for the inputs or a random sampling of the population of chromosomes.

This section offered a general overview of evolutionary algorithms, explaining the differences between evolutionary strategies and GAs. You've had the opportunity to implement differential evolution using the tensorflow_probability package to optimize the solution of a loss function, analyzing two different techniques: starting from fixed input values and using random sampling for the input values. You also had the opportunity to compare the implementation of GAs against gradient descent methods. GAs can use independent starting values, and their convergence to a global optimum is faster and less prone to disturbances than that of gradient descent methods, which are step-dependent and more sensitive to the initial input values.

In the following section, we will build on the principles of developing GAs, starting with a look at population creation.

Components: Population Creation

In the previous section, you were introduced to evolutionary methods for function optimization. In this section, we will concentrate on population creation, fitness score creation, and the task of creating the genetic algorithm.

The population, P, is identified as a group of individuals or chromosomes:

P(i) = {c1, c2, ..., cs}

Figure 12.4: Expression for the population

Here, s represents the total number of chromosomes (population size) and i is the iteration. Each chromosome is a possible solution to the presented problem in an abstract form. For a binary problem, the population can be a matrix with randomly generated ones and zeros.

The chromosome is a combination of input variables (genes):

c = {g1, g2, ..., gm}

Figure 12.5: Expression for the chromosome

Here, m is the maximum number of genes (or variables).

When translated to code, population creation can be demonstrated as follows:

    import random
    import numpy as np

    # no_chromosomes and no_genes define the population dimensions
    population = np.zeros((no_chromosomes, no_genes))

    for i in range(no_chromosomes):

        ones = random.randint(0, no_genes)

        population[i, 0:ones] = 1

        np.random.shuffle(population[i])

Each chromosome is then compared using a fitness function:

f(c) = Σ [gj = tj] for j = 1, ..., m (the count of genes matching the target, t)

Figure 12.6: Fitness function

The fitness function can be translated to code as follows:

identical_to_target = population == target

fitness_weights = identical_to_target.sum(axis = 1)

The output of the function is a score indicating how close the chromosome is to the target (optimal solution); the target is reached by maximizing the fitness function. The function can be a mathematical one, a thermodynamic model, or a computer game. There are also cases where the optimization problem relies on minimizing a cost function. This can be handled either by considering the chromosomes with low weightings (scores) or by adapting the cost function into a fitness one.
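One common adaptation (a minimal illustration, not taken from the exercises below) maps a low cost to a high fitness:

    def cost_to_fitness(cost):
        # low cost -> high fitness; adding 1 avoids division by zero
        return 1.0 / (1.0 + cost)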

Once the fitness function is identified and defined, the evolution process can start. The initial population is generated. A characteristic of the initial population is diversity. To offer this diversity, the elements can be randomly generated. To make the population evolve, the iterative process starts by selecting the parents that offer the best fit to start the reproduction process.

Exercise 12.03: Population Creation

In this exercise, we will be creating an original population of five binary chromosomes, each with eight genes. We will define a target solution and output the similarity of each chromosome to it. This exercise aims to allow you to design and establish the first set of steps for a GA and find the binary solution that fits the target. The exercise is similar to matching the output of a control system with a target:

  1. Create a new Jupyter Notebook. Import the random and numpy libraries:

    import random

    import numpy as np

  2. Create a function for the random population:

    # create function for random population

    def original_population(chromosomes, genes):

        #initialize the population with zeroes

        population = np.zeros((chromosomes, genes))

        #loop through each chromosome

        for i in range(chromosomes):

            #get random no. of ones to be created

            ones = random.randint(0, genes)

            #change zeroes to ones

            population[i, 0:ones] = 1

            #shuffle rows

            np.random.shuffle(population[i])

        return population

  3. Define a function for creating the target solution:

    def create_target_solution(gene):

        #assume that there is an equal number of ones and zeroes

        counting_ones = int(gene/2)

        # build array with equal no. of ones and zeros

        target = np.zeros(gene)

        target[0:counting_ones] = 1

        # shuffle the array to mix zeroes and ones

        np.random.shuffle(target)

        return target

  4. Define a function for calculating the fitness weighting for each chromosome:

    def fitness_function(target,population):

        #create an array of true/false compared to the reference

        identical_to_target = population == target

        #sum no. of genes that are identical

        fitness_weights = identical_to_target.sum(axis = 1)

        return fitness_weights

    In the preceding code, you are comparing each chromosome of the population with the target and cataloging the similarity as a Boolean – True if similar or False if different – in the matrix called identical_to_target. Count all the elements that are true and output them as the weights.

  5. Initialize the population with 5 chromosomes and 8 genes and calculate weights:

    #population of 5 chromosomes, each having 8 genes

    population = original_population(5,8)

    target = create_target_solution(8)

    weights = fitness_function(target,population)

    In the preceding code, we calculate population, target, and weights based on the three developed functions.

  6. Print the target solution, the index of the chromosome, the chromosome, and the weight using a for loop:

    print(' target:', target)

    for i in range(len(population)):

        print('Index:', i, ' chromosome:', population[i],

              ' similarity to target:', weights[i])

  7. Run the application. You will get a similar output to this, as the population elements are randomized:

    target: [0. 0. 1. 1. 1. 0. 0. 1.]

    Index: 0  chromosome: [1. 1. 1. 1. 1. 0. 1. 1.]  similarity to target: 5

    Index: 1  chromosome: [1. 0. 1. 1. 1. 0. 0. 0.]  similarity to target: 6

    Index: 2  chromosome: [1. 0. 0. 0. 0. 0. 0. 0.]  similarity to target: 3

    Index: 3  chromosome: [0. 0. 0. 1. 1. 0. 1. 0.]  similarity to target: 5

    Index: 4  chromosome: [1. 0. 0. 1. 1. 1. 0. 1.]  similarity to target: 5

You will notice that each chromosome is compared to the target and the similarity (based on the fitness function) is printed out.

Note

To access the source code for this specific section, please refer to https://packt.live/2zrjadT.

You can also run this example online at https://packt.live/2BSSeEG.

This section showcased the first steps of genetic algorithm development: the generation of a random population, fitness score assignment for each element of the population (chromosome), and getting the number of elements that are the best fit compared to the target (in this case have the highest similarity with an optimal solution). The following sections will expand on the code generation that occurs until the optimal solution is reached. To do this, in the next section, you will explore the selection of the parents for the reproduction process.

Components: Parent Selection

The previous section showcased the concepts of populations; we looked at creating a target solution and comparing that solution with the elements (chromosomes) of the population. These concepts were implemented in an exercise that will be continued in this section. In this section, you will explore the concept of selection and implement two selection strategies.

For the reproduction process (which is the quintessential part of GAs, as they rely on creating future generations of stronger chromosomes), there are three steps:

  1. Parent selection
  2. Mixing the parents to create new children (crossover)
  3. Replacing them with the children in the population

Selection essentially consists of choosing two or more parents for the mixing process. Once a fitness criterion is selected, the way in which the selection of the parents will be performed needs to be chosen, as does how many children will come from the parents. Selection is a vital step in performing genetic evolution, as it determines which chromosomes will parent the next generation of children. The most common way to select the best individuals is by "survival of the fittest": the algorithm improves the population in a step-by-step manner. The convergence of the GA depends on the degree to which chromosomes with higher fitness are chosen. If only the chromosomes with the highest fitness are prioritized, there is a chance that the population converges prematurely on a sub-optimal solution; if the candidates consistently have low fitness, then convergence will be extremely slow.

The available selection methods are as follows:

  • Top-to-bottom pairing: This refers to creating a list of chromosomes and pairing them two by two. The chromosomes with odd indexes are paired with the even chromosomes, thus generating mother-father couples. The chromosomes at the top of the list are selected.
  • Random Selection: This involves using a uniform number generator to select the parents.
  • Random Weighted Selection or a Roulette Wheel: This involves calculating the probability of the suitability of a chromosome compared to the entire population. The selection of the parent is done randomly. The probability (weight) can be determined either by rank or fitness. The first approach (see Figure 12.7) relies on the rank of the chromosome, n, which can constitute the index of the chromosome in the population list, where N represents the number of required chromosomes (parents):

Figure 12.7: Probability using rank

The second approach (see Figure 12.8) relies on the fitness of the chromosome, f_n, compared to the sum of the fitness values of the entire population, Σf:

Figure 12.8: Probability using chromosome fitness

As an alternative, the probability (see Figure 12.9) can also be calculated based on the fitness of the chromosome, f_n, compared with the highest fitness in the population, f_max. In all of these cases, the probabilities are compared to randomly selected numbers to identify the parents with the best weights:

Figure 12.9: Probability using the highest fitness of the population

  • Selection by Tournament: This method is based on the random selection of a subset of chromosomes, out of which the chromosome with the highest fitness is selected as a parent. This repeats until the required number of parents is identified.

The roulette wheel and tournament techniques are among the most popular selection methods implemented in GAs, as they are inspired by biological processes. The problem with the roulette technique is that it can be noisy, and depending on which type of selection is used, the convergence rate can be affected. A benefit of the tournament method is that it can deal with large populations, leading to smoother convergence. The roulette wheel method is used to include random elements in the population, whereas when you are aiming to identify the parents with the highest similarity to the target, you use the tournament method. The following exercise will enable you to implement the tournament and roulette wheel techniques and evaluate your understanding of them.
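The subset-based tournament described above is not implemented verbatim in the following exercise (which directly selects the two fittest chromosomes), so here is a minimal sketch of it, with the subset size k as an assumed parameter:

    import numpy as np

    def tournament_select(population, weights, k=3):
        # draw k random candidates and return the fittest of them
        candidates = np.random.choice(len(population), size=k, replace=False)
        best = candidates[np.argmax(np.asarray(weights)[candidates])]
        return population[best]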

Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques

In this exercise, you will implement the tournament and roulette wheel methods for a population of binary chromosomes, as created in Exercise 12.03, Population Creation. Each chromosome should have eight genes. We will define a target solution and print two sets of parents: one based on the tournament method and the other chosen by roulette from the remaining population. Once each parent is chosen, its fitness score is set to the minimum:

  1. Create a new Jupyter Notebook. Import the random and numpy libraries:

    import random

    import numpy as np

  2. Create a function for the random population:

    # create function for random population

    def original_population(chromosomes, genes):

        #initialize the population with zeroes

        population = np.zeros((chromosomes, genes))

        #loop through each chromosome

        for i in range(chromosomes):

            #get random no. of ones to be created

            ones = random.randint(0, genes)

            #change zeroes to ones

            population[i, 0:ones] = 1

            #shuffle rows

            np.random.shuffle(population[i])

        return population

  3. Define a function for creating the target solution:

    def create_target_solution(gene):

        #assume that there is an equal number of ones and zeroes

        counting_ones = int(gene/2)

        # build array with equal no. of ones and zeros

        target = np.zeros(gene)

        target[0:counting_ones] = 1

        # shuffle the array to mix zeroes and ones

        np.random.shuffle(target)

        return target

  4. Define a function for calculating the fitness weighting for each chromosome:

    def fitness_function(target,population):

        #create an array of true/false compared to the reference

        identical_to_target = population == target

        #sum no. of genes that are identical

        fitness_weights = identical_to_target.sum(axis = 1)

        return fitness_weights

  5. Define a function for selecting the pair of parents with the highest weighting (the highest fitness score). Since the population is reduced, the chromosomes are competing more. This method is also known as tournament selection:

    # select the best parents

    def select_parents(population, weights):

        #identify the parent with the highest weight

        parent1 = population[np.argmax(weights)]

        #replace weight with the minimum number

        weights[np.argmax(weights)] = 0

        #identify the parent with the second-highest weight

        parent2 = population[np.argmax(weights)]

        return parent1, parent2

  6. Create a function for the roulette wheel by selecting a random number from a uniform distribution:

    def choice_by_roulette(sorted_population, fitness):

        normalised_fitness_sum = 0

        #get a random draw probability

        draw = random.uniform(0,1)

        prob = []

  7. In the function, calculate the sum of all the fitness scores:

        for i in range(len(fitness)):

            normalised_fitness_sum += fitness[i]

  8. Calculate the probability of the fitness of the chromosome compared to the sum of all fitness scores and compared to the chromosome with the highest fitness score:

        ma = 0

        n = 0

        # calculate the probability of the fitness selection
        for i in range(len(sorted_population)):

            probability = fitness[i]/normalised_fitness_sum

            #compare fitness to the maximum fitness and track it

            prob_max = fitness[i]/np.max(fitness)

            prob.append(probability)

            if ma < prob_max:

                ma = prob_max

                n = i

  9. Run through the chromosomes and, if the random draw is lower than or equal to a chromosome's selection probability, select it as the parent; otherwise, select the chromosome, n, with the highest probability relative to the maximum fitness score. In either case, the fitness of the selected chromosome is zeroed so it cannot be selected again. Print the draw and the probabilities for reference:

        print(draw)

        print(prob)

        for i in range(len(sorted_population)):

            if draw <= prob[i]:

                fitness[i] = 0

                return sorted_population[i], fitness

            else:

                fitness[n] = 0

                return sorted_population[n], fitness

  10. Initialize population, calculate target and the fitness scores, and print the scores and target:

    population = original_population(5,8)

    target = create_target_solution(8)

    weights = fitness_function(target,population)

    print(weights)

    print(' target:', target)

    You will get a similar output to this:

    [5 1 5 3 4]

  11. Apply the first selection method and print out the parents and the new scores:

    print(' target:', target)

    parents = select_parents(population,weights)

    print('Parent 1:', parents[0],' Parent 2:', parents[1])

    print(weights)

    You will get a similar output to this for the tournament selection process:

    target: [0. 1. 1. 1. 1. 0. 0. 0.]

    Parent 1: [1. 1. 1. 1. 1. 0. 1. 1.]

    Parent 2: [1. 1. 1. 1. 1. 1. 1. 0.]

    [0 1 5 3 4]

    You can observe that for parent 1, the score has been replaced with 0. For parent 2, the score stays the same.

  12. Use the roulette function to select the next two parents and print out the parents and the weights:

    parent3, weights = choice_by_roulette(population, weights)

    print('Parent 3:', parent3, 'Weights:', weights)

    parent4, weights = choice_by_roulette(population, weights)

    print('Parent 4:', parent4,'Weights:', weights)

    You will have a similar output to this:

    0.8568696148662779

    [0.0, 0.07692307692307693, 0.38461538461538464,

     0.23076923076923078, 0.3076923076923077]

    Parent 3: [1. 1. 1. 1. 1. 1. 1. 0.] Weights: [0 1 0 3 4]

    0.4710306341255527

    [0.0, 0.125, 0.0, 0.375, 0.5]

    Parent 4: [0. 0. 1. 0. 1. 1. 1. 0.] Weights: [0 1 0 3 0]

You can see that parents 2 and 3 are the same. This time, the weight for the respective parent is changed to 0. Additionally, parent 4 is selected and has its weighting changed to 0.

Note

To access the source code for this specific section, please refer to https://packt.live/2MTsKJO.

You can also run this example online at https://packt.live/2YrwMhP.

With this exercise, you have implemented a tournament-like method, by selecting the parents with the highest scores, and the roulette wheel selection technique. You have also developed a way of avoiding double-selection of the same chromosome, one of the pitfalls of the selection process, by zeroing the weight of each selected parent. The first set of parents was chosen using the first method, whereas the second method was used in selecting the second set of parents. This helped you to understand the differences between the two methods and allowed you to put into practice GA-related methods from population generation to selection.

Components: Crossover Application

This section expands on recombining the genetic code of the parents by means of crossover into children (that is, the creation of the two child solutions by combining and re-organizing the code of the two parents). Various techniques can be used to create new solutions for generating a new population. The binary information of two viable solutions in machine learning can be recombined by a process called crossover, which is similar to biological genetic exchange, where genetic information is transmitted from parents to children. Crossover ensures that the genetic material of a solution is transmitted to the next generation.

Crossover is the most common form of reproduction technique, or mating. Between the first and last bits of the parents (the selected chromosomes), the crossover point represents the splitting point of the binary code that will be passed on to the children (offspring): the part to the left of the crossover point of the first parent is inherited by the first child, and everything to the right of the crossover point of the second parent becomes part of the first child. The left side of the second parent combined with the right side of the first parent produces the second child:

child1 = np.hstack((parent1[0:p],parent2[p:]))

child2 = np.hstack((parent2[0:p], parent1[p:]))

There are multiple crossover techniques, as follows:

  • Single-point crossover (which you can see in the preceding code) involves splitting the genetic code of the parents at one point and passing the first part to the first child, and the second part to the second child. It is used by traditional GAs; the crossover point is identical for both chromosomes and is selected randomly.
  • Two-point crossover involves two crossover points impacting the gene exchange between the two parents. The more crossover points are introduced, the more the performance of the GA can be reduced as the genetic makeup is lost. However, introducing two-point crossover can lead to a better exploration of the state or parameter space.
  • Multi-point crossover involves a number of splits. If the number of splits is even, then the splits are selected randomly and the sections in the chromosome are exchanged. If the number is odd, the section exchanges alternate between the splits.
  • Uniform crossover involves the random selection (as in a coin toss) of the parent that will provide each element of the chromosome (gene); a minimal sketch appears after this list.
  • Three-parent crossover entails the comparison of each gene between two parents. If they have the same value, the child inherits the gene; if not, the child inherits the gene from the third parent.
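As an illustration of the uniform technique, a minimal sketch (assuming numpy arrays, as in the other snippets; this is not part of the exercise code) could look as follows:

    import numpy as np

    def uniform_crossover(parent1, parent2):
        # coin toss per gene: True takes the gene from parent1
        mask = np.random.random(parent1.shape) < 0.5
        child1 = np.where(mask, parent1, parent2)
        child2 = np.where(mask, parent2, parent1)
        return child1, child2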

For single-point crossover, consider the following code example:

def crossover_reproduction(parents, population):

    #define parents separately

    parent1 = parents[0]

    parent2 = parents[1]

    #randomly assign a point for cross-over

    p = random.randrange(0, len(population))

    print("Crossover point:", p)

    #create children by joining the parents at the cross-over point

    child1 = np.hstack((parent1[0:p],parent2[p:]))

    child2 = np.hstack((parent2[0:p], parent1[p:]))

    return child1, child2

In the preceding code, we define the crossover function between two parents. We have defined the parents separately and then randomly assigned a certain point for crossover. Then, we have defined the children to be created by joining the parents at the defined crossover point.

In the following exercise, you will continue the process of implementing the components of GAs to create child chromosomes.

Exercise 12.05: Crossover for a New Generation

In this exercise, we will be implementing crossover between two parents, for a new generation. Following the steps from Exercise 12.04, Implementing the Tournament and Roulette Wheel Techniques, and using the chromosomes with the highest weight, we will apply single-point crossover to create the first new set of children:

  1. Create a new Jupyter Notebook. Import the random and numpy libraries:

    import random

    import numpy as np

  2. Create the function for a random population:

    def original_population(chromosomes, genes):

        #initialize the population with zeroes

        population = np.zeros((chromosomes, genes))

        #loop through each chromosome

        for i in range(chromosomes):

            #get random no. of ones to be created

            ones = random.randint(0, genes)

            #change zeroes to ones

            population[i, 0:ones] = 1

            #shuffle rows

            np.random.shuffle(population[i])

        return population

    As you can see in the previous code, we have created a population function.

  3. Define a function to create the target solution:

    def create_target_solution(gene):

        #assume that there is an equal number of ones and zeroes

        counting_ones = int(gene/2)

        # build array with equal no. of ones and zeros

        target = np.zeros(gene)

        target[0:counting_ones] = 1

        # shuffle the array to mix zeroes and ones

        np.random.shuffle(target)

        return target

  4. Define a function for calculating the fitness weighting for each chromosome:

    def fitness_function(target,population):

        #create an array of true/false compared to the reference

        identical_to_target = population == target

        #sum no. of genes that are identical

        fitness_weights = identical_to_target.sum(axis = 1)

        return fitness_weights

  5. Define a function for selecting the pair of parents with the highest weighting (highest fitness score). Since the population is smaller, the chromosomes are competing more. This method is also known as tournament selection:

    # select the best parents

    def select_parents(population, weights):

        #identify the parent with the highest weight

        parent1 = population[np.argmax(weights)]

        #replace weight with the minimum number

        weights[np.argmax(weights)] = 0

        #identify the parent with the second-highest weight

        parent2 = population[np.argmax(weights)]

        return parent1, parent2

  6. Define a function for crossover by using a randomly selected crossover point:

    def crossover_reproduction(parents, population):

        #define parents separately

        parent1 = parents[0]

        parent2 = parents[1]

        #randomly assign a point for cross-over

        p = random.randrange(0, len(population))

        print("Crossover point:", p)

        #create children by joining the parents at the cross-over point

        child1 = np.hstack((parent1[0:p],parent2[p:]))

        child2 = np.hstack((parent2[0:p], parent1[p:]))

        return child1, child2

  7. Initialize the population with 5 chromosomes and 8 genes and calculate weights:

    population = original_population(5,8)

    target = create_target_solution(8)

    weights = fitness_function(target,population)

  8. Print the target solution:

    print(' target:', target)

    The output will be as follows:

    target: [1. 0. 0. 1. 1. 0. 1. 0.]

  9. Select the parents with the highest weight and print the final selection:

    parents = select_parents(population,weights)

    print('Parent 1:', parents[0],' Parent 2:', parents[1])

    The output will be as follows:

    Parent 1: [1. 0. 1. 1. 1. 0. 1. 1.]

    Parent 2: [1. 0. 0. 0. 0. 0. 0. 0.]

  10. Apply the crossover function and print the children:

    children = crossover_reproduction(parents,population)

    print('Child 1:', children[0],' Child 2:', children[1])

    The output will be as follows:

    Crossover point: 4

    Child 1: [1. 0. 1. 1. 0. 0. 0. 0.]

    Child 2: [1. 0. 0. 0. 1. 0. 1. 1.]

  11. Run the application.

    You will get a similar output to that shown in the following snippet. As you can see, the population elements are randomized. Check that the elements of Child 1 and Child 2 are the same as those of Parent 1 and Parent 2:

    target: [1. 0. 1. 1. 0. 0. 1. 0.]

    . . .

    Parent 1: [1. 0. 1. 1. 1. 0. 1. 1.]

    Parent 2: [0. 0. 1. 1. 0. 1. 0. 0.]

    . . .

    Crossover point: 1

    Child 1: [1. 0. 1. 1. 0. 1. 0. 0.]

    Child 2: [0. 0. 1. 1. 1. 0. 1. 1.]

    . . .

You can check that, from the crossover point onward, the array of Child 1 has the same elements as Parent 2, and Child 2 has the same elements as Parent 1.

Note

To access the source code for this specific section, please refer to https://packt.live/30zHbup.

You can also run this example online at https://packt.live/3fueZxx.

In this section, we identified the various strategies for the recombination technique known as crossover. A basic implementation of single-point crossover, where the crossover point is randomly generated, was presented. In the following section, we will examine the last element of GA design: population mutation.

Components: Population Mutation

In the previous sections, you have implemented population generation, parent selection, and crossover reproduction. This section will concentrate on the application of random mutation and the repetition of child generations until a new population size is achieved and weights (fitness scores) for the population of the genetic algorithm are assigned. This section will include an explanation of the mutation technique. This will be followed by a presentation of the available mutation techniques as well as a discussion about population replacement. Finally, an exercise implementing mutation techniques will be presented.

Much as gradient methods can stop at a local optimum, a GA's population can converge prematurely on one. To prevent this from happening, mutations can be introduced to the population of solutions. Mutation generally occurs after the crossover process. It relies on randomly altering binary information in either a set of chromosomes or in the entire population. Mutation provides an avenue of problem space exploration by introducing a random change in the population. This technique prevents rapid convergence and encourages the exploration of new solutions. In the final steps (the last generations) or when the optimal solution is reached, mutation ceases to be applied.

There are various mutation techniques, as follows:

  • Single-point mutation (flipping) involves randomly selecting genes from different chromosomes and changing their binary values to their opposites (from 0 to 1 and 1 to 0).
  • Interchanging involves selecting two sections of the chromosome of one parent and swapping them, thus generating a new child.
  • You can also reverse a randomly selected segment within the parent or the population of chromosomes, and all the binary values are changed to their opposites.

The occurrence of a mutation is determined by its probability. The probability defines the frequency at which mutations occur within the population. If the probability is 0%, then after the crossover, the children are unaltered; if a mutation occurs, one or more parts of the chromosome or the population are changed. If the probability is 100%, then the entire chromosome is changed.
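In its most common form, each gene is flipped independently when a random draw falls below the mutation probability. A minimal sketch is shown here (note that the exercise below deliberately inverts the comparison, flipping genes whose draw is at or above the threshold):

    import numpy as np

    def flip_mutation(chromosome, probability):
        # flip each binary gene with the given probability
        mask = np.random.random(chromosome.shape) < probability
        chromosome[mask] = 1 - chromosome[mask]
        return chromosome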

After the mutation process occurs, the fitness of the new children is calculated, and the population is altered to include them. This leads to a new generation of the population. Depending on the strategy used, the parents with the lowest fitness scores are discarded to leave room for the newly generated children.

Exercise 12.06: New Generation Development Using Mutation

In this exercise, we will be focusing on the development of a new generation. We will again create a new population, select two parent chromosomes, and use crossover to develop two children. We will then add the two new chromosomes to the population and mutate the entire population with a probability of 0.05:

  1. Create a new Jupyter Notebook. Import the random and numpy libraries:

    import random

    import numpy as np

  2. Create a function for the random population:

    def original_population(chromosomes, genes):

        #initialize the population with zeroes

        population = np.zeros((chromosomes, genes))

        #loop through each chromosome

        for i in range(chromosomes):

            #get random no. of ones to be created

            ones = random.randint(0, genes)

            #change zeroes to ones

            population[i, 0:ones] = 1

            #shuffle rows

            np.random.shuffle(population[i])

        return population

  3. Define a function to create the target solution:

    def create_target_solution(gene):

        #assume that there is an equal number of ones and zeroes

        counting_ones = int(gene/2)

        # build array with equal no. of ones and zeros

        target = np.zeros(gene)

        target[0:counting_ones] = 1

        # shuffle the array to mix zeroes and ones

        np.random.shuffle(target)

        return target

  4. Define a function to calculate the fitness weighting for each chromosome:

    def fitness_function(target,population):

        #create an array of true/false compared to the reference

        identical_to_target = population == target

        #sum no. of genes that are identical

        fitness_weights = identical_to_target.sum(axis = 1)

        return fitness_weights

  5. Define a function to select the pair of parents with the highest weighting (highest fitness score). Since the population is small, the chromosomes are competing more. This method is also known as tournament selection:

    # select the best parents

    def select_parents(population, weights):

        #identify the parent with the highest weight

        parent1 = population[np.argmax(weights)]

        #replace weight with the minimum number

        weights[np.argmax(weights)] = 0

        #identify the parent with the second-highest weight

        parent2 = population[np.argmax(weights)]

        return parent1, parent2

  6. Define a function for crossover by using a randomly selected crossover point:

    def crossover_reproduction(parents, population):

        #define parents separately

        parent1 = parents[0]

        parent2 = parents[1]

        #randomly assign a point for cross-over

        p = random.randrange(0, len(population))

        print("Crossover point:", p)

        #create children by joining the parents at the cross-over point

        child1 = np.hstack((parent1[0:p],parent2[p:]))

        child2 = np.hstack((parent2[0:p], parent1[p:]))

        return child1, child2

  7. Define a function for a mutation that uses the probability and the population as inputs:

    def mutate_population(population, mutation_probability):

        #create array of random mutations that uses the population

        mutation_array = np.random.random(size = (population.shape))

        """

        compare elements of the array with the probability and

        put the results into an array

        """

        mutation_boolean = mutation_array >= mutation_probability

        """

        convert boolean into binary and store to create a new

        array for the population

        """

        population[mutation_boolean] = np.logical_not(
            population[mutation_boolean])

        return population

    In the preceding code snippet, each element of the random array is compared with the mutation probability, which acts as a threshold. If an element is greater than or equal to the threshold, the corresponding gene is mutated.

  8. Build the population, select the parents, and create the children via crossover, as in the previous exercise:

    population = original_population(5,8)

    target = create_target_solution(8)

    weights = fitness_function(target,population)

    parents = select_parents(population,weights)

    children = crossover_reproduction(parents,population)

    The output will be as follows:

    Crossover point: 3

  9. Next, append the children to the population:

    population_crossover = np.append(population, children, axis= 0)

    print(' Population after the cross-over: ',

          population_crossover)

    The population will be as follows:

    Population after the cross-over:

     [[0. 1. 0. 0. 0. 0. 1. 0.]

     [0. 0. 0. 0. 0. 1. 0. 0.]

     [1. 1. 1. 1. 1. 0. 0. 1.]

     [1. 1. 1. 0. 1. 1. 1. 1.]

     [0. 1. 1. 1. 1. 0. 0. 0.]

     [1. 1. 1. 1. 1. 0. 0. 1.]

     [1. 1. 1. 0. 1. 1. 1. 1.]]

  10. Use the crossover population and the mutation probability of 0.05 to create a new population and display the mutated population:

    mutation_probability = 0.05

    new_population = mutate_population(population_crossover,
                                       mutation_probability)

    print(' Next generation of the population: ',

          new_population)

    As you can see, the threshold (mutation_probability) is 0.05. Hence, if an element is higher than this threshold, its gene incurs a mutation (so there is a 95% chance of mutation occurring for each gene).

    The output will be as follows:

    Next generation of the population:

     [[1. 0. 1. 1. 1. 1. 0. 1.]

     [1. 0. 1. 1. 1. 0. 1. 1.]

     [1. 0. 0. 0. 0. 1. 1. 0.]

     [0. 0. 0. 1. 0. 0. 0. 0.]

     [1. 0. 0. 0. 1. 1. 1. 1.]

     [0. 0. 0. 0. 0. 1. 1. 0.]

     [0. 0. 0. 1. 0. 1. 0. 1.]]

You will get a similar output as the population elements are randomized. You can see that the chromosomes resulting from crossover are added to the original population and that after mutation, the population has the same number of chromosomes, but the genes are different. The crossover and mutation steps can be repeated until the target solution is reached by looping the functions. These cycles are also known as generations.
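As a rough illustration (a sketch, not part of the exercise), such a generation loop could reuse the functions defined above. Because mutate_population flips genes whose draw is greater than or equal to the threshold, a threshold close to 1.0 corresponds to a small mutation rate here:

    population = original_population(5, 8)
    target = create_target_solution(8)
    for generation in range(100):
        weights = fitness_function(target, population)
        if weights.max() == len(target):    # perfect match found
            break
        parents = select_parents(population, weights.copy())
        # crossover_reproduction prints the crossover point each generation
        children = crossover_reproduction(parents, population)
        extended = np.append(population, children, axis=0)
        extended = mutate_population(extended, 0.95)
        # keep the population size constant: retain the 5 fittest
        scores = fitness_function(target, extended)
        population = extended[np.argsort(scores)[-5:]]
    print('Stopped at generation:', generation)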

Note

To access the source code for this specific section, please refer to https://packt.live/3dXaBqi.

You can also run this example online at https://packt.live/2Ysc5Cl.

In this section, mutation was described. The benefit of mutation is that it introduces random variation to chromosomes, encouraging exploration and helping to avoid local optima. Various mutation techniques were presented. The example we used showed the impact of mutation probability by implementing reverse mutation on a population after the crossover process was finalized.

Application to Hyperparameter Selection

In this section, we will explore the use of GAs for parameter selection, especially when using neural networks. GAs are widely used for optimization problems in scheduling in both production and railway management. The solutions to these types of problems rely on creating a combination of neural networks and GAs as function optimizers.

The exercise in this section provides a platform for tuning hyperparameters for a neural network to predict wind flow patterns. You will apply a simple genetic algorithm to optimize the values of the hyperparameters used to train a neural network.

Artificial Neural Networks (ANNs) model the biological processes and structures of neurons in the brain. Each neuron in an ANN combines its input information (parameters) with a set of weights; the weighted sum, with an added bias, passes through a transfer (activation) function. Neurons arranged in parallel with each other form a layer.
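As a minimal illustration of this computation (the input and weight values below are arbitrary):

    import numpy as np

    def neuron_output(inputs, weights, bias):
        # weighted sum of the inputs plus a bias, passed through a
        # sigmoid transfer function
        z = np.dot(weights, inputs) + bias
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])   # example inputs
    w = np.array([0.4, 0.1, -0.6])   # example weights
    print(neuron_output(x, w, bias=0.2))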

For weight and bias optimization, ANNs use gradient descent methods for their training processes and backpropagation processes. This impacts the development of the neural network, as before training even commences, the neural network topology needs to be fully designed. Because the design is pre-set, some neurons may not be used in the training process, but they may still be active, therefore making them redundant. Additionally, neural networks using gradient methods can become stuck at a local optimum, and therefore need to rely on alternative methods to help them continue their processes, such as regularization, ridge regression, or lasso regression. ANNs are widely used in speech recognition, feature detection (whether for image, topology, or signal processing), and disease detection.

To prevent these problems and enhance the training of neural networks, GAs can be implemented. GAs are used for function optimization, while crossover and mutation techniques help with problem space exploration. Initially, GAs were used to optimize the weights and number of nodes of neural networks. For this, the chromosomes of the GA are encoded with possible variations of weights and nodes. The fitness function is derived from the ANN's error: typically, the mean squared error between the network's predictions and the exact target values.
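A sketch of such a fitness function is shown below; predict_fn is a hypothetical helper that runs a forward pass with the weights decoded from the chromosome:

    import numpy as np

    def weight_fitness(chromosome, X, y, predict_fn):
        # score the decoded network by the mean squared error between
        # its predictions and the exact target values
        predictions = predict_fn(X, chromosome)
        mse = np.mean((predictions - y) ** 2)
        # GAs maximize fitness, so invert the error
        return 1.0 / (1.0 + mse)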

However, research has expanded to implementations of Recurrent Neural Networks (RNNs) and combining them with RL, aiming towards multi-processor performance. An RNN is a type of ANN that produces outputs that are not only a result of the weighting process of the input but also of a vector containing previous input and outputs. This enables the neural network to maintain prior knowledge of previous training instances.

GAs also serve in evolving the topology of neural networks beyond weight adjustments. One example is EDEN, in which the network architecture and the learning rate are encoded within the chromosome; it achieves high accuracy rates on multiple TensorFlow datasets. One of the most challenging problems in training neural networks is the quality of the features (or input hyperparameters) that are fed to the network. If the parameters are not appropriate, the mapping of inputs to outputs will be erroneous. GAs can therefore act as a wrapper around the ANN by optimizing the selection of features.
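As a rough sketch of the wrapper idea, a binary chromosome can act as a feature mask; train_and_score is a hypothetical helper that trains the network on the selected columns and returns a validation score:

    import numpy as np

    def feature_subset_score(mask, X, y, train_and_score):
        # mask is a binary chromosome: 1 keeps a feature column, 0 drops it
        selected = X[:, np.asarray(mask, dtype=bool)]
        if selected.shape[1] == 0:   # penalize empty feature sets
            return 0.0
        return train_and_score(selected, y)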

The following exercise will teach you how to apply a simple genetic algorithm to identify the optimal parameters (window size and number of units) for an RNN. The genetic algorithm is implemented using the deap package, through the eaSimple() function, which enables you to create, using toolbox-based code, a simple GA that includes population creation, selection through the selRandom() function, reproduction through the cxTwoPoint() function, and mutation through the mutFlipBit() function. For comparison and hyperparameter selection, the selBest() function is used.

Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training

Our goal in this exercise is to identify the best hyperparameters to use for an RNN using a simple genetic algorithm. In this exercise, we are using a dataset that was part of a weather forecasting challenge in 2012. A single feature, wp2, is used in the training and validation of the parameters. The two hyperparameters used are the number of units and the window size. These hyperparameters represent the genetic material for the chromosome:

Note

The dataset can be found in the GitHub repository at the following link: https://packt.live/2Ajjz2F.

The original dataset can be found at the following link: https://www.kaggle.com/c/GEF2012-wind-forecasting/data.

  1. Create a new Jupyter Notebook. Import the pandas and numpy libraries and functions:

    import numpy as np

    import pandas as pd

    from sklearn.metrics import mean_squared_error

    from sklearn.model_selection import train_test_split as split

    from tensorflow.keras.layers import SimpleRNN, Input, Dense

    from tensorflow.keras.models import Model

    from deap import base, creator, tools, algorithms

    from scipy.stats import bernoulli

    from bitstring import BitArray

    From the sklearn package, import mean_squared_error and train_test_split. From tensorflow.keras, import SimpleRNN, Input, and Dense (from the layers module) and Model (from the models module). To create the GA, call base, creator, tools, and algorithms from the deap package. To randomly initialize the genes, we use the Bernoulli distribution, so we import bernoulli from the scipy.stats package. From bitstring, we import BitArray.

  2. Use a random seed for model development; 998 is an initialization number for the seed:

    np.random.seed(998)

  3. Load data from the train.csv file, use np.reshape() to turn column wp2 into a single-column array, and select the first 1,500 elements:

    #read data from csv

    data = pd.read_csv('../Dataset/train.csv')

    #use column wp2

    data = np.reshape(np.array(data['wp2']), (len(data['wp2']), 1))

    data = data[0:1500]

  4. Define a function to split the dataset based on window size:

    def format_dataset(data, w_size):

        #initialize as empty array

        X, Y = np.empty((0, w_size)), np.empty(0)

        """

        depending on the window size the data is separated in

        2 arrays containing each of the sizes

        """

        for i in range(len(data)-w_size-1):

            X = np.vstack([X,data[i:(i+w_size),0]])

            Y = np.append(Y, data[i+w_size,0])

        X = np.reshape(X,(len(X),w_size,1))

        Y = np.reshape(Y,(len(Y), 1))

        return X, Y
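    As a quick sanity check (assuming data was loaded in step 3, and using an arbitrary window size of 10), each sample in X holds 10 consecutive wp2 values and Y holds the value that follows them:

    X, Y = format_dataset(data, 10)
    print(X.shape)   # (1489, 10, 1)
    print(Y.shape)   # (1489, 1)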

  5. Define a function to train the RNN to identify the optimal hyperparameters using a simple genetic algorithm:

    def training_hyperparameters(ga_optimization):

        """

        decode GA solution to integer window size and number of units

        """

        w_size_bit = BitArray(ga_optimization[0:6])

        n_units_bit = BitArray(ga_optimization[6:])

        w_size = w_size_bit.uint

        n_units = n_units_bit.uint

        print('\nWindow Size: ', w_size,
              '\nNumber of units: ', n_units)

        """

        return fitness score of 100 if the size or the units are 0

        """

        if w_size == 0 or n_units == 0:

            return 100

        """

        segment train data on the window size splitting it into

        90 train, 10 validation

        """

        X,Y = format_dataset(data, w_size)

        X_train, X_validate, Y_train, Y_validate = \
            split(X, Y, test_size=0.10, random_state=998)

    The first step is identifying the sections of the chromosome pertaining to the window size and the number of units. The next step is to return an extremely high fitness score if the window size or the number of units is zero. Finally, split the two arrays into training and validation arrays with a 90:10 split.
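    To see the decoding in isolation, note that bitstring's BitArray interprets a list of genes as bits; for example, the hypothetical chromosome below decodes to a window size of 48 and 15 units:

    from bitstring import BitArray
    chromosome = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
    print(BitArray(chromosome[0:6]).uint)   # 48 (bits 110000)
    print(BitArray(chromosome[6:]).uint)    # 15 (bits 1111)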

  6. Initialize the input features and build the SimpleRNN model on the training dataset. For optimization, use the Adam algorithm with mean squared error as the loss function. To train the model, use the fit function with 5 epochs and a batch size of 4. To generate the predicted values, pass the input values stored in X_validate to the model's predict function. Calculate the Root Mean Squared Error (RMSE) between the validation set and the predicted set of output variables. Return the RMSE as a one-element tuple (note the trailing comma in the code), since deap expects the evaluation function to return a tuple of fitness values:

        input_features = Input(shape=(w_size,1))

        x = SimpleRNN(n_units,input_shape=(w_size,1))(input_features)

        output = Dense(1, activation='linear')(x)

        rnnmodel = Model(inputs=input_features, outputs = output)

        rnnmodel.compile(optimizer='adam',

                         loss = 'mean_squared_error')

        rnnmodel.fit(X_train, Y_train, epochs=5,

                     batch_size=4, shuffle = True)

        Y_predict = rnnmodel.predict(X_validate)

        # calculate RMSE score as fitness score for GA

        RMSE = np.sqrt(mean_squared_error(Y_validate, Y_predict))

        print('Validation RMSE: ', RMSE, '\n')

        return RMSE,

  7. Instantiate the population size, the number of generations used for the genetic algorithm, and the length of the gene with 4, 5, and 10, respectively (the first 6 bits of the 10-bit chromosome encode the window size and the remaining 4 bits encode the number of units):

    population_size = 4

    generations = 5

    gene = 10

  8. Use the toolbox available in the deap package to instantiate the genetic algorithm, eaSimple(). To do this, use the creator tool to define the fitness based on the RMSE; the weight of -1.0 indicates that the RMSE is to be minimized:

    creator.create('FitnessMax', base.Fitness, weights= (-1.0,))

    creator.create('Individual', list, fitness = creator.FitnessMax)

    toolbox = base.Toolbox()

    toolbox.register('bernoulli', bernoulli.rvs, 0.5)

    toolbox.register('chromosome', tools.initRepeat,

                     creator.Individual, toolbox.bernoulli, n = gene)

    toolbox.register('population', tools.initRepeat,

                     list, toolbox.chromosome)

    toolbox.register('mate', tools.cxTwoPoint)

    toolbox.register('mutate', tools.mutFlipBit, indpb = 0.6)

    toolbox.register('select', tools.selRandom)

    toolbox.register('evaluate', training_hyperparameters)

    population = toolbox.population(n = population_size)

    algo = algorithms.eaSimple(population,toolbox,cxpb=0.4,

                               mutpb=0.1, ngen=generations,

                               verbose=False)

    The last few lines of the output will be as follows:

    Window Size: 48

    Number of units: 15

    Train on 1305 samples

    Epoch 1/5

    1305/1305 [==============================] - 3s 2ms/sample

    - loss: 0.0106

    Epoch 2/5

    1305/1305 [==============================] - 3s 2ms/sample

    - loss: 0.0066

    Epoch 3/5

    1305/1305 [==============================] - 3s 2ms/sample

    - loss: 0.0057

    Epoch 4/5

    1305/1305 [==============================] - 3s 2ms/sample

    - loss: 0.0051

    Epoch 5/5

    1305/1305 [==============================] - 3s 2ms/sample

    - loss: 0.0049

    Validation RMSE: 0.05564985152918074

    The lower the RMSE value, the better the hyperparameters. The Bernoulli distribution serves to randomly initialize the chromosome genes, and the population is initialized from these chromosomes. Within the toolbox, there are four steps for creating a new population:

    - mate: the crossover process; cxTwoPoint() crosses the parents' information at two points.

    - mutate: the mutation process; mutFlipBit() flips each gene of the chromosome independently, with a probability of occurrence of 0.6.

    - select: parents are selected randomly through the selRandom() function.

    - evaluate: chromosomes are scored using the RNN training function from Steps 5 and 6.

  9. Use the selBest() function to select the single best solution, k=1, according to the fitness values. To get the optimal window size and number of units, loop through the chromosome, convert the bit fields to unsigned integers, and print the optimal hyperparameters:

    optimal_chromosome = tools.selBest(population, k = 1)

    optimal_w_size = None

    optimal_n_units = None

    for op in optimal_chromosome:

        w_size_bit = BitArray(op[0:6])

        n_units_bit = BitArray(op[6:])

        optimal_w_size = w_size_bit.uint

        optimal_n_units = n_units_bit.uint

        print('\nOptimal window size:', optimal_w_size,
              '\nOptimal number of units:', optimal_n_units)

    The output will be as follows:

    Optimal window size: 48

    Optimal number of units: 15

  10. Run the application. You will get a similar output to what you see here. The initial values for the window size and number of units will be displayed. The GA will train the RNN for the set number of epochs for each chromosome it evaluates, and the validation RMSE is displayed after each evaluation. Once all the generations have executed, the optimal values are displayed:
Figure 12.10: Optimization of the window size and number of units using GA

Because the chromosomes are initialized randomly, your results will differ from run to run (the run in Step 9, for instance, converged to a window size of 48 with 15 units). In the run shown in Figure 12.10, we started with an initial window size of 51 and 15 units; the optimal window size was reduced to 28 and the number of units to 4, with the validation RMSE reduced to around 0.05.

Note

To access the source code for this specific section, please refer to https://packt.live/37sgQA6.

You can also run this example online at https://packt.live/30AOKRK.

This section has covered combining GAs with neural networks as an alternative to using gradient descent methods. GAs mainly served in optimizing the number of neurons and weights for the neural networks, but their use can be expanded, through hybridization, to optimizing the structure of the network and hyperparameter selection. This exercise tested your ability to apply a genetic algorithm to find the optimal values of two features related to a weather forecasting problem. The features were used to train an RNN to estimate wind flow using RMSE values. In the following section, you will expand your knowledge of hybrid optimization techniques for the entire architecture of a neural network using NEAT.

NEAT and Other Formulations

Neuroevolution is a term that refers to evolving neural networks using GAs. This branch of machine learning has been shown to outperform RL on various problems, and, since it does not require labeled training targets, it can also be coupled with RL. As mentioned in the previous section, neuroevolution systems concentrate on changing the weights, the number of neurons (in the hidden layers), and the topology of ANNs.

NeuroEvolution of Augmenting Topologies (NEAT) focuses on topology evolution for ANNs. It starts from a simple ANN structure, consisting of input and output neurons plus units to represent the bias, but no hidden layers. Each ANN structure is encoded within a chromosome that contains node genes and connection genes (the mapping, or link, between two node genes). Each connection gene specifies the input node, the output node, the connection weight, whether the connection is enabled, and an innovation number that is used to match up genes during the crossover process.

Mutations relate either to the weights of the connections or to the structure of the full system. Structural mutations can appear either by adding a connection between two nodes that are not linked or by splitting a pre-existing connection with a new node, which replaces the old connection with two new ones (one from the original input node to the new node, and one from the new node to the original output node).

The crossover process entails the identification of common genes between different chromosomes within the population. This relies on historical information about gene derivation, tracked through a global innovation number. New genes produced by mutation receive new, incremented innovation numbers, whereas genes passed on through crossover keep their original numbers. This technique helps to solve the gene-matching problem that causes issues for neural network topologies. Genes that do not share an innovation number are inherited from the parent with the higher fitness; if both parents have the same fitness, such genes are selected randomly from each parent.

Chromosomes that have similar topologies are grouped into species based on a compatibility distance, δ. Individuals are compared based on the number of disjoint genes, D, the number of excess genes, E, and the average weight difference, W, of the matching genes, with the gene counts normalized by the number of genes, N, in the larger chromosome. The coefficients c1, c2, and c3 act as weights that set the significance of each term:

δ = c1E/N + c2D/N + c3W

Figure 12.11: Topology distance calculation

To categorize the chromosomes into species, the distance δ is compared with a threshold, δt. If δ < δt, the chromosome joins the first species for which this condition is fulfilled. To prevent any single species from dominating, explicit fitness sharing is applied: each chromosome's fitness is adjusted by dividing it by the number of members in its species. The number of new chromosomes a species receives in the next generation then depends on the comparison between the species' average adjusted fitness, Fj, and the average fitness of the population, F:

Figure 12.12: Calculation of the number of new chromosomes
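As a rough illustration of these calculations, the sketch below computes the compatibility distance and the species-membership test (the coefficient values and gene counts are illustrative, not prescribed):

    def compatibility_distance(excess, disjoint, avg_weight_diff,
                               n_genes, c1=1.0, c2=1.0, c3=0.4):
        # delta = c1*E/N + c2*D/N + c3*W
        return (c1 * excess + c2 * disjoint) / n_genes + c3 * avg_weight_diff

    delta = compatibility_distance(excess=2, disjoint=3,
                                   avg_weight_diff=0.5, n_genes=20)
    print(delta < 3.0)   # True: joins the first species within this threshold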

The advantage of NEAT is that, unlike neuroevolution algorithms that have a random set of topology parameters, it starts with the simplest topological form of a neural network and progressively evolves it to find the optimal solution, significantly reducing the number of used generations.

Evolving-topology algorithms are categorized as Topology and Weight Evolving Artificial Neural Networks (TWEANNs). These include EDEN, Cellular Encoding (CE), Parallel Distributed Genetic Programming (PDGP), and Generalized Acquisition of Recurrent Links (GNARL); NEAT has been shown to outperform CE, as well as the fixed-topology Enforced Subpopulations (ESP) method, on pole-balancing benchmarks.

We will now see an exercise on applying NEAT to solve a simple XNOR gate, a logic gate that has a binary output. The binary inputs and output are described using a truth table: a representation of the values of a Boolean logic expression for every combination of its input values.
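Before turning to NEAT, the gate itself is trivial to express directly; the sketch below prints the full truth table that the evolved network must reproduce:

    def xnor(a, b):
        # output is 1 exactly when both inputs match
        return 1 - (a ^ b)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, '->', xnor(a, b))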

Exercise 12.08: XNOR Gate Functionality Using NEAT

In this exercise, you will see the impact that NEAT has on solving a simple Boolean algebra problem. The problem involves implementing the NEAT algorithm to identify the optimal neural network topology for reproducing the binary output of an exclusive NOR (XNOR) gate. This is a type of logic gate where, when both inputs have the same signal (either 0 or 1, equivalent to off and on, respectively), the output of the logic gate will be 1 (on), whereas when one of the inputs is high (1) and the other is low (0), the output will be 0 (off).

We have the following truth table for the XNOR logic gate:

Figure 12.13: Truth table for the XNOR gate

Use the NEAT algorithm to create a feedforward neural network that can mimic the output of an XNOR gate.

Perform the following steps to complete the exercise:

  1. In your Anaconda environment, execute the following command to install the neat-python package (which is imported as neat):

    pip install neat-python

  2. Create a new Jupyter Notebook.
  3. Import print_function from the __future__ module, and import the neat and os packages:

    from __future__ import print_function

    import os

    import neat

  4. Initialize the inputs and the output of the XNOR gate based on the truth table:

    xnor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

    xnor_output = [(1.0,),(0.0,),(0.0,),(1.0,)]

  5. Create a fitness function that starts each chromosome at the maximum score of 4.0 (one point per truth-table row) and subtracts the squared difference between the output of a feedforward neural network created with NEAT and the expected output:

    def fitness_function(chromosomes, configuration):

        for ch_id, chromosome in chromosomes:

            chromosome.fitness = 4.0

        neural_net = neat.nn.FeedForwardNetwork.create(chromosome,
                                                       configuration)

            for xnor_i,xnor_o in zip(xnor_inputs, xnor_output):

                output = neural_net.activate(xnor_i)

                squared_diff = (output[0] - xnor_o[0])**2

                chromosome.fitness -= squared_diff

  6. Create a new text file with the name config-feedforward-xnor and include in it the following parameters for the NEAT algorithm. For the fitness criterion, select the maximum value; since a perfect network scores 4.0, the threshold of 3.9 stops evolution once the outputs are nearly exact. Set a population size of 200:

    [NEAT]

    fitness_criterion = max

    fitness_threshold = 3.9

    pop_size = 200

    reset_on_extinction = False

  7. In the same config-feedforward-xnor file, include the sigmoid function for node activation with a mutation rate of 0.01. The aggregation options are mostly about adding the values, with a mutation rate of 0 for aggregation:

    [DefaultGenome]

    # activation options of the nodes

    activation_default = sigmoid

    activation_mutate_rate = 0.01

    activation_options = sigmoid

    # aggregation options for the node

    aggregation_default = sum

    aggregation_mutate_rate = 0.0

    aggregation_options = sum

  8. Set the bias parameters for the algorithm:

    # bias options for the node

    bias_init_mean = 0.0

    bias_init_stdev = 0.05

    bias_max_value = 30.0

    bias_min_value = -30.0

    bias_mutate_power = 0.5

    bias_mutate_rate = 0.8

    bias_replace_rate = 0.1

    For the bias, the minimum and maximum values are -30 and 30. Set the initial standard deviation at 0.05, as low as possible, with a power of 0.5, a mutation rate of 0.8, and a replacement rate of 0.1. These values are essential for implementing the genetic algorithm optimization.

  9. Define the compatibility coefficients, as we are only considering the difference between the genes (how disjoint they are) and the difference in weights:

    # compatibility options for the genes in the chromosome

    compatibility_disjoint_coefficient = 1.0

    compatibility_weight_coefficient = 0.5

  10. Include the information about topology, connection, and node inclusion or removal-related parameters:

    # add/remove rates for connections between nodes

    conn_add_prob = 0.5

    conn_delete_prob = 0.5

    # connection enable options

    enabled_default = True

    enabled_mutate_rate = 0.01

    feed_forward = True

    initial_connection = full

    # add/remove rates for nodes

    node_add_prob = 0.2

    node_delete_prob = 0.2

  11. Start with a simple network without any hidden layers and set the response parameters for the nodes and connections:

    # network parameters

    num_hidden = 0

    num_inputs = 2

    num_outputs = 1

    # node response options

    response_init_mean = 1.0

    response_init_stdev = 0.0

    response_max_value = 30.0

    response_min_value = -30.0

    response_mutate_power = 0.0

    response_mutate_rate = 0.0

    response_replace_rate = 0.0

    # connection weight options

    weight_init_mean = 0.0

    weight_init_stdev = 1.0

    weight_max_value = 30

    weight_min_value = -30

    weight_mutate_power = 0.5

    weight_mutate_rate = 0.9

    weight_replace_rate = 0.15

  12. Select the default parameters for the distance threshold, species fitness function, and parent selection. This is the final set of parameters to be included in the config-feedforward-xnor file:

    [DefaultSpeciesSet]

    compatibility_threshold = 3.0

    [DefaultStagnation]

    species_fitness_func = max

    max_stagnation = 20

    species_elitism = 2

    [DefaultReproduction]

    elitism = 2

    survival_threshold = 0.2

  13. Now, in the main code file, use the config-feedforward-xnor file to configure the NEAT formulation of the neural network, and print the resulting configuration object:

    #load configuration

    configuration = neat.Config(neat.DefaultGenome,

                                neat.DefaultReproduction,

                                neat.DefaultSpeciesSet,

                                neat.DefaultStagnation,

                                "../Dataset/config-feedforward-xnor")

    print("Output of file configuration:", configuration)

    The output will be as follows:

    Output of file configuration: <neat.config.Config object at

    0x0000017618944AC8>

  14. Create the population based on the configuration of the NEAT algorithm and add reporters to output progress to the terminal and collect statistics:

    #load the population size

    pop = neat.Population(configuration)

    #add output for progress in terminal

    pop.add_reporter(neat.StdOutReporter(True))

    statistics = neat.StatisticsReporter()

    pop.add_reporter(statistics)

    pop.add_reporter(neat.Checkpointer(5))

  15. Run the algorithm for up to 200 generations and select the best solution for the neural network topology:

    #run for 200 generations using

    best = pop.run(fitness_function, 200)

    #display the best chromosome

    print('\nBest chromosome: {!s}'.format(best))

    The output will be similar to the following:

    ****** Running generation 0 ******

    Population's average fitness: 2.45675 stdev: 0.36807

    Best fitness: 2.99412 - size: (1, 2) - species 1 - id 28

    Average adjusted fitness: 0.585

    Mean genetic distance 0.949, standard deviation 0.386

    Population of 200 members in 1 species:

    ID age size fitness adj fit stag

    ==== === ==== ======= ======= ====

         1    0 200     3.0    0.585     0

    Total extinctions: 0

    Generation time: 0.030 sec

    ****** Running generation 1 ******

    Population's average fitness: 2.42136 stdev: 0.28774

    Best fitness: 2.99412 - size: (1, 2) - species 1 - id 28

    Average adjusted fitness: 0.589

    Mean genetic distance 1.074, standard deviation 0.462

    Population of 200 members in 1 species:

    ID age size fitness adj fit stag

    ==== === ==== ======= ======= ====

         1    1 200     3.0    0.589     1

    Total extinctions: 0

    Generation time: 0.032 sec (0.031 average)

  16. Use functions to compare the output of the neural network with the desired output:

    #show output of the most fit chromosome against the data

    print('\nOutput:')

    best_network = neat.nn.FeedForwardNetwork.create(best, configuration)

    for xnor_i, xnor_o in zip(xnor_inputs, xnor_output):

        output = best_network.activate(xnor_i)

        print("input{!r}, expected output {!r}, got: {:.1f}"

              .format(xnor_i,xnor_o,output[0]))

    The output will be as follows:

    Output:

    input(0.0, 0.0), expected output (1.0,), got: 0.9

    input(0.0, 1.0), expected output (0.0,), got: 0.0

    input(1.0, 0.0), expected output (0.0,), got: 0.2

    input(1.0, 1.0), expected output (1.0,), got: 0.9

  17. Run the code and you will get a similar output to what you see here. As the chromosomes are populated randomly, the algorithm will converge to a nearly optimal solution in a different number of generations for you:

    ****** Running generation 41 ******

    Population's average fitness: 2.50036 stdev: 0.52561

    Best fitness: 3.97351 - size: (8, 16) - species 2 - id 8095

    Best individual in generation 41 meets fitness threshold

    - complexity: (8, 16)

    Best chromosome:

    Key: 8095

    Fitness: 3.9735119749933214

    Nodes:

        0 DefaultNodeGene(key=0, bias=-0.02623087593563278,

                          response=1.0, activation=sigmoid,

                          aggregation=sum)

        107 DefaultNodeGene(key=107, bias=-1.5209385195946818,

                            response=1.0, activation=sigmoid,

                            aggregation=sum)[…]

        

    Connections:

        DefaultConnectionGene(key=(-2, 107),

                              weight=1.8280370376000628,

                              enabled=True)

        DefaultConnectionGene(key=(-2, 128),

                              weight=0.08641968818530771,

                              enabled=True)

        DefaultConnectionGene(key=(-2, 321),

                              weight=1.2366021868005421,

                              enabled=True)[…]

By running this experiment, you can see that convergence to a nearly optimal solution happened in fewer than the maximum number of generations (200). The output of the feedforward neural network is nearly optimal, as the values are close to the integer targets of 1 and 0. You can also observe that, starting from a network with no hidden layers, the ANN evolved to a complexity of 8 nodes and 16 connections.

Note

To access the source code for this specific section, please refer to https://packt.live/2XTBs0M.

This section does not currently have an online interactive example, and will need to be run locally.

In this section, the NEAT algorithm, a neuroevolution algorithm that varies the topology of neural networks, was presented. What sets the NEAT algorithm apart from alternative TWEANNs is the way in which mutation, crossover, and selection take place to optimize the structure of the neural network, starting from a simple network with no hidden layers and evolving into a more complex one with an increased number of nodes and connections.

This exercise, which involved implementing NEAT to reproduce the output of an XNOR logic gate, enabled you to understand the structure of the NEAT algorithm and analyze the benefits and implications of applying neuroevolutionary techniques to simple electronic problems. In the next section, you will test your programming abilities and your knowledge of GAs by solving the cart-pole problem.

Activity 12.01: Cart-Pole Activity

Automatic control is a challenge, especially when operating specific equipment using robotic arms or carts that are transporting equipment on a shop floor. This problem is often generalized as the cart-pole problem. You are going to program an automated cart to balance a pole. The goal is to maximize the time that the pole is balanced for. To solve this problem, an agent can use a neural network for the state-action mapping. The challenge lies in identifying the structure of the neural network and a solution for determining the optimal values for the weights, bias, and number of neurons for each layer of the neural network. We will use a GA to identify the best values for these parameters.

This activity aims to implement a GA for parameter selection for an ANN that, after 20 generations, can obtain a high average score over 500 trials. You will output the average scores for both the generations and the episodes, and you will monitor the convergence to an optimal policy in the form of a graph as the genetic algorithm tunes the parameters of the neural network. This activity has the purpose of testing your programming abilities by implementing concepts from previous chapters and the current one. The following are the steps needed to implement this activity:

  1. Create a Jupyter Notebook file and import the appropriate packages as follows:

    import gym

    import numpy as np

    import math

    import tensorflow as tf

    from matplotlib import pyplot as plt

    from random import randint

    from statistics import median, mean

  2. Initialize the environment and the state and action space shapes.
  3. Create a function to generate randomly selected initial network parameters.
  4. Create a function to generate the neural network using the set of parameters.
  5. Create a function to get the total reward for 300 steps when using the neural network.
  6. Create a function to get the fitness scores for each element of the population when running the initial random selection.
  7. Create a mutation function.
  8. Create a single-point crossover function.
  9. Create a function for the next-generation creation by selecting the pair with the highest rewards.
  10. Create a function that selects the best-performing set of parameters and uses it to construct the neural network.
  11. Build the neural network using the identified parameters and obtain a new reward based on the constructed neural network.
  12. Create a function to output the convergence graph.
  13. Create a function for the genetic algorithm that outputs the parameters of the neural network based on the highest average reward.
  14. Create a function that decodes the array of parameters to each neural network parameter.
  15. Set the generations to 50, the number of trial tests to 15, and the number of steps and trials to 500. You will get a similar output to this (only the first few lines are displayed here):

    Generation:1, max reward:11.0

    Generation:2, max reward:11.0

    Generation:3, max reward:10.0

    Generation:4, max reward:10.0

    Generation:5, max reward:11.0

    Generation:6, max reward:10.0

    Generation:7, max reward:10.0

    Generation:8, max reward:10.0

    Generation:9, max reward:11.0

    Generation:10, max reward:10.0

    Generation:11, max reward:10.0

    Generation:12, max reward:10.0

    Generation:13, max reward:10.0

    Generation:14, max reward:10.0

    Generation:15, max reward:10.0

    Generation:16, max reward:10.0

    Generation:17, max reward:10.0

    Generation:18, max reward:10.0

    Generation:19, max reward:11.0

    Generation:20, max reward:11.0

    The plot of rewards against generations will be similar to the following:

Figure 12.14: Rewards obtained over the generations

The output for the average rewards (just the last few lines are shown here) will be similar to the following:

Trial:486, total reward:8.0

Trial:487, total reward:9.0

Trial:488, total reward:10.0

Trial:489, total reward:10.0

Trial:490, total reward:8.0

Trial:491, total reward:9.0

Trial:492, total reward:9.0

Trial:493, total reward:10.0

Trial:494, total reward:10.0

Trial:495, total reward:9.0

Trial:496, total reward:10.0

Trial:497, total reward:9.0

Trial:498, total reward:10.0

Trial:499, total reward:9.0

Average reward: 9.384

Note

The solution to this activity can be found on page 774.

Summary

In this chapter, you have explored gradient-based and gradient-free methods of algorithm optimization, with an emphasis on the potential of evolutionary algorithms, in particular GAs, to escape sub-optimal solutions using a nature-inspired approach. GAs consist of specific elements, such as population generation, parent selection, crossover (parent reproduction), and mutation, which they use to evolve a binary-encoded optimal solution.

Then, the use of GAs for hyperparameter tuning and selection for neural networks was explored, helping us to find the most suitable window size and unit number. We saw implementations of state-of-the-art algorithms that combined deep neural networks and evolutionary strategies, such as NEAT for XNOR output estimation. Finally, you had a chance to implement what was studied in this chapter through an OpenAI Gym cart-pole simulation, where we examined the application of GAs for parameter tuning with action selection using a deep neural network.

The development of hybrid methods in RL systems is one of the most recent optimization developments. You have developed and implemented optimization methods for model-free RL systems. In the bonus chapter (which is available on the interactive version of the workshop at courses.packtpub.com), you will be exploring model-based RL methods and state-of-the-art advances in deep RL for control systems that can be applied in the robotics, manufacturing, and transportation fields.

You are now capable of applying the concepts that you learned about in this book using various coding techniques and various models that can help further enhance your field of expertise and potentially bring new changes and advancements. Your journey has just begun – you have taken the first steps to deciphering the world of RL, and you now have the tools to enhance your Python programming skills for RL, all of which you can independently apply.
