Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Genetic algorithm components

Genetic algorithms have the following three components:

Genetic encoding (and decoding): This is the conversion of a solution candidate and its components into the binary format (an array of bits or a string of 0 and 1 characters)
Genetic operations: This is the application of a set of operators to extract the best (most genetically fit) candidates (chromosomes)
Genetic fitness functions: This is the evaluation of the fittest candidate using an objective function

Encodings and the fitness function are problem dependent. Genetic operators are not.

Encoding

Let's consider the optimization problem in machine learning that consists of maximizing the log likelihood or minimizing the loss function. The goal is to compute the parameters or weights, w={w_i}, that minimize or maximize a function f(w). In the case of a nonlinear model, variables may depend on other variables, which make the optimization problem particularly challenging.

Value encoding

The genetic algorithm manipulates variables as bits or bit strings. The conversion of a variable into a bit string is known as encoding. In the case where the variable is continuous, the conversion is known as quantization or discretization. Each type of variable has a unique encoding scheme, as follows:

Boolean values are easily encoded with 1 bit: 0 for false and 1 for true.
Continuous variables are quantized or discretized in a fashion similar to the conversion of an analog to a digital signal. Let's consider the function with a maximum max (similarly min for minimum) over a range of values, encoded with n = 16 bits:
An illustration of quantization of a continuous variable y = f(x)

The step size of the discretization is computed as (M1):

The step size of the quantization of the sine y = sin(x) in 16 bits is 1.524e-5.

Discrete or categorical variables are a bit more challenging to encode to bits. At a minimum, all the discrete values have to be accounted for. However, there is no guarantee that the number of variables will coincide with the bits boundary:

Padding for base 2 representation of values

In this case, the next exponent, n+1, defines the minimum number of bits required to represent the set of values: n = log2(m).toInt + 1. A discrete variable with 19 values requires 5 bits. The remaining bits are set to an arbitrary value (0, NaN, and so on) depending on the problem. This procedure is known as padding.

Encoding is as much art as it is science. For each encoding function, you need a decoding function to convert the bits representation back to actual values.

Predicate encoding

A predicate for a variable x is a relation defined as a x operator [target]; for instance, unit cost < [9$], temperature = [82F], or Movie rating is [3 stars].

The simplest encoding scheme for predicates is as follows:

Variables are encoded as a category or type (for example, temperature, barometric pressure, and so on) because there are a finite number of variables in any model
Operators are encoded as discrete types
Values are encoded as either discrete or continuous values

Note

Encoding format for predicates

There are many approaches for encoding a predicate in a bits string. For instance, the format {operator, left-operand, and right-operand} is useful because it allows you to encode a binary tree. The entire rule, IF predicate THEN action, can be encoded with the action being represented as a discrete or categorical value.

Solution encoding

The solution encoding approach describes the solution to a problem as an unordered sequence of predicates. Let's consider the following rule:

IF {Gold price rises to [1316$/ounce]} AND 
   {US$/Yen rate is [104]}).
THEN {S&P 500 index is [UP]}

In this example, the search space is defined by two levels:

Boolean operators (for example, AND) and predicates
Each predicate is defined as a tuple (a variable, operator, target value)

The tree representation for the search space is shown in the following diagram:

A graph representation of encoded rules

The bits string representation is decoded back to its original format for further computation:

Encoding, alteration, and decoding of predicates

The encoding scheme

There are two approaches to encode such a candidate solution or chain of predicates:

Flat coding of a chromosome
Hierarchical coding of a chromosome as a composition of genes

Flat encoding

The flat encoding approach consists of encoding the set of predicates into a single chromosome (bits string), representing a specific solution candidate to the optimization problem. The identity of the predicates is not preserved:

Flat addressing schema for chromosomes

A genetic operator manipulates the bits of the chromosome regardless of whether the bits refer to a particular predicate:

Chromosome encoding with flat addressing

Hierarchical encoding

In this configuration, the characteristic of each predicate is preserved during the encoding process. Each predicate is converted into a gene represented by a bit string. The genes are aggregated to form the chromosome. An extra field is added to the bits string or chromosome for the selection of the gene. This extra field consists of the index or the address of the gene:

Hierarchical addressing schema for chromosomes

A generic operator selects the predicate it needs to first manipulate. Once the target gene is selected, the operator updates the bits string associated with the gene, as follows:

Chromosome encoding with flat addressing

The next step is to define the genetic operators that manipulate or update the bits string representing either a chromosome or individual genes.

Genetic operators

The implementation of the reproduction cycle attempts to replicate the natural reproduction process [10:7]. The reproduction cycle that controls the population of chromosomes consists of three genetic operators:

Selection: This operator ranks chromosomes according to a fitness function or criteria. It eliminates the weakest or less-fit chromosomes and controls the population growth.
Crossover: This operator pairs chromosomes to generate offspring chromosomes. These offspring chromosomes are added to the population along with their parent chromosomes.
Mutation: This operator introduces a minor alteration in the genetic code (bits string representation) to prevent the successive reproduction cycles from electing the same fittest chromosome. In optimization terms, this operator reduces the risk of the genetic algorithm converging quickly toward a local maximum or minimum.

Note

The transposition operator

Some implementations of genetic algorithms use a fourth operator, genetic transposition, in case the fitness function cannot be very well defined and the initial population is very large. Although additional genetic operators could potentially reduce the odds of finding a local maximum or minimum, the inability to describe the fitness criteria or the search space is a sure sign that a genetic algorithm may not be the most suitable tool.

The following diagram gives an overview of the genetic algorithm workflow:

A basic workflow for the execution of genetic algorithms

Note

Initialization

The initialization of the search space (a set of potential solutions to a problem) in any optimization procedure is challenging and genetic algorithms are no exception. In the absence of biases or heuristics, the reproduction initializes the population with randomly generated chromosomes. However, it is worth the effort to extract the characteristics of a population. Any well-founded bias introduced during initialization facilitates the convergence of the reproduction process.

Each of these genetic operators has at least one configurable parameter that has to be estimated and/or tuned. Moreover, you will likely need to experiment with different fitness functions and encoding schemes in order to increase your odds of finding a fittest solution (or chromosome).

Selection

The purpose of the genetic selection phase is to evaluate, rank, and weed out the chromosomes (that is, the solution candidates) that are not a good fit for the problem. The selection procedure relies on a fitness function to score and rank candidate solutions through their chromosomal representation. It is a common practice to constrain the growth of the population of chromosomes by setting a limit to the size of the population.

There are several methodologies to implement the selection process from scaled relative fitness, Holland roulette wheel, and tournament selection to rank-based selection [10:8].

Note

Relative fitness degradation

As the initial population of chromosomes evolves, the chromosomes tend to get more and more similar to each other. This phenomenon is a healthy sign that the population is actually converging. However, for some problems, you may need to scale or magnify the relative fitness to preserve a meaningful difference in the fitness score between the chromosomes [10:9].

The following implementation relies on rank-based selection.

The selection process consists of the following steps:

Apply the fitness function to each chromosome j in the population f_j.
Compute the total fitness score for the entire population ∑f_j.
Normalize the fitness score of each chromosome by the sum of the fitness scores of all the chromosomes f_j = f_i/Σf_j.
Sort the chromosomes by their descending fitness score f_j < f_j-1.
Compute the cumulative fitness score for each chromosome j f_j = f_j + ∑f_k.
Generate the selection probability (for the rank-based formula) as a random value p ε [0,1].
Eliminate the chromosome k that has a low unfitness score f_k < p or high fitness cost f_k > p.
Reduce the size of the population if it exceeds the maximum allowed number of chromosomes.

Note

Natural selection

You should not be surprised by the need to control the size of the population of chromosomes. After all, nature does not allow any species to grow beyond a certain point in order to avoid depleting natural resources. The predator-prey process modeled by the Lotka-Volterra equation [10:10] keeps the population of each species in check.

Crossover

The purpose of the genetic crossover is to expand the current population of chromosomes in order to intensify the competition among the solution candidates. The crossover phase consists of reprogramming chromosomes from one generation to the next. There are many different variations of crossover techniques. The algorithm for the evolution of the population of chromosomes is independent of the crossover technique. Therefore, the case study uses the simpler one-point crossover. The crossover swaps sections of the two-parent chromosomes to produce two offspring chromosomes, as illustrated in the following diagram:

A chromosome's crossover operation

An important element in the crossover phase is selecting and pairing of parent chromosomes. There are different approaches for selecting and pairing the parent chromosomes that are the most suitable for reproduction:

Selecting only the n fittest chromosomes for reproduction
Pairing chromosomes ordered by their fitness (or unfitness) value
Pairing the fittest chromosome with the least-fit chromosome, the second fittest chromosome with the second least-fit chromosome, and so on

It is a common practice to rely on a specific optimization problem to select the most appropriate selection method as it is highly domain dependent.

The crossover phase that uses hierarchical addressing as the encoding scheme consists of the following steps:

Extract pairs of chromosomes from the population.
Generate a random probability p ϵ [0,1].
Compute the index r_i of the gene for which the crossover is applied as r_i = p.num_genes, where num_genes are the number of genes in a chromosome.
Compute the index of the bit in the selected gene for which the crossover is applied as x_i = p.gene_length, where gene_length is the number of bits in the gene.
Generate two offspring chromosomes by interchanging strands between parents.
Add the two offspring chromosomes to the population.

Note

Preserving parent chromosomes

You may wonder why the parents are not removed from the population once the offspring chromosomes are created. This is because there is no guarantee that any of the offspring chromosomes are a better fit.

Mutation

The objective of genetic mutation is to prevent the reproduction cycle from converging toward a local optimum by introducing a pseudo-random alteration to the genetic material. The mutation procedure inserts a small variation in a chromosome to maintain some level of diversity between generations. The methodology consists of flipping one bit in the bits string representation of the chromosome, as illustrated in the following diagram:

A chromosome's mutation operation

The mutation is the simplest of the three phases in the reproduction process. In the case of hierarchical addressing, the steps are as follows:

Select the chromosome to be mutated.
Generate a random probability p ϵ[0,1].
Compute the index m_i of the gene to be mutated using the formula m_i = p.num_genes.
Compute the index of the bit in the gene to be mutated x_i = p.genes_length.
Perform a flip XOR operation on the selected bit.

Note

The tuning issue

The tuning of a genetic algorithm can be a daunting task. A plan including a systematic design experiment for measuring the impact of the encoding, fitness function, crossover, and mutation ratio is necessary to avoid lengthy evaluation and self-doubt.

The fitness score

The fitness function is the centerpiece of the selection process. There are three categories of fitness functions, which are as follows:

The fixed fitness function: In this function, the computation of the fitness value does not vary during the reproduction process
The evolutionary fitness function: In this function, the computation of the fitness value morphs between each selection according to predefined criteria
An approximate fitness function: In this function, the fitness value cannot be computed directly using an analytical formula [10:11]

Our implementation of the genetic algorithm uses a fixed fitness function.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Genetic algorithm components

Create new playlist

Sign In

Sign Up

Genetic algorithm components

Encoding

Value encoding

Predicate encoding

Note

Solution encoding

The encoding scheme

Flat encoding

Hierarchical encoding

Genetic operators

Note

Note

Selection

Note

Note

Crossover

Note

Mutation

Note

The fitness score

Table of Contents for
Genetic algorithm components