J. Holland introduced the concept of learning classifier systems (LCS) more than 30 years ago as an extension to evolutionary computing [11:10].
Learning classifier systems are a kind of rule-based system with general mechanisms for processing rules in parallel, for adaptive generation of new rules, and for testing the effectiveness of new rules.
However, the concept started to get the attention of computer scientists only a few years ago, with the introduction of several variants of the original concept, including extended learning classifiers (XCS). Learning classifier systems are interesting because they combine rules, reinforcement learning, and genetic algorithms.
Disclaimer
The implementation of the extended learning classifier is presented for informational purposes only. Validating XCS against a known and labeled population of rules is a very significant endeavor. The source code snippet is presented only to illustrate the different components of the XCS algorithm.
Learning classifier systems merge the concepts of reinforcement learning, rule-based policies, and evolutionary computing. This unique class of learning algorithms represents the merger of the following research fields [11:11]:
Let's take a look at the following diagram:
Learning classifier systems are an example of complex adaptive systems. A learning classifier system has the following four components:
Learning classifier systems are particularly appropriate to problems in which the environment is constantly changing; they combine a learning strategy and an evolutionary approach to build and maintain a knowledge base [11:12].
Supervised learning methods alone can be effective on large datasets, but they require either a significant amount of labeled data or a reduced set of features to avoid overfitting. Such constraints may not be practical in the case of ever-changing environments.
The last 20 years have seen the introduction of many variants of learning classifier systems that belong to the following two categories:
The rest of this section is dedicated to the second type of learning classifiers—more specifically, extended learning classifier systems. In the context of LCS, the term classifier refers to the predicate or rule generated by the system. From this point on, the term rule replaces the term classifier to avoid confusion with the more common definition of classification.
Each domain of research has its own terminology and LCS is no exception. The terminology of LCS consists of the following terms:
Similar to reinforcement learning, the XCS algorithm has an exploration phase and an exploitation phase. The exploitation process consists of leveraging the existing rules to influence the target environment in a profitable or rewarding manner:
The following list describes each numbered block:
The purpose of the exploration components is to increase the rule base, managed as a population of the chromosomes that encode these rules.
The following list describes each numbered block of the block diagram:
This section describes the key classes of XCS. The implementation leverages the existing designs of the genetic algorithm and of reinforcement learning. It is easier to understand the inner workings of the XCS algorithm with a concrete application.
Portfolio management and trading have benefited from the application of extended learning classifiers [11:13]. The use case is the management of a portfolio of exchange-traded funds in an ever-changing financial environment. Contrary to stocks, exchange-traded funds are representative of an industry-specific group of stocks or the financial market at large. Therefore, the price of these ETFs is affected by the following macroeconomic changes:
Let's select the value of the 10-year Treasury yield as a proxy for the macroeconomic conditions, for the sake of simplicity.
The portfolio has to be constantly adjusted in response to any specific change in the environment or market condition that affects the total value of the portfolio, and this can be done by referring to the following table:
| XCS component | Portfolio management |
| --- | --- |
| Environment | The portfolio of securities, defined by its composition, its total value, and the yield of the 10-year Treasury bond |
| Action | A change in the composition of the portfolio |
| Reward | The profit and loss on the total value of the portfolio |
| Input data stream | The feed of stock and bond price quotations |
| Sensor | Trading information on the securities in the portfolio, such as price, volume, volatility, and yield, as well as the yield of the 10-year Treasury bond |
| Predicate | A change in the composition of the portfolio |
| Action | The rebalancing of the portfolio by buying and selling securities |
| Rule | The association of trading data with the rebalancing of the portfolio |
The first step is to create an initial set of rules regarding the portfolio. This initial set can be created randomly, like the initial population of a genetic algorithm, or defined by a domain expert.
The XCS initial population
Rules or classifiers are defined and/or refined through evolution. Therefore, there is no absolute requirement for the domain expert to set up a comprehensive knowledge base. In fact, rules can be randomly generated at the start of the training phase. However, seeding the XCS initial population with a few relevant rules improves the odds of having the algorithm converge quickly.
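For instance, a domain expert might seed the population with a couple of trading rules. The following is a minimal sketch that uses the `XcsRule`, `XcsAction`, and `Signal` types introduced later in this section; the threshold values are purely illustrative, not recommendations:

```scala
// A hedged sketch of seeding the initial population. XcsRule, XcsAction, and
// Signal are defined later in this section and require implicit Quantization
// and Encoding instances; the thresholds below are illustrative only.
def seedRules(implicit quantize: Quantization, encoding: Encoding): List[XcsRule] =
  List(
    // IF(10-year TB yield > 2.84%) THEN reduce VWO shares to 240
    new XcsRule(new Signal("10ytb", 2.84, GREATER_THAN), new XcsAction("VWO", 240.0)),
    // IF(10-year TB yield < 2.2%) THEN increase TLT shares to 160
    new XcsRule(new Signal("10ytb", 2.2, LESS_THAN), new XcsAction("TLT", 160.0))
  )
```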
You are invited to initialize the population of rules with as many relevant and financially sound trading rules as possible. Over time, the execution of the XCS algorithm will confirm whether or not the initial rules are indeed appropriate. The following diagram describes the application of the XCS algorithm to the composition of a portfolio of ETFs, such as VWO, TLT, IWC, and so on, with the following components:
Let's take a look at the following diagram:
The agent responds to the change in the allocation of ETFs in the portfolio by matching one of the existing rules.
Let's build the XCS agent from the ground up.
There are three types of data that are manipulated by the XCS agent:

- `Signal`: This is the trading signal.
- `XcsAction`: This is the action on the environment. It subclasses the `Gene` class defined in the genetic algorithm.
- `XcsSensor`: This is the sensor or data from the environment.

The `Gene` class was introduced for the evaluation of the genetic algorithm in the Trading signals section in Chapter 10, Genetic Algorithms. The agent creates, modifies, and deletes actions. It makes sense to define these actions as mutable genes, as follows:
```scala
class XcsAction(sensorId: String, target: Double)
    (implicit quantize: Quantization, encoding: Encoding) //1
  extends Gene(sensorId, target, EQUAL)
```
The quantization and encoding of the `XcsAction` into a `Gene` have to be explicitly declared (line `1`). The `XcsAction` class takes the identifier of the sensor, `sensorId`, and the target value as parameters. For example, the action that increases the number of shares of the VWO ETF in the portfolio to 80 is defined as follows:
```scala
val vwoTo80 = new XcsAction("VWO", 80.0)
```
The only type of action allowed in this scheme is setting a value using the `EQUAL` operator. You can create actions that support other operators, such as `+=`, used to increase an existing value. These operators need to implement the `Operator` trait, as explained in the Trading operators section in Chapter 10, Genetic Algorithms, and as sketched below.
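A minimal sketch of such an operator follows; it assumes the `Operator` trait exposes an integer `id` and an `apply` factory method, mirroring the `SOperator` class from Chapter 10. That signature, and the operator index, are assumptions for illustration:

```scala
// Hypothetical "+=" operator; the Operator trait signature (id and apply) is
// assumed to mirror SOperator from Chapter 10. The interpretation of the
// operator (incrementing the existing value) would be handled by the agent.
class IncrOperator(_id: Int) extends Operator {
  override def id: Int = _id
  override def apply(idx: Int): Operator = new IncrOperator(idx)
}

// Registered after the four signal operators with indices [0, 3]
val PLUS_EQUAL = new IncrOperator(4)
```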
Finally, the `XcsSensor` class encapsulates the `sensorId` identifier for the variable and the `value` of the sensor, as shown here:
```scala
case class XcsSensor(sensorId: String, value: Double)

val new10ytb = XcsSensor("10yTBYield", 2.76)
```
Setters and getters
In this simplistic scenario, the sensors retrieve a new value from an environment variable. The action sets a new value to an environment variable. You can think of a sensor as a get method of an environment class and an action as a set method with variable/sensor ID and value as arguments.
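The following is a minimal sketch of this analogy; the `Environment` class is hypothetical and exists only to illustrate the getter/setter view:

```scala
// Hypothetical environment illustrating the getter/setter analogy; not part
// of the XCS implementation discussed in this section.
class Environment(private var state: Map[String, Double]) {
  // A sensor acts as a "getter" on an environment variable
  def sense(sensorId: String): XcsSensor =
    XcsSensor(sensorId, state.getOrElse(sensorId, Double.NaN))

  // An action acts as a "setter": it assigns a new value to the variable
  def execute(sensorId: String, target: Double): Unit =
    state += sensorId -> target
}
```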
The next step consists of defining a rule of the `XcsRule` type as a pair of two genes: a `signal` and an `action`, as shown in the following code:
```scala
class XcsRule(val signal: Signal, val action: XcsAction)
```
The rule r1, IF(10-year TB yield > 2.84%) THEN reduce VWO shares to 240, is implemented as follows:
```scala
val signal = new Signal("10ytb", 2.84, GREATER_THAN)
val action = new XcsAction("vwo", 240)
val r1 = new XcsRule(signal, action)
```
The agent encodes the rule as a chromosome using 2 bits to represent the operator and 32 bits for values, as shown in the following diagram:
In this implementation, there is no need to encode the type of action as the agent uses only one type of action—set. A complex action requires encoding of its type.
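The following sketch illustrates this 2 + 32 bit layout; the `encode` function and its constants are simplifying assumptions for illustration, not the `Encoding` class used by the actual implementation:

```scala
import java.util.BitSet

val OP_BITS = 2     // 2 bits encode one of the 4 operators: None, >, <, =
val VALUE_BITS = 32 // 32 bits encode the discretized (quantized) value

// Assumed layout: operator bits first, followed by the quantized value bits
def encode(opIndex: Int, quantized: Long): BitSet = {
  val bits = new BitSet(OP_BITS + VALUE_BITS)
  (0 until OP_BITS).foreach(i =>
    if (((opIndex >> i) & 1) == 1) bits.set(i))
  (0 until VALUE_BITS).foreach(i =>
    if (((quantized >> i) & 1) == 1) bits.set(OP_BITS + i))
  bits
}
```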
Matching a rule to a new sensor consists of matching the sensor to the signal. The algorithm matches the new `new10ytb` sensor (line `2`) against the signals `s10ytb1` (line `3`) and `s10ytb2` (line `4`) of the rules in the current population that use the same sensor, the `10ytb` variable, as follows:
```scala
val new10ytb = new XcsSensor("10ytb", 2.76)       //2
val s10ytb1 = Signal("10ytb", 2.5, GREATER_THAN)  //3
val s10ytb2 = Signal("10ytb", 2.2, LESS_THAN)     //4
```
In this case, the agent selects the `r23` rule but not `r34` from the existing population. The agent then adds the `act12` action to the list of possible actions. The agent lists all the rules that match the sensor, namely `r23`, `r11`, and `r46`, as shown in the following code:
```scala
val r23 = new XcsRule(s10ytb1, act12) //5
val r11 = new XcsRule(s10ytb6, act6)
val r46 = new XcsRule(s10ytb7, act12) //6
```
The action with the most references, `act12` (lines `5` and `6`), is executed. The Q-learning algorithm computes the reward from the profit or loss incurred by the portfolio following the execution of the selected `r23` and `r46` rules. The agent uses the reward to adjust the fitness of `r23` and `r46` before the genetic selection in the next reproduction cycle, as sketched below. These two rules will reach and stay in the top tier of the rules in the population until either a new genetic rule, modified through crossover and mutation, or a rule created through covering triggers a more rewarding action on the environment.
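The following is a hedged sketch of such a fitness adjustment, using a Widrow-Hoff style update common in XCS implementations; the learning rate `beta` and the formula itself are assumptions, not code from this implementation:

```scala
// Assumed Widrow-Hoff style update: move the fitness toward the observed
// reward at a rate controlled by beta (an illustrative default)
def updatedFitness(fitness: Double, reward: Double, beta: Double = 0.2): Double =
  (1.0 - beta) * fitness + beta * reward
```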
The purpose of the covering phase is to generate new rules if no rule matches the input or sensor. The `cover` method of the `XcsCover` singleton generates a new `XcsRule` instance given a sensor and an existing set of actions, as shown here:
```scala
import scala.util.Random

val MAX_NUM_ACTIONS = 2048

def cover(sensor: XcsSensor, actions: List[XcsAction])
    (implicit quant: Quantization, encoding: Encoding): List[XcsRule] =
  actions./:(List[XcsRule]())((xs, act) => {
    // Pair the sensor value with a randomly selected operator
    val signal = Signal(sensor.sensorId, sensor.value,
      new SOperator(Random.nextInt(Signal.numOperators)))
    new XcsRule(signal, XcsAction(act, Random)) :: xs
  })
```
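For instance, a sensor that no existing signal matches could be covered as follows; this is a hypothetical usage that assumes implicit `Quantization` and `Encoding` instances are in scope and reuses the `act6` and `act12` actions from the previous example:

```scala
// Hypothetical invocation of the covering mechanism for an unmatched sensor
val unmatched = XcsSensor("10ytb", 3.12)
val covered: List[XcsRule] = XcsCover.cover(unmatched, List(act6, act12))
```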
You might wonder why the `cover` method uses a set of actions as an argument, knowing that covering consists of creating new actions. The method mutates an existing action (the `^` operator) to create a new one instead of using a random generator. This is one of the advantages of defining an action as a gene. One of the constructors of `XcsAction` executes the mutation, as follows:
```scala
def apply(action: XcsAction, r: Random): XcsAction =
  action ^ r.nextInt(XCSACTION_SIZE)
```
The index of the operator type is a random value in the interval [0, 3] because a signal uses four types of operators: `None`, `>`, `<`, and `=`.
The `Xcs` class has the following purposes:

- `gaSolver`: This is the selection and generation of genetically modified rules
- `qLearner`: This is the rewarding and scoring of the rules
- `Xcs`: These are the rules for matching, covering, and generation of actions

The extended learning classifier is a data transformation of the `ETransform` type with an explicit configuration of the `XcsConfig` type (line `8`) (refer to the Monadic data transformation section in Chapter 2, Hello World!):
```scala
class Xcs(config: XcsConfig,
    population: Population[Signal],
    score: Chromosome[Signal] => Unit,
    input: Array[QLInput])                  //7
  extends ETransform[XcsConfig](config) {   //8

  type U = XcsSensor                        //9
  type V = List[XcsAction]                  //10

  val solver = GASolver[Signal](config.gaConfig, score)
  val features = population.chromosomes.toSeq
  val qLearner = QLearning[Chromosome[Signal]](  //11
    config.qlConfig, extractGoals(input), input, features)

  override def |> : PartialFunction[U, Try[V]] = ...
}
```
The XCS algorithm is initialized with a configuration, `config`, an initial set of rules, `population`, a fitness function, `score`, and an `input` to the Q-learning policy to generate the reward matrix for `qLearner` (line `7`). Being an explicit data transformation, the type `U` of the input element and the type `V` of the output element of the `|>` predictor are initialized as `XcsSensor` (line `9`) and `List[XcsAction]` (line `10`), respectively.
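The following is a hedged sketch of what the `|>` predictor might look like; the `rules` collection and the `matches` helper are assumptions about the internal state of `Xcs`, not the actual implementation:

```scala
// Sketch of the prediction flow: match the sensor against the rule base,
// fall back to covering when nothing matches, and return candidate actions.
// 'rules' (the current rule base) and 'matches' (a sensor/signal test) are
// assumed helpers for illustration.
override def |> : PartialFunction[U, Try[V]] = {
  case sensor: XcsSensor => Try {
    val matched = rules.filter(rule => matches(sensor, rule.signal)) // matching
    val candidates =
      if (matched.nonEmpty) matched
      else XcsCover.cover(sensor, rules.map(_.action)) // covering fallback
    candidates.map(_.action) // the list of candidate actions, of type V
  }
}
```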
The goals and number of states are extracted from the input to the policy of the Q-learning algorithm.
In this implementation, the `solver` genetic algorithm is mutable. It is instantiated along with the `Xcs` container class. The Q-learning algorithm follows the same design as any other classifier and is immutable: the model of Q-learning is the best possible policy to reward rules. Any changes in the number of states or the rewarding scheme require a new instance of the learner.
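For instance, a change in the reward matrix translates into the creation of a new learner rather than an update of the existing one, as in this hedged sketch that reuses the constructor call from the previous snippet (the `relearn` method is hypothetical):

```scala
// QLearning is immutable: a new rewarding scheme (newInput) requires building
// a fresh learner instance over the same features
def relearn(newInput: Array[QLInput]): QLearning[Chromosome[Signal]] =
  QLearning[Chromosome[Signal]](
    config.qlConfig, extractGoals(newInput), newInput, features)
```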
Learning classifier systems, and XCS in particular, hold many promises, which are listed as follows:
However, the scientific community has been slow to recognize the merits of these techniques. The wider adoption of learning classifier systems is hindered by the following factors: