Learning classifier systems

J. Holland introduced the concept of learning classifier systems (LCS) more than 30 years ago as an extension to evolutionary computing [11:10]:

Learning classifier systems are a kind of rule-based system with general mechanisms for processing rules in parallel, for adaptive generation of new rules, and for testing the effectiveness of new rules.

However, the concept started to get the attention of computer scientists only a few years ago, with the introduction of several variants of the original concept, including extended learning classifier systems (XCS). Learning classifier systems are interesting because they combine rules, reinforcement learning, and genetic algorithms.

Note

Disclaimer

The implementation of the extended learning classifier is presented for informational purposes only. Validating XCS against a known and labeled population of rules is a very significant endeavor. The source code snippet is presented only to illustrate the different components of the XCS algorithm.

Introduction to LCS

Learning classifier systems merge the concepts of reinforcement learning, rule-based policies, and evolutionary computing. This unique class of learning algorithms represents the merger of the following research fields [11:11]:

  • Reinforcement learning
  • Genetic algorithms and evolutionary computing
  • Supervised learning
  • Rule-based knowledge encoding

Diagram of the scientific disciplines required for learning classifier systems

Learning classifier systems are an example of complex adaptive systems. A learning classifier system has the following four components, outlined in the sketch after this list:

  • A population of classifiers or rules that evolves over time. In some cases, a domain expert creates a primitive set of rules (core knowledge). In other cases, the rules are randomly generated prior to the execution of the learning classifier system.
  • A genetic algorithm-based discovery engine that generates new classifiers or rules from the existing population. This component is also known as the rules discovery module. The rules follow the same evolutionary pattern as the organisms introduced in the previous chapter: they are encoded as strings or bit strings that represent a condition (predicate) and an action.
  • A performance or evaluation function that measures the positive or negative impact of the actions from the fittest classifiers or policies.
  • A reinforcement learning component that rewards or punishes the classifiers that contribute to the action, as seen in the previous section. The rules that contribute to an action that improves the performance of the system are rewarded, while those that degrade the performance of the system are punished. This component is also known as the credit assignment module.
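
The following outline sketches how these four components fit together. It is a purely structural sketch; the Classifier, RuleDiscovery, Evaluator, and CreditAssignment names are hypothetical and independent of the XCS implementation presented later in this chapter.

// Structural sketch only; these types are hypothetical
case class Classifier(condition: String, action: String, var fitness: Double)

trait RuleDiscovery {                        // Genetic algorithm-based discovery engine
  def discover(population: Seq[Classifier]): Seq[Classifier]
}
trait Evaluator {                            // Performance or evaluation function
  def evaluate(rule: Classifier): Double
}
trait CreditAssignment {                     // Reinforcement learning component
  def reward(actionSet: Seq[Classifier], payoff: Double): Unit
}

class ClassifierSystem(
    var population: Seq[Classifier],         // Population of classifiers or rules that evolves
    discovery: RuleDiscovery,
    evaluator: Evaluator,
    credit: CreditAssignment)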

Why LCS

Learning classifier systems are particularly well suited to problems in which the environment is constantly changing. They combine a learning strategy with an evolutionary approach to build and maintain a knowledge base [11:12].

Supervised learning methods alone can be effective on large datasets, but they require either a significant amount of labeled data or a reduced set of features to avoid overfitting. Such constraints may not be practical in the case of ever-changing environments.

The last 20 years have seen the introduction of many variants of learning classifier systems that belong to the following two categories:

  • Systems for which accuracy is computed from the correct predictions and that apply the discovery to a subset of those correct classes. They incorporate elements of supervised learning to constrain the population of classifiers. These systems are known to follow the Pittsburgh approach.
  • Systems that explore all the classifiers and apply rule accuracy in the genetic selection of the rules. Each individual classifier is a rule. These systems are known to follow the Michigan approach.

The rest of this section is dedicated to the second type of learning classifier systems, more specifically to extended learning classifier systems. In the context of LCS, the term classifier refers to the predicate or rule generated by the system. From this point on, the term "rule" replaces the term classifier to avoid confusion with the more common definition of classification.

Terminology

Each domain of research has its own terminology and LCS is no exception. The terminology of LCS consists of the following terms:

  • Environment: Environment variables in the context of reinforcement learning.
  • Agent: An agent used in reinforcement learning.
  • Predicate: A clause or fact using the format variable-operator-value, usually implemented as a tuple; for example, Temperature exceeds 87F, implemented as ('Temperature', 87F), or Hard drive failed, implemented as ('Status hard drive', FAILED). It is encoded as a gene so that it can be processed by the genetic algorithm.
  • Compound predicate: Composition of several predicates with Boolean logic operators, usually implemented as a logical tree; for example, (predicate1 AND predicate2) OR predicate3 is implemented as OR(AND(predicate1, predicate2), predicate3). It uses a chromosome representation (see the sketch after this list).
  • Action: A mechanism that alters the environment by modifying the value of one or several of its parameters using a format (type of action, target), for example, change thermostat settings, replace hard drive, and so on.
  • Rule: A formal first-order logic formula using the format IF compound predicate THEN sequence of action, for example, IF gold price < $1140 THEN sell stock of oil and gas producing companies.
  • Classifier: A rule in the context of an LCS.
  • Rule fitness or score: Identical to the definition of the fitness or score in the genetic algorithm. In the context of an LCS, it is the probability of a rule being invoked and fired in response to a change in the environment.
  • Sensors: Environment variables monitored by the agent; for example, temperature and hard drive status.
  • Input data stream: Flow of data generated by sensors. It is usually associated with online training.
  • Rule matching: Mechanism to match a predicate or compound predicate with a sensor.
  • Covering: The process of creating new rules to match a new condition (sensor) in the environment. It generates the rules by either using a random generator or mutating existing rules.
  • Predictor: An algorithm to find the action with the maximum number of occurrences within a set of matching rules.
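
A compound predicate is conveniently represented as a logical tree. The following sketch is illustrative only; the BooleanExpr, Predicate, And, and Or types are hypothetical and are not part of the XCS implementation.

// Hypothetical logical tree for compound predicates, e.g. OR(AND(p1, p2), p3)
sealed trait BooleanExpr { def eval(env: Map[String, Double]): Boolean }

case class Predicate(variable: String, op: (Double, Double) => Boolean, value: Double)
    extends BooleanExpr {
  def eval(env: Map[String, Double]): Boolean = env.get(variable).exists(op(_, value))
}
case class And(left: BooleanExpr, right: BooleanExpr) extends BooleanExpr {
  def eval(env: Map[String, Double]): Boolean = left.eval(env) && right.eval(env)
}
case class Or(left: BooleanExpr, right: BooleanExpr) extends BooleanExpr {
  def eval(env: Map[String, Double]): Boolean = left.eval(env) || right.eval(env)
}

// (predicate1 AND predicate2) OR predicate3
val compound = Or(And(Predicate("Temperature", _ > _, 87.0),
                      Predicate("Pressure", _ < _, 32.0)),
                  Predicate("Humidity", _ > _, 0.9))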

Extended learning classifier systems (XCS)

Similar to reinforcement learning, the XCS algorithm has an exploration phase and an exploitation phase. The exploitation process consists of leveraging the existing rules to influence the target environment in a profitable or rewarding manner.

Exploitation component of the XCS algorithm

The following list describes each numbered block; a minimal sketch of the exploitation loop follows the list:

  • 1: Sensors acquire new data or events from the system.
  • 2: Rules for which the condition matches the input event are searched and extracted from the current population.
  • 3: A new rule is created if no match is found in the existing population. This process is known as covering.
  • 4: The chosen rules are ranked by their fitness values, and the rules with the highest predicted outcome are used to trigger the action.
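
The following sketch summarizes steps 2 to 4 of the exploitation phase. It is purely illustrative: the Sensor, Rule, and Action types below are simplified placeholders, not the XcsSensor, XcsRule, and XcsAction classes defined later in this section.

// Simplified placeholders; not the XcsSensor/XcsRule/XcsAction classes of this section
case class Sensor(id: String, value: Double)
case class Action(id: String, target: Double)
case class Rule(sensorId: String, predicate: Double => Boolean, action: Action, fitness: Double)

def exploit(sensor: Sensor,
    population: List[Rule],
    cover: Sensor => List[Rule]): Option[Action] = {
  // 2: Extract the rules whose condition matches the input event
  val matching = population.filter(r => r.sensorId == sensor.id && r.predicate(sensor.value))
  // 3: Create new rules through covering if no rule matches
  val candidates = if (matching.isEmpty) cover(sensor) else matching
  // 4: Rank by fitness and trigger the action of the fittest rule
  candidates.sortBy(-_.fitness).headOption.map(_.action)
}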

The purpose of the exploration components is to grow the rule base, maintained as a population of chromosomes that encode these rules.

Exploration components of the XCS algorithm

The following list describes each numbered block of the block diagram; a sketch of the credit assignment step follows the list:

  • 5: Once the action is performed, the system rewards the rules for which the action has been executed. The reinforcement learning module assigns credit to these rules.
  • 6: Rewards are used to update the rule fitness, applying evolutionary constraints to the existing population.
  • 7: The genetic algorithm updates the existing population of classifiers/rules using operators such as crossover and mutation.
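
A minimal sketch of steps 5 and 6, the credit assignment, is shown below. The ScoredRule type and the learning rate alpha are illustrative assumptions; in the implementation described later, the rewarding is delegated to the Q-learning module.

// Illustrative credit assignment: reward the rules of the executed action set
// and nudge their fitness toward the reward (Widrow-Hoff style update)
case class ScoredRule(id: Int, var fitness: Double)

def assignCredit(actionSet: Seq[ScoredRule], reward: Double, alpha: Double = 0.2): Unit =
  actionSet.foreach(r => r.fitness += alpha * (reward - r.fitness))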

XCS components

This section describes the key classes of the XCS. The implementation leverages the existing designs of the genetic algorithm and reinforcement learning. It is easier to understand the inner workings of the XCS algorithm with a concrete application.

Application to portfolio management

Portfolio management and trading have benefited from the application of extended learning classifiers [11:13]. The use case is the management of a portfolio of exchange-traded funds (ETFs) in an ever-changing financial environment. Unlike individual stocks, exchange-traded funds represent an industry-specific group of stocks or the financial market at large. Therefore, the price of these ETFs is affected by the following macroeconomic changes:

  • Gross domestic product
  • Inflation
  • Geopolitical events
  • Interest rates

Let's select the value of the 10-year Treasury yield as a proxy for the macroeconomic conditions, for the sake of simplicity.

The portfolio has to be constantly adjusted in response to any specific change in the environment or market condition that affects the total value of the portfolio. The mapping of each XCS component to the portfolio management problem is as follows:

  • Environment: Portfolio of securities defined by its composition, total value, and the yield of the 10-year Treasury bond
  • Action: Change in the composition of the portfolio
  • Reward: Profit and loss of the total value of the portfolio
  • Input data stream: Feed of stock and bond price quotations
  • Sensor: Trading information regarding securities in the portfolio, such as price, volume, volatility, or yield, and the yield on the 10-year Treasury bond
  • Predicate: Change in the composition of the portfolio
  • Action: Rebalancing the portfolio by buying and selling securities
  • Rule: Association of trading data with the rebalancing of the portfolio

The first step is to create an initial set of rules regarding the portfolio. This initial set can be created randomly, much like the initial population of a genetic algorithm, or be defined by a domain expert.

Tip

The XCS initial population

Rules or classifiers are defined and/or refined through evolution. Therefore, there is no absolute requirement for the domain expert to set up a comprehensive knowledge base. In fact, rules can be randomly generated at the start of the training phase. However, seeding the XCS initial population with a few relevant rules improves the odds of having the algorithm converge quickly.
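
As an illustration, the following sketch seeds the initial population with a couple of expert-defined trading rules and a handful of randomly generated ones. It assumes the Signal, XcsAction, XcsRule, and SOperator classes described later in this section, as well as an implicit Discretization in scope; the seedPopulation and randomRule names are hypothetical.

import scala.util.Random

// Sketch only: seed the rule base with expert-defined rules and random rules.
// Signal, XcsAction, XcsRule, and SOperator are introduced later in this section;
// seedPopulation and randomRule are hypothetical helper names.
def seedPopulation(numRandomRules: Int)(implicit discr: Discretization): List[XcsRule] = {

  // Randomly generated rule on the 10-year Treasury bond yield
  def randomRule: XcsRule = {
    val op = new SOperator(Random.nextInt(Signal.numOperators))
    val signal = new Signal("10ytb", 2.0 + Random.nextDouble, op)
    new XcsRule(signal, new XcsAction("VWO", Random.nextInt(500).toDouble))
  }

  // Expert-defined rules, for example: IF (10-year TB yield > 2.84%) THEN VWO = 240
  val expertRules = List(
    new XcsRule(new Signal("10ytb", 2.84, GREATER_THAN), new XcsAction("VWO", 240.0)),
    new XcsRule(new Signal("10ytb", 2.20, LESS_THAN), new XcsAction("TLT", 150.0))
  )
  expertRules ::: List.fill(numRandomRules)(randomRule)
}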

The reader is invited to initialize the population of rules with as many relevant and financially sound trading rules as possible. Over time, the execution of the XCS algorithm will confirm whether or not the initial rules are indeed appropriate. The following diagram describes the application of the XCS algorithm to the composition of a portfolio of ETFs, such as VWO, TLT, IWC, and so on, with the following components:

  • The population of trading rules
  • An algorithm to match rules and compute the prediction
  • An algorithm to extract the actions sets
  • The Q-learning module to assign credit or reward to the selected rules
  • The genetic algorithm to evolve the population of rules

Overview of XCS algorithm to optimize portfolio allocation

The agent responds to the change in the allocation of ETFs in the portfolio by matching one of the existing rules.

Let's build the XCS agent from the ground up.

XCS core data

There are three types of data that are manipulated by the XCS agent:

  • Signal: This is the trading signal
  • XcsAction: This is the action on the environment
  • XcsSensor: This is the sensor or data from the environment

The XcsAction class was introduced for the evaluation of the genetic algorithm in the Trading signals section in Chapter 10, Genetic Algorithms. The agent creates, modifies, and deletes actions. It makes sense to define these actions as mutable genes, as follows:

class XcsAction(val sensorId: String, val target: Double)(implicit val discr: Discretization) extends Gene(sensorId, target, EQUAL)

The XcsAction class has the identifier of the sensor, sensorId, and the target value as parameters. For example, the action to increase the number of shares of the VWO ETF in the portfolio to 80 is defined as follows:

val vwoTo80 = new XcsAction("VWO", 80.0)

The only type of action allowed in this scheme is setting a value using the EQUAL operator. You can create actions that support other operators, such as += used to increase an existing value. These operators need to implement the operator trait, explained in the Trading operators section in Chapter 10, Genetic Algorithms.
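
As a self-contained illustration of such an extension, the following sketch implements += semantics without reusing the Gene or Operator classes of Chapter 10; the IncrementalAction name is hypothetical.

// Hypothetical action with += semantics: increment an environment variable
case class IncrementalAction(sensorId: String, delta: Double) {
  def apply(currentValue: Double): Double = currentValue + delta
}

val addVwoShares = IncrementalAction("VWO", 20.0)   // VWO += 20 shares
val updatedShares = addVwoShares(80.0)              // 100.0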

A discretization instance has to be implicitly defined in order to encode the target value.

Finally, the XcsSensor class encapsulates the sensorId identifier for the variable and value of the sensor, as shown here:

case class XcsSensor(sensorId: String, value: Double)
val new10ytb = new XcsSensor("10yTBYield", 2.76)

Tip

Setters and getters

In this simplistic scenario, the sensors retrieve a new value from an environment variable. The action sets a new value to an environment variable. You can think of a sensor as a get method of an environment class and an action as a set method with variable/sensor ID and value as arguments.
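
A minimal sketch of this analogy, assuming a hypothetical Environment class, follows:

// Hypothetical environment: a sensor reads a variable, an action sets it
class Environment(private var variables: Map[String, Double]) {
  def sensor(id: String): Option[XcsSensor] =
    variables.get(id).map(XcsSensor(id, _))     // getter: read an environment variable
  def act(id: String, value: Double): Unit =
    variables += id -> value                    // setter: modify an environment variable
}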

XCS rules

The next step consists of defining a rule as a pair of two genes: a signal and an action, as shown in the following code:

class XcsRule(val signal: Signal, val action: XcsAction)

The rule r1: IF (10-year Treasury bond yield > 2.84%) THEN reduce VWO shares to 240 is implemented as follows:

val signal = new Signal("10ytb", 2.84, GREATER_THAN)
val action = new XcsAction("vwo", 240)
val r1 = new XcsRule(signal, action)

The agent encodes the rule as a chromosome, using 2 bits to represent the operator and 32 bits for the values.


In this implementation, there is no need to encode the type of action as the agent uses only one type of action—set. A complex action requires encoding of its type.
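
The following sketch illustrates, under simplifying assumptions, how a 2-bit operator index and a 32-bit discretized value could be packed into a bit set. The toBits function is hypothetical and is not part of the Gene encoding of Chapter 10.

// Hypothetical packing of a rule component: 2 bits for the operator index
// (0 to 3) followed by 32 bits for the discretized target value
def toBits(opIndex: Int, discretizedValue: Int): java.util.BitSet = {
  val bits = new java.util.BitSet(34)
  (0 until 2).foreach(i => if (((opIndex >> i) & 0x01) == 0x01) bits.set(i))
  (0 until 32).foreach(i => if (((discretizedValue >> i) & 0x01) == 0x01) bits.set(i + 2))
  bits
}

val encoded = toBits(2, 240)   // operator index 2 and discretized value 240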

Tip

Knowledge encoding

This example uses very simple rules with a single predicate as the condition. Real-world domain knowledge is usually encoded using complex rules with multiple clauses. It is highly recommended that you break down complex rules into multiple basic rules or classifiers.

Matching a rule to a new sensor consists of matching the sensor to the signal. The algorithm matches the new sensor, new10ytb, against the signals of the rules in the current population, s10ytb1 and s10ytb2, that use the same sensor or variable, 10ytb, as follows:

val new10ytb = new XcsSensor("10ytb", 2.76)

val s10ytb1 = Signal("10ytb", 2.5, GREATER_THAN)
val s10ytb2 = Signal("10ytb", 2.2, LESS_THAN)

val r23 = new XcsRule(s10ytb1, act12)
val r34 = new XcsRule(s10ytb2, act17)
…

In this case, the agent selects the rule r23 but not r34 in the existing population. The agent then adds the act12 action to the list of possible actions. The agent lists all the rules that match the sensor: r23, r11, and r46, as shown in the following code:

val r23 = new XcsRule(s10ytb1, act12)
val r11 = new XcsRule(s10ytb6, act6)
val r46 = new XcsRule(s10ytb7, act12)

The action with the most references, act12, is executed. The Q-learning algorithm computes the reward from the profit or loss incurred by the portfolio following the execution of the selected rules, r23 and r46. The agent uses the reward to adjust the fitness of r23 and r46 before the genetic selection in the next reproduction cycle. These two rules will reach and stay in the top tier of the rules in the population until either a new rule modified through crossover and mutation, or a rule created through covering, triggers a more rewarding action on the environment.
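
The selection of the action with the most references can be sketched as follows; selectAction is a hypothetical helper operating on the XcsRule class of this section.

// Predictor sketch: group the matching rules by action and select the action
// with the largest number of references (act12 in the example above)
def selectAction(matchingRules: List[XcsRule]): Option[XcsAction] =
  if (matchingRules.isEmpty) None
  else Some(matchingRules.groupBy(_.action).maxBy { case (_, rules) => rules.size }._1)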

Covering

The purpose of the covering phase is to generate new rules if no rule matches the input or sensor. The cover method of the XcsCover singleton generates new XcsRule instances given a sensor and an existing set of actions, as shown here:

def cover(sensor: XcsSensor, actions: List[XcsAction])
    (implicit discr: Discretization): List[XcsRule] =
  actions.foldLeft(List[XcsRule]()) ((xs, act) => {
    // Select one of the signal operators randomly
    val rIdx = Random.nextInt(Signal.numOperators)
    // Create a signal that matches the sensor value
    val signal = new Signal(sensor.sensorId, sensor.value, new SOperator(rIdx))
    // Pair the signal with a mutated copy of an existing action
    new XcsRule(signal, XcsAction(act, Random)) :: xs
  })

You might wonder why the cover method uses a set of actions as arguments knowing that covering consists of creating new actions. The method mutates (operator ^) an existing action to create a new one instead of using a random generator. This is one of the advantages of defining an action as a gene. The mutation is executed by one of the constructors of XcsAction, as follows:

def apply(action: XcsAction, r: Random): XcsAction = 
    (action ^ r.nextInt(XCSACTION_SIZE))

The index of the operator type, rIdx, is a random value in the interval [0, 3] because a signal uses four types of operators: None, >, <, and =.
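
As an illustration, assuming an implicit Discretization is in scope and act6 and act12 are existing XcsAction instances, covering the new sensor could be invoked as follows:

// Hypothetical invocation of covering for the 10-year Treasury bond sensor
val newRules: List[XcsRule] = XcsCover.cover(new10ytb, List(act6, act12))
newRules.foreach(r => println(s"${r.signal} => ${r.action}"))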

Example of implementation

The implementation of the XCS algorithm relies on the following components:

  • gaSolver: Selection and generation of genetically modified rules
  • qLearner: Rewarding and scoring of the rules
  • Xcs: The container class responsible for matching rules, covering, and generating actions

class Xcs(config: XcsConfig,
    population: Population[Signal],
    score: Chromosome[Signal] => Unit,
    input: Array[QLInput]) extends PipeOperator[XcsSensor, List[XcsAction]] {

  val gaSolver = GASolver[Signal](config.gaConfig, score)
  val featuresSet: Set[Chromosome[Signal]] = population.chromosomes.toSet
  val qLearner = QLearning[Chromosome[Signal]](config.qlConfig,
      computeNumStates(input), extractGoals(input), input, featuresSet)
  …
}

The XCS algorithm is initialized with a configuration, config, an initial set of rules, population, a fitness function, score, and an input, input, to the Q-learning policy used to generate the reward matrix for qLearner. The goals and the number of states are extracted from the input to the policy of the Q-learning algorithm.

In this implementation, the genetic algorithm solver, gaSolver, is mutable. It is instantiated along with the Xcs container class. The Q-learning algorithm follows the same design as any other classifier: it is immutable. The model of Q-learning is the best possible policy to reward rules. Any change in the number of states or the rewarding scheme requires a new instance of the learner.

Benefits and limitations of learning classifier systems

Learning classifier systems, and XCS in particular, hold many promises, which are as follows:

  • They allow non-scientists and domain experts to describe the knowledge using familiar Boolean constructs and inferences such as predicates and rules
  • They provide analysts with an overview of the knowledge base and its coverage by distinguishing between the need for exploration and exploitation of the knowledge base

However, the scientific community has been slow to recognize the merits of these techniques. The wider adoption of learning classifier systems is hindered by the following factors:

  • Sheer complexity of the configuration of the algorithm because of the large number of parameters for exploration and exploitation.
  • Lack of a unified theory to validate the concept of evolutionary policies or rules. After all, these algorithms are the merger of standalone techniques. The accuracy and performance of the execution of LCSes depend on each component as well as the interaction between components.
  • An execution that is not always predictable in terms of scalability and performance.
  • Too many variants of LCS.