J. Holland introduced the concept of learning classifier systems (LCS) more than 30 years ago as an extension to evolutionary computing [11:10]:
Learning classifier systems are a kind of rule-based system with general mechanisms for processing rules in parallel, for adaptive generation of new rules, and for testing the effectiveness of new rules.
However, the concept started to get the attention of computer scientists only a few years ago, with the introduction of several variants of the original concept, including extended learning classifier systems (XCS). Learning classifier systems are interesting because they combine rules, reinforcement learning, and genetic algorithms.
Disclaimer
The implementation of the extended learning classifier is presented for informational purposes only. Validating XCS against a known and labeled population of rules is a very significant endeavor. The source code snippet is presented only to illustrate the different components of the XCS algorithm.
Learning classifier systems merge the concepts of reinforcement learning, rule-based policies, and evolutionary computing. This unique class of learning algorithms represents the merger of the following research fields [11:11]:
Learning classifier systems are an example of complex adaptive systems. A learning classifier system has the following four components:
Learning classifier systems are particularly appropriate to problems in which the environment is constantly changing, and are the combination of learning strategy and an evolutionary approach to build and maintain a knowledge base [11:12].
Supervised learning methods alone can be effective on large datasets, but they require either a significant amount of labeled data or a reduced set of features to avoid overfitting. Such constraints may not be practical in the case of ever-changing environments.
The last 20 years have seen the introduction of many variants of learning classifier systems that belong to the following two categories:
The rest of this section is dedicated to the second type of learning classifiers—more specifically, extended learning classifier systems. In the context of LCS, the term classifier refers to the predicate or rule generated by the system. From this point on, the term "rule" replaces the term classifier to avoid confusion with the more common definition of classification.
Each domain of research has its own terminology and LCS is no exception. The terminology of LCS consists of the following terms:
Similar to reinforcement learning, the XCS algorithm has an exploration phase and an exploitation phase. The exploitation process consists of leveraging the existing rules to influence the target environment in a profitable or rewarding manner.
The following list describes each numbered block:
The purpose of the exploration components is to grow the rule base, represented as a population of the chromosomes that encode these rules.
The following list describes each numbered block of the block diagram:
This section describes the key classes of the XCS. The implementation leverages the existing designs of the genetic algorithm and reinforcement learning. It is easier to understand the inner workings of the XCS algorithm with a concrete application.
Portfolio management and trading have benefited from the application of extended learning classifiers [11:13]. The use case is the management of a portfolio of exchange-traded funds (ETFs) in an ever-changing financial environment. Contrary to stocks, exchange-traded funds are representative of an industry-specific group of stocks or the financial market at large. Therefore, the price of these ETFs is affected by the following macroeconomic changes:
Let's select the value of the 10-year Treasury yield as a proxy for the macroeconomic conditions, for the sake of simplicity.
The portfolio has to be constantly adjusted in response to any change in the environment or market conditions that affects the total value of the portfolio. The adjustment can be made by referring to the following table:
The first step is to create an initial set of rules regarding the portfolio. This initial set can be created randomly, much like the initial population of a genetic algorithm, or be defined by a domain expert.
The XCS initial population
Rules or classifiers are defined and/or refined through evolution. Therefore, there is no absolute requirement for the domain expert to set up a comprehensive knowledge base. In fact, rules can be randomly generated at the start of the training phase. However, seeding the XCS initial population with a few relevant rules improves the odds of having the algorithm converge quickly.
The reader is invited to initialize the population of rules with as many relevant and financially sound trading rules as possible. Over time, the execution of the XCS algorithm will confirm whether or not the initial rules are indeed appropriate. The following diagram describes the application of the XCS algorithm to the composition of a portfolio of ETFs, such as VWO, TLT, IWC, and so on, with the following components:
The agent responds to the change in the allocation of ETFs in the portfolio by matching one of the existing rules.
Let's build the XCS agent from the ground up.
There are three types of data that are manipulated by the XCS agent:
The XcsAction class was introduced for the evaluation of the genetic algorithm in the Trading signals section in Chapter 10, Genetic Algorithms. The agent creates, modifies, and deletes actions. It makes sense to define these actions as mutable genes, as follows:
class XcsAction(val sensorId: String, val target: Double)
    (implicit val discr: Discretization)
  extends Gene(sensorId, target, EQUAL)
The XcsAction class has the identifier of the sensor, sensorId, and the target value as parameters. For example, the action to increase the number of shares of the ETF VWO in the portfolio to 80 is defined as follows:
val vwoTo80 = new XcsAction("VWO", 80.0)
The only type of action allowed in this scheme is setting a value using the EQUAL operator. You can create actions that support other operators, such as +=, used to increase an existing value. These operators need to implement the Operator trait, explained in the Trading operators section in Chapter 10, Genetic Algorithms.
A discretization instance has to be implicitly defined in order to encode the target value.
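As a rough, hypothetical sketch of the role this implicit plays (the actual Discretization class in the library may differ), a discretization can be viewed as a pair of conversion functions between the continuous target value and its discrete, bit-encodable representation:

```scala
// Illustrative sketch only: a Discretization pairs a quantizer with its
// inverse, so a continuous target value (for example, a number of shares)
// can be encoded into the bits of a gene and decoded back.
case class Discretization(toInt: Double => Int, toDouble: Int => Double)

// Quantize target values with a step of 0.25
implicit val discr: Discretization =
  Discretization(x => (x / 0.25).round.toInt, n => n * 0.25)

val encoded = discr.toInt(80.0)       // 320
val decoded = discr.toDouble(encoded) // back to 80.0
println(s"encoded=$encoded decoded=$decoded")
```

The smaller the quantization step, the more bits are needed to represent the range of target values, which in turn affects the cost of the genetic operators applied to the chromosomes.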
Finally, the XcsSensor class encapsulates the sensorId identifier for the variable and the value of the sensor, as shown here:
case class XcsSensor(val sensorId: String, val value: Double)

val new10ytb = new XcsSensor("10yTBYield", 2.76)
Setters and getters
In this simplistic scenario, the sensors retrieve a new value from an environment variable. The action sets a new value to an environment variable. You can think of a sensor as a get method of an environment class and an action as a set method with variable/sensor ID and value as arguments.
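This analogy can be sketched with a minimal, hypothetical Env class (not part of the library), whose variables are read by sensors and written by actions:

```scala
// Hypothetical environment for illustration: sensors read environment
// variables, actions write them.
class Env(private var vars: Map[String, Double]) {
  // A sensor is effectively a getter on an environment variable
  def sense(sensorId: String): Double = vars(sensorId)
  // An action is effectively a setter with (id, value) as arguments
  def act(sensorId: String, value: Double): Unit =
    vars += sensorId -> value
}

val env = new Env(Map("10yTBYield" -> 2.76, "VWO" -> 40.0))
env.act("VWO", 80.0)        // action: set the VWO holding to 80 shares
println(env.sense("VWO"))   // sensor: reads back 80.0
```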
The next step consists of defining a rule as a pair of two genes: a signal and an action, as shown in the following code:
class XcsRule(val signal: Signal, val action: XcsAction)
The rule r1: IF(10-year TB yield > 2.84%) THEN reduce VWO shares to 240 is implemented as follows:
val signal = new Signal("10ytb", 2.84, GREATER_THAN)
val action = new XcsAction("vwo", 240)
val r1 = new XcsRule(signal, action)
The agent encodes the rule as a chromosome using 2 bits to represent the operator and 32 bits for values, as shown in the following diagram:
In this implementation, there is no need to encode the type of action as the agent uses only one type of action—set. A complex action requires encoding of its type.
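The bit packing can be illustrated with a simplified, hypothetical encoder that reserves 2 bits for the operator identifier and 32 bits for the discretized value (the library's actual Chromosome encoder is more elaborate):

```scala
// Illustrative bit layout only, not the library's actual encoder:
// pack a 2-bit operator identifier and a 32-bit discretized value
// into a single Long.
def encode(opId: Int, value: Int): Long = {
  require(opId >= 0 && opId < 4, "operator id must fit in 2 bits")
  (opId.toLong << 32) | (value.toLong & 0xFFFFFFFFL)
}

def decode(bits: Long): (Int, Int) =
  (((bits >>> 32) & 0x3L).toInt, (bits & 0xFFFFFFFFL).toInt)

// For example, operator id 2 (say, GREATER_THAN) with value 240
val bits = encode(2, 240)
println(decode(bits))   // (2,240)
```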
Matching a rule to a new sensor consists of matching the sensor to the signal. The algorithm matches the new new10ytb sensor against the signals s10ytb1 and s10ytb2 in the current population of rules that use the same sensor or variable, 10ytb, as follows:
val new10ytb = new XcsSensor("10ytb", 2.76)

val s10ytb1 = Signal("10ytb", 2.5, GREATER_THAN)
val s10ytb2 = Signal("10ytb", 2.2, LESS_THAN)
val r23 = new XcsRule(s10ytb1, act12)
val r34 = new XcsRule(s10ytb2, act17)
…
In this case, the agent selects the rule r23 but not r34 in the existing population. The agent then adds the act12 action to the list of possible actions. The agent lists all the rules that match the sensor: r23, r11, and r46, as shown in the following code:
val r23 = new XcsRule(s10yTB1, act12)
val r11 = new XcsRule(s10yTB6, act6)
val r46 = new XcsRule(s10yTB7, act12)
The action with the most references, act12, is executed. The Q-learning algorithm computes the reward from the profit or loss incurred by the portfolio following the execution of the selected rules r23 and r46. The agent uses the reward to adjust the fitness of r23 and r46 before the genetic selection in the next reproduction cycle. These two rules will reach and stay in the top tier of the rules in the population until either a new rule modified through crossover and mutation, or a rule created through covering, triggers a more rewarding action on the environment.
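The selection of the action with the most references can be sketched as follows, using simplified stand-ins for the XcsRule and XcsAction types:

```scala
// Simplified stand-ins for XcsAction and XcsRule, for illustration only
case class Action(id: String)
case class Rule(signalId: String, action: Action)

val act12 = Action("act12")
val act6 = Action("act6")
val matched = List(
  Rule("s10yTB1", act12),
  Rule("s10yTB6", act6),
  Rule("s10yTB7", act12))

// The agent executes the action referenced by the most matching rules:
// act12 is referenced twice, act6 only once
val mostReferenced = matched.groupBy(_.action).maxBy(_._2.size)._1
println(mostReferenced)   // Action(act12)
```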
The purpose of the covering phase is to generate new rules if no rule matches the input or sensor. The cover method of the XcsCover singleton generates a new XcsRule instance given a sensor and an existing set of actions, as shown here:
def cover(sensor: XcsSensor, actions: List[XcsAction])
    (implicit discr: Discretization): List[XcsRule] = {
  actions.foldLeft(List[XcsRule]()) ((xs, act) => {
    val rIdx = Random.nextInt(Signal.numOperators)
    val signal = new Signal(sensor.id, sensor.value, new SOperator(rIdx))
    new XcsRule(signal, XcsAction(act, Random)) :: xs
  })
}
You might wonder why the cover method uses a set of actions as arguments, knowing that covering consists of creating new actions. The method mutates (with the ^ operator) an existing action to create a new one instead of using a random generator. This is one of the advantages of defining an action as a gene. The mutation is executed by one of the constructors of XcsAction, as follows:
def apply(action: XcsAction, r: Random): XcsAction = (action ^ r.nextInt(XCSACTION_SIZE))
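The effect of this kind of mutation can be sketched on a plain integer encoding of an action. This is a simplification: the XCSACTION_SIZE bound is assumed here to be the bit width of the encoding, and the gene-level ^ operator is replaced by an explicit bit flip:

```scala
import scala.util.Random

// Simplified sketch of a gene-level mutation: flip one randomly chosen
// bit of an integer-encoded action. XCSACTION_SIZE is assumed to be the
// width of the encoding (32 bits).
val XCSACTION_SIZE = 32

def mutate(encodedAction: Int, r: Random): Int =
  encodedAction ^ (1 << r.nextInt(XCSACTION_SIZE))

val r = new Random(42L)
val original = 240
val mutated = mutate(original, r)
// The mutated action differs from the original by exactly one bit
println(Integer.bitCount(original ^ mutated))   // 1
```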
The index of the operator type, rIdx, is a random value in the interval [0, 3] because a signal uses four types of operators: None, >, <, and =.
The Xcs class has the following purposes:

- gaSolver: This is the selection and generation of genetically modified rules
- qLearner: This is the rewarding and scoring of the rules
- Xcs: These are the rules for matching, covering, and generation of the action

class Xcs(config: XcsConfig,
    population: Population[Signal],
    score: Chromosome[Signal] => Unit,
    input: Array[QLInput])
  extends PipeOperator[XcsSensor, List[XcsAction]] {

  val gaSolver = GASolver[Signal](config.gaConfig, score)
  val featuresSet: Set[Chromosome[Signal]] = population.chromosomes.toSet
  val qLearner = QLearning[Chromosome[Signal]](config.qlConfig,
      computeNumStates(input), extractGoals(input), input, featuresSet)
  …
}
The XCS algorithm is initialized with a configuration, config, an initial set of rules, population, a fitness function, score, and an input to the Q-learning policy used to generate the reward matrix for qLearner. The goals and number of states are extracted from the input to the policy of the Q-learning algorithm.
In this implementation, the genetic algorithm, gaSolver, is mutable. It is instantiated along with the Xcs container class. The Q-learning algorithm follows the same design as any other classifier: it is immutable. The model of Q-learning is the best possible policy to reward rules. Any change in the number of states or the rewarding scheme requires a new instance of the learner.
Learning classifier systems, and XCS in particular, hold many promises, which are as follows:
However, the scientific community has been slow to recognize the merits of these techniques. The wider adoption of learning classifier systems is hindered by the following factors: