J. Holland introduced the concept of learning classifier systems (LCS) more than 30 years ago as an extension to evolutionary computing [11:10].
Learning classifier systems are a kind of rule-based system with general mechanisms for processing rules in parallel, for adaptive generation of new rules, and for testing the effectiveness of new rules.
However, the concept started to get the attention of computer scientists only a few years ago, with the introduction of several variants of the original concept, including extended learning classifiers (XCS). Learning classifier systems are interesting because they combine rules, reinforcement learning, and genetic algorithms.
Disclaimer
The implementation of the extended learning classifier is presented for informational purposes only. Validating XCS against a known and labeled population of rules is a very significant endeavor. The source code snippet is presented only to illustrate the different components of the XCS algorithm.
Learning classifier systems merge the concepts of reinforcement learning, rule-based policies, and evolutionary computing. This unique class of learning algorithms represents the merger of the following research fields [11:11]:
Let's take a look at the following diagram:
Learning classifier systems are an example of complex adaptive systems. A learning classifier system has the following four components:
Learning classifier systems are particularly appropriate to problems in which the environment is constantly changing; they combine a learning strategy and an evolutionary approach to build and maintain a knowledge base [11:12].
Supervised learning methods alone can be effective on large datasets, but they require either a significant amount of labeled data or a reduced set of features to avoid overfitting. Such constraints may not be practical in the case of ever-changing environments.
The last 20 years have seen the introduction of many variants of learning classifier systems that belong to the following two categories:
The rest of this section is dedicated to the second type of learning classifiers—more specifically, extended learning classifier systems. In the context of LCS, the term classifier refers to the predicate or rule generated by the system. From this point on, the term rule replaces the term classifier to avoid confusion with the more common definition of classification.
Each domain of research has its own terminology and LCS is no exception. The terminology of LCS consists of the following terms:
Similar to reinforcement learning, the XCS algorithm has an exploration phase and an exploitation phase. The exploitation process consists of leveraging the existing rules to influence the target environment in a profitable or rewarding manner:
The following list describes each numbered block:
The purpose of the exploration components is to increase the rule base, managed as a population of the chromosomes that encode these rules.
The following list describes each numbered block of the block diagram:
This section describes the key classes of XCS. The implementation leverages the existing designs of the genetic algorithm and of reinforcement learning. It is easier to understand the inner workings of the XCS algorithm with a concrete application.
Portfolio management and trading have benefited from the application of extended learning classifiers [11:13]. The use case is the management of a portfolio of exchange-traded funds in an ever-changing financial environment. Contrary to stocks, exchange-traded funds are representative of an industry-specific group of stocks or the financial market at large. Therefore, the price of these ETFs is affected by the following macroeconomic changes:
Let's select the value of the 10-year Treasury yield as a proxy for the macroeconomic conditions, for the sake of simplicity.
The portfolio has to be constantly adjusted in response to any specific change in the environment or market condition that affects the total value of the portfolio, and this can be done by referring to the following table:
| XCS component | Portfolio management |
| --- | --- |
| Environment | The portfolio of securities, defined by its composition, its total value, and the yield of the 10-year Treasury bond |
| Action | A change in the composition of the portfolio |
| Reward | The profit and loss on the total value of the portfolio |
| Input data stream | The feed of stock and bond price quotations |
| Sensor | Trading information on the securities in the portfolio, such as price, volume, volatility, and yield, as well as the yield of the 10-year Treasury bond |
| Predicate | A change in the composition of the portfolio |
| Action | The rebalancing of the portfolio by buying and selling securities |
| Rule | The association of trading data with the rebalancing of the portfolio |
The first step is to create an initial set of rules regarding the portfolio. This initial set can be created randomly, like the initial population of a genetic algorithm, or defined by a domain expert.
The XCS initial population
Rules or classifiers are defined and/or refined through evolution. Therefore, there is no absolute requirement for the domain expert to set up a comprehensive knowledge base. In fact, rules can be randomly generated at the start of the training phase. However, seeding the XCS initial population with a few relevant rules improves the odds of having the algorithm converge quickly.
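For instance, a domain expert might seed the population with a couple of trading rules. The following is a minimal sketch that uses the `XcsRule`, `XcsAction`, and `Signal` types introduced later in this section; the threshold values are purely illustrative, not recommendations:

```scala
// A hedged sketch of seeding the initial population. XcsRule, XcsAction, and
// Signal are defined later in this section and require implicit Quantization
// and Encoding instances; the thresholds below are illustrative only.
def seedRules(implicit quantize: Quantization, encoding: Encoding): List[XcsRule] =
  List(
    // IF(10-year TB yield > 2.84%) THEN reduce VWO shares to 240
    new XcsRule(new Signal("10ytb", 2.84, GREATER_THAN), new XcsAction("VWO", 240.0)),
    // IF(10-year TB yield < 2.2%) THEN increase TLT shares to 160
    new XcsRule(new Signal("10ytb", 2.2, LESS_THAN), new XcsAction("TLT", 160.0))
  )
```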
You are invited to initialize the population of rules with as many relevant and financially sound trading rules as possible. Over time, the execution of the XCS algorithm will confirm whether or not the initial rules are indeed appropriate. The following diagram describes the application of the XCS algorithm to the composition of a portfolio of ETFs, such as VWO, TLT, IWC, and so on, with the following components:
Let's take a look at the following diagram:
The agent responds to the change in the allocation of ETFs in the portfolio by matching one of the existing rules.
Let's build the XCS agent from the ground up.
There are three types of data that are manipulated by the XCS agent:

- `Signal`: This is the trading signal.
- `XcsAction`: This is the action on the environment. It subclasses the `Gene` class defined in the genetic algorithm.
- `XcsSensor`: This is the sensor or data from the environment.

The `Gene` class was introduced for the evaluation of the genetic algorithm in the Trading signals section in Chapter 10, Genetic Algorithms. The agent creates, modifies, and deletes actions. It makes sense to define these actions as mutable genes, as follows:
```scala
class XcsAction(sensorId: String, target: Double)
    (implicit quantize: Quantization, encoding: Encoding) //1
  extends Gene(sensorId, target, EQUAL)
```
The quantization and encoding of the `XcsAction` into a `Gene` have to be explicitly declared (line `1`). The `XcsAction` class takes the identifier of the sensor, `sensorId`, and the target value as parameters. For example, the action that increases the number of shares of the VWO ETF in the portfolio to 80 is defined as follows:
```scala
val vwoTo80 = new XcsAction("VWO", 80.0)
```
The only type of action allowed in this scheme is setting a value using the `EQUAL` operator. You can create actions that support other operators, such as `+=`, used to increase an existing value. These operators need to implement the `Operator` trait, as explained in the Trading operators section in Chapter 10, Genetic Algorithms, and as sketched below.
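A minimal sketch of such an operator follows; it assumes the `Operator` trait exposes an integer `id` and an `apply` factory method, mirroring the `SOperator` class from Chapter 10. That signature, and the operator index, are assumptions for illustration:

```scala
// Hypothetical "+=" operator; the Operator trait signature (id and apply) is
// assumed to mirror SOperator from Chapter 10. The interpretation of the
// operator (incrementing the existing value) would be handled by the agent.
class IncrOperator(_id: Int) extends Operator {
  override def id: Int = _id
  override def apply(idx: Int): Operator = new IncrOperator(idx)
}

// Registered after the four signal operators with indices [0, 3]
val PLUS_EQUAL = new IncrOperator(4)
```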
Finally, the `XcsSensor` class encapsulates the `sensorId` identifier for the variable and the `value` of the sensor, as shown here:
```scala
case class XcsSensor(sensorId: String, value: Double)

val new10ytb = XcsSensor("10yTBYield", 2.76)
```
Setters and getters
In this simplistic scenario, the sensors retrieve a new value from an environment variable. The action sets a new value to an environment variable. You can think of a sensor as a get method of an environment class and an action as a set method with variable/sensor ID and value as arguments.
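The following is a minimal sketch of this analogy; the `Environment` class is hypothetical and exists only to illustrate the getter/setter view:

```scala
// Hypothetical environment illustrating the getter/setter analogy; not part
// of the XCS implementation discussed in this section.
class Environment(private var state: Map[String, Double]) {
  // A sensor acts as a "getter" on an environment variable
  def sense(sensorId: String): XcsSensor =
    XcsSensor(sensorId, state.getOrElse(sensorId, Double.NaN))

  // An action acts as a "setter": it assigns a new value to the variable
  def execute(sensorId: String, target: Double): Unit =
    state += sensorId -> target
}
```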
The next step consists of defining a rule of the `XcsRule` type as a pair of two genes: a `signal` and an `action`, as shown in the following code:
```scala
class XcsRule(val signal: Signal, val action: XcsAction)
```
The rule r1, IF(10-year TB yield > 2.84%) THEN reduce VWO shares to 240, is implemented as follows:
```scala
val signal = new Signal("10ytb", 2.84, GREATER_THAN)
val action = new XcsAction("vwo", 240)
val r1 = new XcsRule(signal, action)
```
The agent encodes the rule as a chromosome using 2 bits to represent the operator and 32 bits for values, as shown in the following diagram:
In this implementation, there is no need to encode the type of action as the agent uses only one type of action—set. A complex action requires encoding of its type.
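The following sketch illustrates this 2 + 32 bit layout; the `encode` function and its constants are simplifying assumptions for illustration, not the `Encoding` class used by the actual implementation:

```scala
import java.util.BitSet

val OP_BITS = 2     // 2 bits encode one of the 4 operators: None, >, <, =
val VALUE_BITS = 32 // 32 bits encode the discretized (quantized) value

// Assumed layout: operator bits first, followed by the quantized value bits
def encode(opIndex: Int, quantized: Long): BitSet = {
  val bits = new BitSet(OP_BITS + VALUE_BITS)
  (0 until OP_BITS).foreach(i =>
    if (((opIndex >> i) & 1) == 1) bits.set(i))
  (0 until VALUE_BITS).foreach(i =>
    if (((quantized >> i) & 1) == 1) bits.set(OP_BITS + i))
  bits
}
```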
Matching a rule to a new sensor consists of matching the sensor to the signal. The algorithm matches the new `new10ytb` sensor (line `2`) against the signals `s10ytb1` (line `3`) and `s10ytb2` (line `4`) of the rules in the current population that use the same sensor, the `10ytb` variable, as follows:
```scala
val new10ytb = new XcsSensor("10ytb", 2.76)       //2
val s10ytb1 = Signal("10ytb", 2.5, GREATER_THAN)  //3
val s10ytb2 = Signal("10ytb", 2.2, LESS_THAN)     //4
```
In this case, the agent selects the `r23` rule but not `r34` from the existing population. The agent then adds the `act12` action to the list of possible actions. The agent lists all the rules that match the sensor, namely `r23`, `r11`, and `r46`, as shown in the following code:
```scala
val r23 = new XcsRule(s10ytb1, act12) //5
val r11 = new XcsRule(s10ytb6, act6)
val r46 = new XcsRule(s10ytb7, act12) //6
```
The action with the most references, `act12` (lines `5` and `6`), is executed. The Q-learning algorithm computes the reward from the profit or loss incurred by the portfolio following the execution of the selected `r23` and `r46` rules. The agent uses the reward to adjust the fitness of `r23` and `r46` before the genetic selection in the next reproduction cycle, as sketched below. These two rules will reach and stay in the top tier of the rules in the population until either a new genetic rule, modified through crossover and mutation, or a rule created through covering triggers a more rewarding action on the environment.
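The following is a hedged sketch of such a fitness adjustment, using a Widrow-Hoff style update common in XCS implementations; the learning rate `beta` and the formula itself are assumptions, not code from this implementation:

```scala
// Assumed Widrow-Hoff style update: move the fitness toward the observed
// reward at a rate controlled by beta (an illustrative default)
def updatedFitness(fitness: Double, reward: Double, beta: Double = 0.2): Double =
  (1.0 - beta) * fitness + beta * reward
```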
The purpose of the covering phase is to generate new rules if no rule matches the input or sensor. The `cover` method of the `XcsCover` singleton generates a new `XcsRule` instance given a sensor and an existing set of actions, as shown here:
```scala
import scala.util.Random

val MAX_NUM_ACTIONS = 2048

def cover(sensor: XcsSensor, actions: List[XcsAction])
    (implicit quant: Quantization, encoding: Encoding): List[XcsRule] =
  actions./:(List[XcsRule]())((xs, act) => {
    // Pair the sensor value with a randomly selected operator
    val signal = Signal(sensor.sensorId, sensor.value,
      new SOperator(Random.nextInt(Signal.numOperators)))
    new XcsRule(signal, XcsAction(act, Random)) :: xs
  })
```
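For instance, a sensor that no existing signal matches could be covered as follows; this is a hypothetical usage that assumes implicit `Quantization` and `Encoding` instances are in scope and reuses the `act6` and `act12` actions from the previous example:

```scala
// Hypothetical invocation of the covering mechanism for an unmatched sensor
val unmatched = XcsSensor("10ytb", 3.12)
val covered: List[XcsRule] = XcsCover.cover(unmatched, List(act6, act12))
```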
You might wonder why the `cover` method uses a set of actions as an argument, knowing that covering consists of creating new actions. The method mutates an existing action (the `^` operator) to create a new one instead of using a random generator. This is one of the advantages of defining an action as a gene. One of the constructors of `XcsAction` executes the mutation, as follows:
```scala
def apply(action: XcsAction, r: Random): XcsAction =
  action ^ r.nextInt(XCSACTION_SIZE)
```
The index of the operator type is a random value in the interval [0, 3] because a signal uses four types of operators: `None`, `>`, `<`, and `=`.
The `Xcs` class has the following purposes:

- `gaSolver`: This is the selection and generation of genetically modified rules
- `qLearner`: This is the rewarding and scoring of the rules
- `Xcs`: These are the rules for matching, covering, and generation of actions

The extended learning classifier is a data transformation of the `ETransform` type with an explicit configuration of the `XcsConfig` type (line `8`) (refer to the Monadic data transformation section in Chapter 2, Hello World!):
```scala
class Xcs(config: XcsConfig,
    population: Population[Signal],
    score: Chromosome[Signal] => Unit,
    input: Array[QLInput])                  //7
  extends ETransform[XcsConfig](config) {   //8

  type U = XcsSensor                        //9
  type V = List[XcsAction]                  //10

  val solver = GASolver[Signal](config.gaConfig, score)
  val features = population.chromosomes.toSeq
  val qLearner = QLearning[Chromosome[Signal]](  //11
    config.qlConfig, extractGoals(input), input, features)

  override def |> : PartialFunction[U, Try[V]] = ...
}
```
The XCS algorithm is initialized with a configuration, `config`, an initial set of rules, `population`, a fitness function, `score`, and an `input` to the Q-learning policy to generate the reward matrix for `qLearner` (line `7`). Being an explicit data transformation, the type `U` of the input element and the type `V` of the output element of the `|>` predictor are initialized as `XcsSensor` (line `9`) and `List[XcsAction]` (line `10`), respectively.
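The following is a hedged sketch of what the `|>` predictor might look like; the `rules` collection and the `matches` helper are assumptions about the internal state of `Xcs`, not the actual implementation:

```scala
// Sketch of the prediction flow: match the sensor against the rule base,
// fall back to covering when nothing matches, and return candidate actions.
// 'rules' (the current rule base) and 'matches' (a sensor/signal test) are
// assumed helpers for illustration.
override def |> : PartialFunction[U, Try[V]] = {
  case sensor: XcsSensor => Try {
    val matched = rules.filter(rule => matches(sensor, rule.signal)) // matching
    val candidates =
      if (matched.nonEmpty) matched
      else XcsCover.cover(sensor, rules.map(_.action)) // covering fallback
    candidates.map(_.action) // the list of candidate actions, of type V
  }
}
```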
The goals and number of states are extracted from the input to the policy of the Q-learning algorithm.
In this implementation, the `solver` genetic algorithm is mutable. It is instantiated along with the `Xcs` container class. The Q-learning algorithm follows the same design as any other classifier and is immutable: the model of Q-learning is the best possible policy to reward rules. Any changes in the number of states or the rewarding scheme require a new instance of the learner.
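For instance, a change in the reward matrix translates into the creation of a new learner rather than an update of the existing one, as in this hedged sketch that reuses the constructor call from the previous snippet (the `relearn` method is hypothetical):

```scala
// QLearning is immutable: a new rewarding scheme (newInput) requires building
// a fresh learner instance over the same features
def relearn(newInput: Array[QLInput]): QLearning[Chromosome[Signal]] =
  QLearning[Chromosome[Signal]](
    config.qlConfig, extractGoals(newInput), newInput, features)
```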
Learning classifier systems, and XCS in particular, hold many promises, which are listed as follows:
However, the scientific community has been slow to recognize the merits of these techniques. The wider adoption of learning classifier systems is hindered by the following factors: