Components of the Q-learning algorithm

This implementation is highly inspired by the Q-learning implementation in Scala for Machine Learning - Second Edition by Patrick R. Nicolas, Packt Publishing Ltd., September 2017. Thanks to the author and Packt Publishing Ltd. The source code is available at https://github.com/PacktPublishing/Scala-for-Machine-Learning-Second-Edition/tree/master/src/main/scala/org/scalaml/reinforcement.

Interested readers can take a look at the original implementation; the extended version can be downloaded from the Packt repository or the GitHub repo of this book. The key components of the implementation of the Q-learning algorithm are a few classes—QLearning, QLSpace, QLConfig, QLAction, QLPolicy, QLState, QLIndexedState, and QLModel—as described in the following points:

  • QLearning: Implements training and prediction methods. It defines a data transformation of type ETransform using a configuration of type QLConfig.
  • QLConfig: This parameterized class defines the configuration parameters for the Q-learning. To be more specific, it is used to hold an explicit configuration from the user.
  • QLAction: This is a class that defines actions between one source state and multiple destination states.
  • QLPolicy: This is an enumerator that defines the type of parameters used to update the policy during the training of the Q-learning model.
  • QLSpace: This has two components: a sequence of states of type QLState and the identifier, id, of one or more goal states within the sequence.
  • QLState: Contains a sequence of QLAction instances that help in the transition from one state to another. It is also used as a reference for the object or instance for which the state is to be evaluated and predicted.
  • QLIndexedState: This class wraps a state together with its index in the search toward the goal state.
  • QLModel: This is used to generate a model through the training process. Eventually, it contains the best policy and the accuracy of a model.
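To make the relationships between these components more concrete, the following is a minimal sketch of how they might fit together as Scala case classes. The field names and types here are illustrative assumptions, not the exact signatures from the original code base (which readers can find at the repository linked above); the QLearning class itself, with its training and prediction methods, is omitted for brevity:

```scala
// Hypothetical, simplified sketch of the Q-learning components.
// Fields are illustrative assumptions, not the original signatures.

// An action from one source state to one destination state
case class QLAction(from: Int, to: Int)

// A state: its id, the actions leaving it, and the instance
// (the object for which the state is evaluated and predicted)
case class QLState[T](id: Int, actions: Seq[QLAction], instance: T)

// A state paired with its index in the search toward the goal state
case class QLIndexedState[T](state: QLState[T], iter: Int)

// The search space: a sequence of states plus the ids of the goal states
case class QLSpace[T](states: Seq[QLState[T]], goalIds: Set[Int]) {
  def isGoal(s: QLState[T]): Boolean = goalIds.contains(s.id)
}

// User-supplied configuration: learning rate, discount factor,
// number of training episodes, and minimum acceptable coverage
case class QLConfig(alpha: Double, gamma: Double,
                    episodes: Int, minCoverage: Double)

// The trained model: the best policy found and its accuracy
case class QLModel(bestPolicy: Array[Array[Double]], accuracy: Double)
```

In this sketch, QLSpace ties the pieces together: training walks the states of the space, restricted to each state's own QLAction instances, until a goal state is reached.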

Note that, apart from the preceding components, an optional constraint function limits the scope of the search for the next most rewarding action from the current state. The following diagram shows the key components of the Q-learning algorithm and their interaction:

Figure 5: Components of the QLearning algorithm and their interaction
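The optional constraint function mentioned above can be thought of as a predicate that prunes candidate actions before the most rewarding one is selected. The following is a hedged sketch under assumed types (the function name, signature, and the neighbors-only rule are illustrative, not taken from the original implementation):

```scala
// Hypothetical sketch of an optional search constraint.
// Given the current state id, the constraint returns the set of
// destination state ids that may be considered.
type Constraint = Int => Set[Int]

// Illustrative rule: from any state, only allow moves to
// adjacent state ids (clipped at zero)
val neighborsOnly: Constraint = (current: Int) =>
  Set(current - 1, current + 1).filter(_ >= 0)

// Filter candidate (destination, reward) pairs through the
// constraint before picking the most rewarding action
def allowedActions(current: Int,
                   candidates: Seq[(Int, Double)],
                   constraint: Constraint): Seq[(Int, Double)] =
  candidates.filter { case (dest, _) => constraint(current).contains(dest) }
```

Without a constraint, every action of the current state is a candidate; with one, the search space shrinks and training converges on a smaller set of policies.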