A simple Q-learning implementation

Q-learning is an algorithm that can be used in financial and market trading applications, such as options trading. One reason is that the best policy is generated through training: in RL, the Q-learning model is refined over time, being updated continuously as each new episode of experience arrives. Q-learning is a method for optimizing the cumulative discounted reward, which makes far-future rewards less prioritized than near-term rewards; it is a form of model-free RL. It can also be viewed as a method of asynchronous dynamic programming (DP).
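The discounted-reward optimization described above happens through the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)]. A minimal sketch of that single update step, assuming a NumPy array indexed as Q[state, action] and illustrative values for the learning rate alpha and discount factor gamma:

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference Q-learning step on a table Q[state, action].

    The discount factor gamma < 1 is what makes far-future rewards
    count less than near-term rewards.
    """
    td_target = reward + gamma * np.max(Q[next_state])  # bootstrap from best next action
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Illustrative usage on a 2-state, 2-action table (all values are assumptions):
Q = np.zeros((2, 2))
q_update(Q, state=0, action=1, reward=1.0, next_state=1)
```

After this single update, Q[0, 1] moves a fraction alpha of the way toward the observed reward: 0.1 × (1.0 + 0.9 × 0 − 0) = 0.1.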

It provides agents with the capability of learning to act optimally in Markovian domains by experiencing the consequences of actions, without requiring them to build maps of those domains. In short, Q-learning qualifies as an RL technique because it does not require labeled training data. Moreover, the Q-value does not have to be a continuous, differentiable function.

On the other hand, Markov decision processes (MDPs) provide a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. In an MDP, the probability distribution of the random variables at a future point in time depends only on the information available at the current point in time, not on any historical values. In other words, the process is independent of past states; this is the Markov property.
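Putting the pieces together, a minimal end-to-end sketch of tabular Q-learning is shown below on a hypothetical 5-state chain MDP (the environment, state count, and all hyperparameters are illustrative assumptions, not from the text). The agent learns, purely from experienced transitions and without a model of the environment, that moving right toward the goal state is optimal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain MDP: states 0..4, actions 0 = left, 1 = right.
# Reaching the rightmost state (GOAL) yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Environment dynamics: Markovian, since they depend only on the current state."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative hyperparameters

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration: mostly exploit, occasionally try a random action.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Terminal states bootstrap to zero future value.
        future = 0.0 if done else gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (reward + future - Q[state, action])
        state = next_state

# The greedy policy read off the learned table should point right everywhere.
greedy_policy = [int(np.argmax(Q[s])) for s in range(N_STATES)]
```

Note that the agent never inspects `step`'s internals; it only observes sampled transitions, which is what makes this model-free.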