Below are the hyperparameters we will use throughout the code. All of them are configurable.
```python
# Discount factor in the Bellman equation
gamma = 0.95
# Initial exploration rate for the epsilon-greedy policy
epsilon = 1.0
# Minimum value below which epsilon is not decayed
epsilon_min = 0.01
# Decay multiplier applied to epsilon after each episode
epsilon_decay = 0.99
# Maximum size of the deque container used as replay memory
deque_len = 20000
# Average score over the last 100 episodes at which training stops
target_score = 200
# Maximum number of games the agent plays
episodes = 2000
# Size of the training batch sampled from the deque after each episode
batch_size = 64
# Optimizer used to train the agent
optimizer = 'adam'
# Loss function used to train the agent
loss = 'mse'
```
- gamma - Discount factor in the Bellman equation
- epsilon - Initial exploration rate, i.e. the probability that the agent picks a random action instead of the greedy one
- epsilon_decay - Multiplier by which the value of 'epsilon' is discounted after each episode/game
- epsilon_min - Minimum value of 'epsilon' beyond which you do not want to decay it
- deque_len - Size of the deque container used to store the training examples (state, reward, done, and action); see the replay-memory sketch below
- target_score - Average score over the last 100 episodes after which you stop the learning process (see the training-loop sketch below)
- episodes - Maximum number of games you want the agent to play
- batch_size - Size of the batch of training data (sampled from the deque container) used to train the agent after each episode
- optimizer - Optimizer of choice for training the agent
- loss - Loss function of choice for training the agent
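To make the roles of deque_len and batch_size concrete, here is a minimal replay-memory sketch. The name `memory` and the dummy transition values are assumptions for illustration, not necessarily the tutorial's actual implementation:

```python
import random
from collections import deque

deque_len = 20000
batch_size = 64

# Replay memory: once deque_len transitions are stored,
# the oldest entries are discarded automatically.
memory = deque(maxlen=deque_len)

# Fill the memory with dummy transitions for illustration;
# in the real agent each entry comes from one environment step.
for step in range(100):
    state, reward, done, action = [0.0] * 8, 1.0, False, 0
    memory.append((state, reward, done, action))

# Sample a random training batch once enough transitions are stored.
if len(memory) >= batch_size:
    minibatch = random.sample(memory, batch_size)
    print(len(minibatch))  # 64
```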
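Similarly, the interplay between epsilon, epsilon_min, epsilon_decay, target_score, and episodes can be summarized in a skeleton of the training loop. The fixed `score` value below is a placeholder standing in for the result of actually playing a game:

```python
from collections import deque

epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.99
target_score, episodes = 200, 2000

recent_scores = deque(maxlen=100)  # rolling window of the last 100 scores

for episode in range(episodes):
    # Placeholder score; in the real agent this comes from playing one game.
    score = 210.0
    recent_scores.append(score)

    # Decay epsilon after each episode, but never below epsilon_min.
    epsilon = max(epsilon_min, epsilon * epsilon_decay)

    # Stop once the average over the last 100 episodes reaches target_score.
    if len(recent_scores) == 100 and sum(recent_scores) / 100 >= target_score:
        break
```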
Experiment with different learning rates, optimizers, batch sizes, and epsilon_decay values to see how these factors affect the quality of your model, and if you get better results, share them with the deep learning community.
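Assuming the model is built with Keras (suggested by the 'adam' and 'mse' strings above), one way to experiment with the learning rate is to pass an optimizer instance instead of a string. The network below is a placeholder, not the tutorial's actual architecture:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Placeholder network; the actual architecture may differ.
model = Sequential([
    Dense(64, activation='relu', input_shape=(8,)),
    Dense(4, activation='linear'),
])

# Passing an optimizer instance instead of the string 'adam'
# exposes the learning rate for experimentation.
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
```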