10. Playing an Atari Game with Deep Recurrent Q-Networks

Overview

In this chapter, you will be introduced to Deep Recurrent Q Networks (DRQNs) and their variants. You will train Deep Q Network (DQN) models with Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). You will acquire hands-on experience of using the OpenAI Gym package to train reinforcement learning agents to play an Atari game. You will also learn how to analyze long sequences of input and output data using attention mechanisms. By the end of this chapter, you will have a good understanding of what DRQNs are and how to implement them with TensorFlow.

Introduction

In the previous chapter, we learned that DQNs achieved higher performance compared to traditional reinforcement learning techniques. Video games are a perfect example of where DQN models excel. Training an agent to play video games can be quite difficult for traditional reinforcement learning agents as there is a huge number of possible combinations of states, actions, and Q-values to be processed and analyzed during the training.

Deep learning algorithms are renowned for handling high-dimensional tensors. Some researchers combined Q-learning techniques with deep learning models to overcome this limitation and came up with DQNs. A DQN model comprises a deep learning model that is used as a function approximation of Q-values. This technique constituted a major breakthrough in the reinforcement learning field as it helped to handle much larger state and action spaces than traditional models.

Since then, further research has been undertaken and different types of DQN models have been designed, such as DRQNs or Deep Attention Recurrent Q Networks (DARQNs). In this chapter, we will see how DQN models can benefit from CNN and RNN models, which have achieved amazing results in computer vision and natural language processing. We will look at how to train such models to play the famous Atari game Breakout in the next section.

Understanding the Breakout Environment

We will be training different deep reinforcement learning agents to play the game Breakout in this chapter. Before diving in, let's learn some more about the game.

Breakout is an arcade game designed and released in 1976 by Atari. Steve Wozniak, co-founder of Apple, was part of the design and development team. The game was extremely popular at that time and multiple versions were developed over the years.

The goal of the game is to break all the bricks located at the top of the screen with a ball without dropping it (since the game was developed in 1976 with a low screen resolution, the ball is rendered as a small block of pixels, so its shape appears as a rectangle in the following screenshot). The player can move a paddle horizontally at the bottom of the screen to hit the ball before it drops and bounce it back toward the bricks. The ball also bounces back after hitting the side walls or the ceiling. The game ends either when the ball drops (in which case the player loses) or when all the bricks have been broken, in which case the player wins and can proceed to the next stage:

Figure 10.1: Screenshot of Breakout

The gym package from OpenAI provides an environment that emulates this game and allows deep reinforcement learning agents to train and play on it. The name of the environment that we will be using is BreakoutDeterministic-v4. Below are some basic code examples for working with this environment.

You will need to load the Breakout environment from the gym package before being able to train an agent to play this game. To do so, we will use the following code snippet:

import gym

env = gym.make('BreakoutDeterministic-v4')

This is a deterministic version of the game: the actions chosen by the agent are executed exactly as intended every time, with a frame-skipping rate of 4. Frame skipping is the number of frames for which an action is repeated before a new action is performed.

The action space comprises four discrete actions, as shown by the following code:

env.action_space

The following is the output of the code:

Discrete(4)

The observation space is a color image (a box of 3 channels) of size 210 by 160:

env.observation_space

The following is the output of the code:

Box(210, 160, 3)

To initialize the game and get the first initial state, we need to call the .reset() method, as shown in the following code:

state = env.reset()

To sample an action (that is, to take a random action from all the possible actions) from the action space, we can use the .sample() method:

action = env.action_space.sample()

Finally, to perform a single action and get its results from the environment, we need to call the .step() method:

new_state, reward, is_done, info = env.step(action)

The following screenshot shows new_state, the environment state after performing an action:

Figure 10.2: Result of the new state after performing an action

The .step() method returns four different objects:

  • The new environment state resulting from the previous action.
  • The reward related to the previous action.
  • A flag indicating whether the game has ended after the previous action (either a win or the game is over).
  • Some additional information from the environment. This information cannot be used to train the agent, as stated in the OpenAI instructions.

Having gone through some basic code implementation of Breakout in OpenAI, let's perform our first exercise where we will have our agent play this game.

Exercise 10.01: Playing Breakout with a Random Agent

In this exercise, we will be implementing some functions for playing the game Breakout that will be useful for the remainder of the chapter. We will also create an agent that takes random actions:

  1. Open a new Jupyter Notebook file and import the gym library:

    import gym

  2. Create a class called RandomAgent that takes a single input parameter named env, the game environment. This class will have a method called get_action() that will return a random action from the environment:

    class RandomAgent():

        def __init__(self, env):

            self.env = env

        def get_action(self, state):

            return self.env.action_space.sample()

  3. Create a function called initialize_env() that will return the initial state of the given input environment, a False value that corresponds to the initial value of a done flag, and 0 as the initial value of the reward:

    def initialize_env(env):

        initial_state = env.reset()

        initial_done_flag = False

        initial_rewards = 0

        return initial_state, initial_done_flag, initial_rewards

  4. Create a function called play_game() that takes an agent, a state, a done flag, and a list of rewards as inputs. This will return the total reward received. This play_game() function will iterate until the done flag equals True. At each iteration, it will perform the following actions: get an action from the agent, perform the action on the environment, accumulate the reward received, and prepare for the next state:

    def play_game(agent, state, done, rewards):

        while not done:

            action = agent.get_action(state)

            next_state, reward, done, _ = env.step(action)

            state = next_state

            rewards += reward

        return rewards

  5. Create a function called train_agent() that takes as inputs an environment, a number of episodes, and an agent. This function will create a deque object from the collections package and iterate through the number of episodes provided. At each iteration, it will perform the following actions: initialize the environment with initialize_env(), play a game with play_game(), and append the received rewards to the deque object. Finally, it will print the average score of the games played:

    def train_agent(env, episodes, agent):

        from collections import deque

        import numpy as np

        scores = deque(maxlen=100)

        for episode in range(episodes):

            state, done, rewards = initialize_env(env)

            rewards = play_game(agent, state, done, rewards)

            scores.append(rewards)

        print(f"Average Score: {np.mean(scores)}")

  6. Instantiate a Breakout environment called env using the gym.make() function:

    env = gym.make('BreakoutDeterministic-v4')

  7. Instantiate a RandomAgent object called agent:

    agent = RandomAgent(env)

  8. Create a variable called episodes that will take the value 10:

    episodes = 10

  9. Call the train_agent function by providing env, episodes, and the agent:

    train_agent(env, episodes, agent)

    After training the agent, you should obtain a score close to the following (your score may be slightly different due to the randomness of the game):

    Average Score: 0.6

The random agent achieves a low score after 10 episodes, that is, 0.6. We will consider that the agent has learned to play this game once it achieves a score above 10. However, since we have used a low number of episodes, we have not yet reached a score above 10. At this stage, however, we have created some functions for playing the game Breakout that we will reuse and update in the coming sections.

Note

To access the source code for this specific section, please refer to https://packt.live/30CfVeH.

You can also run this example online at https://packt.live/3hi12nU.

In the next section, we will look at CNN models and how to build them in TensorFlow.

CNNs in TensorFlow

CNNs are a type of deep learning architecture that achieved amazing results in computer vision tasks such as image classification, object detection, and image segmentation. Self-driving cars are an example of a real-life application of such technology.

The main element of CNNs is the convolutional operation, where a filter is applied to different parts of an image to detect specific patterns and generate a feature map. A feature map can be thought of as an image with the detected patterns highlighted, as shown in the following example:

Figure 10.3: Example of a vertical edge feature map
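
To make the idea of a feature map concrete, here is a minimal sketch (my own example, not taken from the book's code) that applies a hand-crafted vertical-edge filter to a dummy grayscale image with tf.nn.conv2d; the resulting tensor is the feature map, with large values where vertical edges appear:

import numpy as np
import tensorflow as tf

# Dummy 10x10 grayscale image: dark on the left half, bright on the right half
image = np.zeros((1, 10, 10, 1), dtype=np.float32)
image[:, :, 5:, :] = 1.0

# 3x3 vertical-edge (Sobel-like) filter, shaped [height, width, in_channels, out_channels]
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32).reshape(3, 3, 1, 1)

# Convolve the image with the filter to obtain the feature map
feature_map = tf.nn.conv2d(image, kernel, strides=1, padding='VALID')
print(feature_map.shape)  # (1, 8, 8, 1), with large values around the vertical edge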

A CNN is composed of several convolutional layers that apply the convolutional operation with different filters. The final layers of a CNN are usually one or several fully connected layers that are responsible for making the right predictions for a given dataset. For example, the final layer of a CNN trained to predict images of digits will be a fully connected layer of 10 neurons. Each neuron will be responsible for predicting the probability of occurrence of each digit (0 to 9):

Figure 10.4: Example of a CNN architecture for classifying images of digits

Building CNN models is extremely easy with TensorFlow, thanks to the Keras API. To define a convolutional layer, we just need to use the Conv2D() class, as shown in the following code:

from tensorflow.keras.layers import Conv2D

Conv2D(128, kernel_size=(3, 3), activation="relu")

In the preceding example, we have created a convolutional layer with 128 filters (or kernels) of size 3 by 3, and relu as the activation function.

Note

Throughout the course of this chapter, we'll be using the ReLU activation function for CNN models, as it is one of the most performant activation functions.

To define a fully connected layer, we will use the Dense() class:

from tensorflow.keras.layers import Dense

Dense(units=10, activation='softmax')

In Keras, we can use the Sequential() class to create a multi-layer CNN:

import tensorflow as tf

from tensorflow.keras.layers import Conv2D, Dense, Flatten

model = tf.keras.Sequential()

model.add(Conv2D(128, kernel_size=(3, 3), activation="relu",

          input_shape=(100, 100, 3)))

model.add(Conv2D(128, kernel_size=(3, 3), activation="relu"))

model.add(Flatten())

model.add(Dense(units=100, activation="relu"))

model.add(Dense(units=10, activation="softmax"))

Please note that you need to provide the dimensions of the input images for the first convolutional layer only. After defining the layers of your model, you will need to compile it by providing the loss function, the optimizer, and the metrics to be displayed:

model.compile(loss='sparse_categorical_crossentropy',

              optimizer="adam", metrics=['accuracy'])

Finally, the last step is to train the CNN with the training set on a specified number of epochs:

model.fit(features_train, label_train, epochs=5)

Another useful method from TensorFlow is tf.image.rgb_to_grayscale(), which is used to convert a color image to grayscale:

img = tf.image.rgb_to_grayscale(img)

To resize an input image, we will use the tf.image.resize() method:

img = tf.image.resize(img, [50, 50])
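
As a rough illustration of how these two methods can be chained together (this is my own sketch, not the book's preprocess_state() function, and it omits the cropping step used later), a raw Breakout frame of shape (210, 160, 3) can be turned into an (84, 84, 1) grayscale tensor matching the CNN input we will define in the next exercise:

import tensorflow as tf

def preprocess_frame(frame):
    # frame is assumed to be the raw (210, 160, 3) Breakout observation
    img = tf.image.rgb_to_grayscale(frame)  # (210, 160, 3) -> (210, 160, 1)
    img = tf.image.resize(img, [84, 84])    # -> (84, 84, 1)
    return img / 255.0                      # scale pixel values to the [0, 1] range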

Now that we know how to build a CNN model, let's put this into practice in the following exercise.

Exercise 10.02: Designing a CNN Model with TensorFlow

In this exercise, we will be designing a CNN model with TensorFlow. This model will be used for our DQN agent in Activity 10.01, Training a DQN with CNNs to Play Breakout, where we will train this model to play the game Breakout. Perform the following steps to implement the exercise:

  1. Open a new Jupyter Notebook file and import the tensorflow package:

    import tensorflow as tf

  2. Import the Sequential class from tensorflow.keras.models:

    from tensorflow.keras.models import Sequential

  3. Instantiate a sequential model and save it to a variable called model:

    model = Sequential()

  4. Import the Conv2D class from tensorflow.keras.layers:

    from tensorflow.keras.layers import Conv2D

  5. Instantiate a convolutional layer with Conv2D with 32 filters of size 8, a stride of 4 by 4, relu as the activation function, and an input shape of (84, 84, 1). These dimensions correspond to the preprocessed game screen (the raw 210 by 160 color frames will be cropped, converted to grayscale, and resized before being fed to the model). Save it to a variable called conv1:

    conv1 = Conv2D(32, 8, (4,4), activation='relu',

                   padding='valid', input_shape=(84, 84, 1))

  6. Instantiate a second convolutional layer with Conv2D with 64 filters of size 4, a stride of 2 by 2, and relu as the activation function. Save it to a variable called conv2:

    conv2 = Conv2D(64, 4, (2,2), activation='relu',

                   padding='valid')

  7. Instantiate a third convolutional layer with Conv2D with 64 filters of size 3, a stride of 1 by 1, and relu as the activation function. Save it to a variable called conv3:

    conv3 = Conv2D(64, 3, (1,1), activation='relu', padding='valid')

  8. Add the three convolutional layers to the model by means of the add() method:

    model.add(conv1)

    model.add(conv2)

    model.add(conv3)

  9. Import the Flatten class from tensorflow.keras.layers. This class will resize the output of the convolutional layers to a one-dimension vector:

    from tensorflow.keras.layers import Flatten

  10. Add an instantiated Flatten layer to the model by means of the add() method:

    model.add(Flatten())

  11. Import the Dense class from tensorflow.keras.layers:

    from tensorflow.keras.layers import Dense

  12. Instantiate a fully connected layer with 256 units and relu as the activation function:

    fc1 = Dense(256, activation='relu')

  13. Instantiate a fully connected layer with 4 units, which corresponds to the number of possible actions from the game Breakout:

    fc2 = Dense(4)

  14. Add the two fully connected layers to the model by means of the add() method:

    model.add(fc1)

    model.add(fc2)

  15. Import the RMSprop class from tensorflow.keras.optimizers:

    from tensorflow.keras.optimizers import RMSprop

  16. Instantiate an RMSprop optimizer with 0.00025 as the learning rate:

    optimizer=RMSprop(lr=0.00025)

  17. Compile the model using the compile() method, specifying mse as the loss function, the RMSprop instance as the optimizer, and accuracy as the metric to be displayed during training:

    model.compile(loss='mse', optimizer=optimizer,

                  metrics=['accuracy'])

  18. Print a summary of the model using the summary method:

    model.summary()

    Following is the output of the code:

Figure 10.5: Summary of the CNN model

The output shows the architecture of the model we just built, together with the different layers and the number of parameters that will be used during the training of the model.

Note

To access the source code for this specific section, please refer to https://packt.live/2YrqiiZ.

You can also run this example online at https://packt.live/3fiNMxE.

We have designed a CNN model with three convolutional layers. In the next section, we will see how we can use this model in relation to a DQN agent.

Combining a DQN with a CNN

Humans play video games using their sight. They look at the screen, analyze the situation, and decide what the best action to be performed is. In video games, there can be a lot of things happening on the screen, so being able to see all these patterns can give a significant advantage in playing the game. Combining a DQN with a CNN can help a reinforcement learning agent to learn the right action to take given a particular situation.

Instead of just using fully connected layers, a DQN model can be extended with convolutional layers as inputs. The model will then be able to analyze the input image, find the relevant patterns, and feed them to the fully connected layers responsible for predicting the Q-values, as shown in the following:

Figure 10.6: Difference between a normal DQN and a DQN combined with convolutional layers

Adding convolutional layers helps the agent to better understand the environment. The DQN agent that we will build in the coming activity will use the CNN model from Exercise 10.02, Designing a CNN Model with TensorFlow, to output the Q-values for a given state. But rather than using a single model, we will use two models instead. The models will share the exact same architecture.

The first model will be responsible for predicting the Q-values used for playing the game, while the second one (referred to as the target model) will be responsible for providing the target Q-values that the first model learns from. This technique stabilizes training and helps the model converge faster toward the optimal solution.
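
A minimal sketch of this two-model setup is shown below (the build_model() helper is assumed here; it will only be defined in the activity). The online model is trained during experience replay, while the target model's weights are periodically overwritten with the online model's weights:

model = build_model()         # online network: predicts Q-values while playing
target_model = build_model()  # target network: provides the Q-value targets

def update_target_model():
    # Periodically copy the online network's weights into the target network
    target_model.set_weights(model.get_weights())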

Activity 10.01: Training a DQN with CNNs to Play Breakout

In this activity, we will build a DQN with additional convolutional layers and train it to play the game Breakout with CNNs. We will add experience replay to the agent. We will need to preprocess the images in order to create a sequence of four images for our Breakout game.

The following instructions will help you to complete this activity:

  1. Import the relevant packages (gym, tensorflow, numpy).
  2. Reshape the training and test sets.
  3. Create a DQN class with the following methods: the build_model() method, which will instantiate a CNN model; the get_action() method, which will apply the epsilon-greedy algorithm to choose the action to be played; the add_experience() method, which will store in memory the experience acquired by playing the game; the replay() method, which will perform experience replay by sampling experiences from the memory and train the DQN model; and the update_epsilon() method, which will gradually decrease the epsilon value for epsilon-greedy.
  4. Use the initialize_env() function to initialize the environment by returning the initial state, False for the done flag, and 0 as the initial reward.
  5. Create a function called preprocess_state() that will perform the following preprocessing on an image: crop the image to remove unnecessary parts, convert to a grayscale image, and resize the image to a square shape.
  6. Create a function called play_game() that will play a game until it is over, and then store the experience and the accumulated reward.
  7. Create a function called train_agent() that will iterate through a number of episodes where the agent will play a game and perform experience replay.
  8. Instantiate a Breakout environment and train a DQN agent to play this game for 50 episodes. Please note that it might take longer for this step to execute as we are training large models.

    The expected output will be close to the one shown here. You may have slightly different values on account of the randomness of the game and the randomness of the epsilon-greedy algorithm in choosing the action to be played:

    [Episode 0] - Average Score: 3.0

    Average Score: 0.59

    Note

    The solution to this activity can be found on page 752.

In the next section, we will see how we can extend this model with another type of deep learning architecture: the RNN.

RNNs in TensorFlow

In the previous section, we saw how to integrate a CNN into a DQN model to improve the performance of a reinforcement learning agent. We added a few convolutional layers as inputs to the fully connected layers of the DQN model. These convolutional layers helped the model to analyze visual patterns from the game environment and make better decisions.

There is a limitation, however, to using a traditional CNN approach. CNNs can only analyze a single image. While playing video games such as Breakout, analyzing a sequence of images is a much more powerful tool when it comes to understanding the movements of the ball. This is where RNNs come to the fore:

Figure 10.7: Sequencing of RNNs

RNNs are a specific architecture of neural networks that take a sequence of inputs. They are very popular in natural language processing for treating corpora of texts for speech recognition, chatbots, or text translation. Texts can be defined as sequences of words that are correlated with one another. It is hard to determine the topic of a sentence or a paragraph just by looking at a single word. You have to look at a sequence of multiple words before being able to make a guess.

There are different types of RNN models. The most popular ones are Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). Both of these models have a memory that keeps a record of the different inputs the model has already processed (for instance, the first five words of a sentence) and combines them with new inputs (such as the sixth word of a sentence).

In TensorFlow, we can build an LSTM layer of 10 units as follows:

from tensorflow.keras.layers import LSTM

LSTM(10, activation='tanh', recurrent_activation='sigmoid')

Here, tanh is used as the output activation and sigmoid as the recurrent (gate) activation; these are the standard choices for LSTM and GRU layers.

The syntax will be very similar for defining a GRU layer:

from tensorflow.keras.layers import GRU

GRU(10, activation='tanh', recurrent_activation='sigmoid')

In Keras, we can use the Sequential() class to create a multi-layer LSTM:

import tensorflow as tf

from tensorflow.keras.layers import LSTM, Dense

model = tf.keras.Sequential()

model.add(LSTM(128, activation='tanh',

               recurrent_activation='sigmoid'))

model.add(Dense(units=100, activation="relu"))

model.add(Dense(units=10, activation="softmax"))

Before fitting the model, you will need to compile it by providing the loss function, the optimizer, and the metrics to be displayed:

model.compile(loss='sparse_categorical_crossentropy',

              optimizer="adam", metrics=['accuracy'])

We already saw how to define LSTM layers previously, but in order to combine them with a CNN model, we need to use a wrapper in TensorFlow called TimeDistributed(). This class is used to apply the same specified layer to each timestep of an input tensor, such as the following:

TimeDistributed(Dense(10))

In the preceding example, the same fully connected layer is applied to each of the timesteps received. In our case, we want to apply a convolutional layer to each image of a sequence before feeding an LSTM model. To build such a sequence, we will need to stack multiple images together to create a sequence that the RNN model will take as input. Let's now perform an exercise to design a combination of CNN and RNN models.
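
As a quick illustration before we do (this is my own example with made-up shapes, not code from the exercise), the following sketch shows the wrapper's effect: the same Dense layer is applied independently to each of the four timesteps of the input tensor:

import tensorflow as tf
from tensorflow.keras.layers import Dense, TimeDistributed

layer = TimeDistributed(Dense(10))
x = tf.random.normal((1, 4, 32))  # (batch, timesteps, features)
print(layer(x).shape)             # (1, 4, 10): one 10-unit output per timestep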

Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow

In this exercise, we will be designing a combination of CNN and RNN models with TensorFlow. This model will be used by our DRQN agent in Activity 10.02, Training a DRQN to Play Breakout, to play the game Breakout:

  1. Open a new Jupyter Notebook and import the tensorflow package:

    import tensorflow as tf

  2. Import the Sequential class from tensorflow.keras.models:

    from tensorflow.keras.models import Sequential

  3. Instantiate a sequential model and save it to a variable called model:

    model = Sequential()

  4. Import the Conv2D class from tensorflow.keras.layers:

    from tensorflow.keras.layers import Conv2D

  5. Instantiate a convolutional layer with Conv2D with 32 filters of size 8, a stride of 4 by 4, and relu as the activation function. Save it to a variable called conv1:

    conv1 = Conv2D(32, 8, (4,4), activation='relu',

                   padding='valid', input_shape=(84, 84, 1))

  6. Instantiate a second convolutional layer with Conv2D with 64 filters of size 4, a stride of 2 by 2, and relu as the activation function. Save it to a variable called conv2:

    conv2 = Conv2D(64, 4, (2,2), activation='relu',

                   padding='valid')

  7. Instantiate a third convolutional layer with Conv2D with 64 filters of size 3, a stride of 1 by 1, and relu as the activation function. Save it to a variable called conv3:

    conv3 = Conv2D(64, 3, (1,1), activation='relu',

                   padding='valid')

  8. Import the TimeDistributed class from tensorflow.keras.layers:

    from tensorflow.keras.layers import TimeDistributed

  9. Instantiate a time-distributed layer that will take conv1 as the input and (4, 84, 84, 1) as the input shape. Save it to a variable called time_conv1:

    time_conv1 = TimeDistributed(conv1, input_shape=(4, 84, 84, 1))

  10. Instantiate a second time-distributed layer that will take conv2 as the input. Save it to a variable called time_conv2:

    time_conv2 = TimeDistributed(conv2)

  11. Instantiate a third time-distributed layer that will take conv3 as the input. Save it to a variable called time_conv3:

    time_conv3 = TimeDistributed(conv3)

  12. Add the three time-distributed layers to the model using the add() method:

    model.add(time_conv1)

    model.add(time_conv2)

    model.add(time_conv3)

  13. Import the Flatten class from tensorflow.keras.layers:

    from tensorflow.keras.layers import Flatten

  14. Instantiate a time-distributed layer that will take a Flatten() layer as input. Save it to a variable called time_flatten:

    time_flatten = TimeDistributed(Flatten())

  15. Add the time_flatten layer to the model with the add() method:

    model.add(time_flatten)

  16. Import the LSTM class from tensorflow.keras.layers:

    from tensorflow.keras.layers import LSTM

  17. Instantiate an LSTM layer with 512 units. Save it to a variable called lstm:

    lstm = LSTM(512)

  18. Add the LSTM layer to the model with the add() method:

    model.add(lstm)

  19. Import the Dense class from tensorflow.keras.layers:

    from tensorflow.keras.layers import Dense

  20. Instantiate a fully connected layer with 128 units and relu as the activation function:

    fc1 = Dense(128, activation='relu')

  21. Instantiate a fully connected layer with 4 units:

    fc2 = Dense(4)

  22. Add the two fully connected layers to the model with the add() method:

    model.add(fc1)

    model.add(fc2)

  23. Import the RMSprop class from tensorflow.keras.optimizers:

    from tensorflow.keras.optimizers import RMSprop

  24. Instantiate RMSprop with 0.00025 as the learning rate:

    optimizer=RMSprop(lr=0.00025)

  25. Compile the model using the compile() method, specifying mse as the loss function, the RMSprop instance as the optimizer, and accuracy as the metric to be displayed during training:

    model.compile(loss='mse', optimizer=optimizer,

                  metrics=['accuracy'])

  26. Print a summary of the model using the summary method:

    model.summary()

    Following is the output of the code:

Figure 10.8: Summary of the CNN+RNN model

We have successfully combined a CNN model with an RNN model. The preceding output shows the architecture of the model we just built with the different layers and the number of parameters that will be used during training. This model takes as input a sequence of four images and passes it to the RNN, which will analyze their relationship before feeding the results to the fully connected layers, which will be responsible for predicting the Q-values.

Note

To access the source code for this specific section, please refer to https://packt.live/2UDB3h4.

You can also run this example online at https://packt.live/3dVrf9T.

Now that we know how to build an RNN, we can combine this technique with a DQN model. This kind of model is called a DRQN, and this is what we are going to look at in the next section.

Building a DRQN

A DQN can benefit greatly from RNN models, which facilitate the processing of sequences of images. Such an architecture is known as a Deep Recurrent Q Network (DRQN). Combining a GRU or LSTM model with a CNN model will allow the reinforcement learning agent to understand the movement of the ball. To do so, we just need to add an LSTM (or GRU) layer between the convolutional and fully connected layers, as shown in the following figure:

Figure 10.9: DRQN architecture

Figure 10.9: DRQN architecture

To feed the RNN model with a sequence of images, we need to stack several images together. For the Breakout game, after initializing the environment, we will need to take the first image and duplicate it several times in order to have the first initial sequence of images. Having done this, after each action, we can append the latest image to the sequence and remove the oldest one in order to maintain the exact same size of sequence (for instance, a sequence of a maximum of four images).
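
The following is a minimal sketch of this bookkeeping (the function names and shapes are my own assumptions, not the book's combine_images() implementation): the initial state is built by duplicating the first preprocessed frame four times, and each subsequent frame slides the window along by one:

import numpy as np

def initial_sequence(first_frame):
    # first_frame is assumed to be a preprocessed array of shape (84, 84, 1)
    return np.stack([first_frame] * 4, axis=0)  # -> (4, 84, 84, 1)

def append_frame(sequence, new_frame):
    # Drop the oldest frame and append the newest one, keeping the length at 4
    return np.concatenate([sequence[1:], new_frame[np.newaxis]], axis=0)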

Activity 10.02: Training a DRQN to Play Breakout

In this activity, we will build a DRQN model by replacing the DQN model from Activity 10.01, Training a DQN with CNNs to Play Breakout. We will then train the DRQN model to play the Breakout game and analyze the performance of the agent. The following instructions will help you to complete this activity:

  1. Import the relevant packages (gym, tensorflow, numpy).
  2. Reshape the training and test sets.
  3. Create the DRQN class with the following methods: the build_model() method to instantiate a CNN combined with an RNN model; the get_action() method to apply the epsilon-greedy algorithm to choose the action to be played; the add_experience() method to store in memory the experience acquired by playing the game; the replay() method, which will perform experience replay by sampling experiences from the memory and train the DRQN model with a callback to save the model every two episodes; and the update_epsilon() method to gradually decrease the epsilon value for epsilon-greedy.
  4. Use the initialize_env() function to train the agent, which will initialize the environment by returning the initial state, False for the done flag, and 0 as the initial reward.
  5. Create a function called preprocess_state() that will perform the following preprocessing on an image: crop the image to remove unnecessary parts, convert to a grayscale image, and then resize the image to a square shape.
  6. Create a function called combine_images() that will stack a sequence of images.
  7. Create a function called play_game() that will play a game until it is over, and then store the experience and the accumulated reward.
  8. Create a function called train_agent() that will iterate through a number of episodes where the agent will play a game and perform experience replay.
  9. Instantiate a Breakout environment and train a DRQN agent to play this game for 200 episodes.

    Note

    We recommend training for 200 (or 400) episodes in order to train the models properly and achieve good performance, but this may take a few hours depending on the system configuration. Alternatively, you can reduce the number of episodes, which will reduce the training time but will impact the performance of the agent.

The expected output will be close to the one shown here. You may have slightly different values on account of the randomness of the game and the randomness of the epsilon-greedy algorithm in choosing the action to be played:

[Episode 0] - Average Score: 0.0

[Episode 50] - Average Score: 0.43137254901960786

[Episode 100] - Average Score: 0.4

[Episode 150] - Average Score: 0.54

Average Score: 0.53

Note

The solution to this activity can be found on page 756.

In the next section, we will see how we can improve the performance of our model by adding an attention mechanism to DRQN and building a DARQN model.

Introduction to the Attention Mechanism and DARQN

In the previous section, we saw how adding an RNN model to a DQN helped to increase its performance. RNNs are known for handling sequential data such as temporal information. In our case, we used a combination of CNNs and RNNs to help our reinforcement learning agent to better understand sequences of images from the game.

However, RNN models do have some limitations when it comes to analyzing long sequences of input or output data. To overcome this situation, researchers have come up with a technique called attention, which is the principal technique behind a Deep Attention Recurrent Q-Network (DARQN). The DARQN model is the same as the DRQN model, with just an attention mechanism added to it. To better understand this concept, we will go through an example of its application: neural translation. Neural translation is the field of translating text from one language to another, such as translating Shakespeare's plays, which were written in English, into French.

Sequence-to-sequence models are the best fit for such a task. They comprise two components: an encoder and a decoder. Both of them are RNN models, such as an LSTM or GRU model. The encoder is responsible for processing a sequence of words from the input data (in our previous example, this would be a sentence of English words) and generates an encoded version called the context vector. The decoder will take this context vector as input and will predict the relevant output sequence (a sentence of French words, in our example):

Figure 10.10: Sequence-to-sequence model

The size of the context vector is fixed. It is an encoded version of the input sequence with only the relevant information. You can think of it as a summary of the input data. However, the set size of this vector limits the model in terms of retaining sufficient relevant information from long sequences. It will tend to "forget" the earlier elements of a sequence. But in the case of translation, the beginning of a sentence usually contains very important information, such as its subject.

The attention mechanism not only provides the decoder with the context vector, but also the previous states of the encoder. This enables the decoder to find relevant relationships between previous states, the context vector, and the desired output. This will help in our example to understand the relationship between two elements that are far away from one another in the input sequence:

Figure 10.11: Sequence-to-sequence model with an attention mechanism

TensorFlow provides an Attention class. It takes as input a list of tensors of the form [query, value]; in our case, we will pass [output, states]. It is best used with the functional API, where each layer acts as a function that takes inputs and returns outputs. In this case, we can simply extract the output and states from a GRU layer and provide them as inputs to the attention layer:

from tensorflow.keras.layers import GRU, Attention

out, states = GRU(512, return_sequences=True,

                  return_state=True)(input)

att = Attention()([out, states])

To build a DARQN model, we just need to add this attention mechanism to a DRQN model.
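
As a rough sketch of where the attention layer could sit in the network (this is my own functional-API wiring under the shapes used earlier, not the book's DARQN class; the final GRU state is reshaped into a length-1 query sequence so that the shapes passed to Attention() line up):

import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv2D, TimeDistributed, Flatten,
                                     GRU, Attention, Reshape, Dense)

inputs = Input(shape=(4, 84, 84, 1))  # sequence of four preprocessed frames
x = TimeDistributed(Conv2D(32, 8, (4, 4), activation='relu'))(inputs)
x = TimeDistributed(Conv2D(64, 4, (2, 2), activation='relu'))(x)
x = TimeDistributed(Conv2D(64, 3, (1, 1), activation='relu'))(x)
x = TimeDistributed(Flatten())(x)
out, state = GRU(512, return_sequences=True, return_state=True)(x)
query = Reshape((1, 512))(state)      # final hidden state as a one-step query
att = Attention()([query, out])       # attend over all four timesteps
x = Flatten()(att)
q_values = Dense(4)(x)                # one Q-value per Breakout action

model = tf.keras.Model(inputs=inputs, outputs=q_values)

Such a model could then be compiled with the same mse loss and RMSprop optimizer used for the DRQN.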

Let's add this attention mechanism to our previous DRQN agent (in Activity 10.02, Training a DRQN to Play Breakout) and build a DARQN model in the next activity.

Activity 10.03: Training a DARQN to Play Breakout

In this activity, we will build a DARQN model by adding an attention mechanism to our previous DRQN from Activity 10.02, Training a DRQN to Play Breakout. We will then train the model to play the Breakout game and then analyze the performance of the agent. The following instructions will help you to complete this activity:

  1. Import the relevant packages (gym, tensorflow, and numpy).
  2. Reshape the training and test sets.
  3. Create a DARQN class with the following methods: the build_model() method, which will instantiate a CNN combined with an RNN model (similar to Exercise 10.03, Designing a Combination of CNN and RNN Models with TensorFlow); the get_action() method, which will apply the epsilon-greedy algorithm to choose the action to be played; the add_experience() method to store in memory the experience acquired by playing the game; the replay() method, which will perform experience replay by sampling experiences from the memory and train the DARQN model with a callback to save the model every two episodes; and the update_epsilon() method to gradually decrease the epsilon value for epsilon-greedy.
  4. Initialize the environment using the initialize_env() function by returning the initial state, False for the done flag, and 0 as the initial reward.
  5. Use the preprocess_state() function to perform the following preprocessing on an image: crop the image to remove unnecessary parts, convert to a grayscale image, and resize the image to a square shape.
  6. Create a function called combine_images() that will stack a sequence of images.
  7. Use the play_game() function to play a game until it is over, and then store the experience and the accumulated reward.
  8. Iterate through a number of episodes where the agent will play a game and perform experience replay using the train_agent() function.
  9. Instantiate a Breakout environment and train a DARQN agent to play this game for 400 episodes.

    Note

    We recommend training for 400 episodes in order to properly train the model and achieve good performance, but this may take a few hours depending on the system configuration. Alternatively, you can reduce the number of episodes, which will reduce the training time but will impact the performance of the agent.

The output will be close to what you see here. You may have slightly different values on account of the randomness of the game and the randomness of the epsilon-greedy algorithm in choosing the action to be played:

[Episode 0] - Average Score: 1.0

[Episode 50] - Average Score: 2.4901960784313726

[Episode 100] - Average Score: 3.92

[Episode 150] - Average Score: 7.37

[Episode 200] - Average Score: 7.76

[Episode 250] - Average Score: 7.91

[Episode 300] - Average Score: 10.33

[Episode 350] - Average Score: 10.94

Average Score: 10.83

Note

The solution to this activity can be found on page 761.

Summary

In this chapter, we learned how to combine deep learning techniques with a DQN model and train it to play the Atari game Breakout. We first looked at adding convolutional layers to the agent for processing screenshots from the game. This helped the agent to better understand the game environment.

We then took things a step further and added an RNN to the outputs of the CNN model. We created a sequence of images and fed it to an LSTM layer. This sequential model provided the DQN agent with the ability to "visualize" the direction of the ball. This kind of model is called a DRQN.

Finally, we used an attention mechanism and trained a DARQN model to play the Breakout game. This mechanism helped the model to better understand previous relevant states and improved its performance drastically. This field is still evolving as new deep learning techniques and models are designed, outperforming previous generations in the process.

In the next chapter, you will be introduced to policy-based methods and the actor-critic model, which consists of multiple models responsible for computing an action based on a state and calculating the Q-values.
