
5. Reinforcement Learning with Keras, TensorFlow, and ChainerRL

Abhishek Nandy(1) and Manisha Biswas(2)
(1) Rm HIG L-2/4, Bldg Swaranika Co-Opt HSG, Kolkata, West Bengal, India
(2) North 24 Parganas, West Bengal, India
This chapter covers using Keras with Reinforcement Learning and shows how Keras can be used for Deep Q Learning as well.

What Is Keras?

Keras is an open source, high-level library for building neural networks. It provides the building blocks of a network, such as layers and activation functions, while delegating the numerical computation to a deep learning framework that runs as its backend.
Keras works with several deep learning frameworks. To switch from one framework to another, you modify the keras.json configuration file, which is located in the .keras directory of your home folder (~/.keras/keras.json).
The backend parameter needs to change as follows:
{
"backend" : "tensorflow"
}
You can change the parameter from tensorflow to another framework if you want. To use Theano or CNTK instead, set the backend parameter in the JSON file to theano or cntk.
The structure of a keras.json file looks like this:
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
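You can also override the configured backend for a single session by setting the KERAS_BACKEND environment variable before importing Keras, and then verify which backend is active from Python. A minimal sketch:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # must be set before importing keras

import keras
from keras import backend as K
print(K.backend())  # prints the active backend, e.g., "tensorflow"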
The relationship between Keras and its backend frameworks is shown in Figure 5-1.
Figure 5-1.
Keras and its modification with different frameworks

Using Keras for Reinforcement Learning

This section covers installing Keras and shows an example of Reinforcement Learning. You first need to install the dependencies.
The dependencies are as follows:
  • Python
  • Keras 1.0
  • Pygame
  • Scikit-image
Let’s start installing Keras 1.0. This example shows how to install Keras from the Anaconda environment:
conda install -c jaikumarm keras
It asks for permission to install the new packages. Choose yes to proceed, as shown in Figure 5-2.
Figure 5-2.
The updates to be installed
When the package installation is successful and completed, you’ll see the information shown in Figure 5-3.
Figure 5-3.
The package installation is complete
You can also install Keras a different way. This example shows how to install it using pip3.
First, update the package index as follows:
(universe) abhi@ubuntu:∼$ sudo apt-get update
Then install pip3 as follows:
sudo apt-get -y install python3-pip
Figure 5-4 shows the installation process.
Figure 5-4.
Installing pip3
After installing the dependencies, you need to install Keras (see Figure 5-5):
(universe) abhi@ubuntu:∼$ sudo pip3 install keras
Figure 5-5.
Installing Keras
Now we will check whether Keras is using the TensorFlow backend. From the terminal, with the Anaconda environment activated, switch to Python mode.
If importing Keras produces the following result, everything is working (see Figure 5-6).
(universe) abhi@ubuntu:∼$ python
Python 3.5.3 |Anaconda custom (64-bit)| (default, Mar  6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
Figure 5-6.
Keras with the TensorFlow backend

Using ChainerRL

This section covers ChainerRL and explains how to apply Reinforcement Learning using it. ChainerRL is a deep Reinforcement Learning library built on top of the Chainer framework. See Figure 5-7.
Figure 5-7.
ChainerRL

Installing ChainerRL

We will install ChainerRL first from the terminal window. Figure 5-8 shows the Anaconda environment.
Figure 5-8.
Activating the Anaconda environment
You can now install ChainerRL. To do so, type this command in the terminal:
pip install chainerrl
Figure 5-9 shows the result of the installation.
Figure 5-9.
Installing ChainerRL
Now you can git clone the repo. Use this command to do so:
git clone https://github.com/chainer/chainerrl.git
Figure 5-10 shows the result.
Figure 5-10.
Cloning ChainerRL
Then get inside the chainerrl folder, as shown in Figure 5-11.
Figure 5-11.
Inside the chainerrl folder

Pipeline for Using ChainerRL

Since the library is written in Python, the obvious language of choice is Python. Follow these steps to use ChainerRL:
  1. Import gym, numpy, and the supporting chainerrl libraries.
    import chainer
    import chainer.functions as F
    import chainer.links as L
    import chainerrl
    import gym
    import numpy as np
    To use OpenAI Gym, you model an environment (see Figure 5-12). The environment exposes two spaces:
    • Observation space
    • Action space
    The environment must also provide two methods, reset and step.
    Figure 5-12.
    How ChainerRL uses state transitions
     
  2. Create a simulation environment, such as CartPole-v0, from OpenAI Gym.
    env = gym.make('CartPole-v0')
    print('observation space:', env.observation_space)
    print('action space:', env.action_space)
    obs = env.reset()
    env.render()
    print('initial observation:', obs)
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)
    print('next observation:', obs)
    print('reward:', r)
    print('done:', done)
    print('info:', info)
     
  3. Now define the Q-function that the agent will learn from its interactions with the environment. Here, it is the QFunction class, which subclasses chainer.Chain:
    class QFunction(chainer.Chain):
        def __init__(self, obs_size, n_actions, n_hidden_channels=50):
            super().__init__(
                l0=L.Linear(obs_size, n_hidden_channels),
                l1=L.Linear(n_hidden_channels, n_hidden_channels),
                l2=L.Linear(n_hidden_channels, n_actions))
        def __call__(self, x, test=False):
            """
            Args:
                x (ndarray or chainer.Variable): An observation
                test (bool): a flag indicating whether it is in test mode
            """
            h = F.tanh(self.l0(x))
            h = F.tanh(self.l1(h))
            return chainerrl.action_value.DiscreteActionValue(self.l2(h))
    obs_size = env.observation_space.shape[0]
    n_actions = env.action_space.n
    q_func = QFunction(obs_size, n_actions)
    With the Q-function in place, we apply Q-learning. We start by configuring the agent:
    gamma = 0.95
    # Use epsilon-greedy for exploration
    explorer = chainerrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.3, random_action_func=env.action_space.sample)
    # DQN uses Experience Replay.
    # Specify a replay buffer and its capacity.
    replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6)
    # Since observations from CartPole-v0 are numpy.float64 while
    # Chainer only accepts numpy.float32 by default, specify
    # a converter as a feature extractor function phi.
    phi = lambda x: x.astype(np.float32, copy=False)
    # Set up an optimizer to train the Q-function (it is passed to the agent below)
    optimizer = chainer.optimizers.Adam(eps=1e-2)
    optimizer.setup(q_func)
    # Now create an agent that will interact with the environment.
    agent = chainerrl.agents.DoubleDQN(
        q_func, optimizer, replay_buffer, gamma, explorer,
        replay_start_size=500, update_interval=1,
        target_update_interval=100, phi=phi)
     
  4. Start the Reinforcement Learning process. You first have to open Jupyter Notebook in the universe environment, as shown in Figure 5-13.
    Figure 5-13.
    Getting inside jupyter notebook
    abhi@ubuntu:∼$ source activate universe
    (universe) abhi@ubuntu:∼$ jupyter notebook
    Figure 5-14 shows the code running. A minimal sketch of the training loop that drives this step appears after this list.
    Figure 5-14.
    Running the code
     
  5. Now you test the agent, as shown in Figure 5-15.
    Figure 5-15.
    Testing the agents
     
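Steps 4 and 5 boil down to a training loop that alternates between acting and learning, followed by a short greedy evaluation phase. The following is a minimal sketch of that loop in the style of the ChainerRL quickstart; the number of episodes and the 200-step cap per episode are illustrative choices, not requirements.
n_episodes = 200
for i in range(1, n_episodes + 1):
    obs = env.reset()
    reward = 0
    done = False
    R = 0  # total reward collected in this episode
    t = 0  # time step within the episode
    while not done and t < 200:
        # act_and_train selects an action and updates the agent from the last transition
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
        R += reward
        t += 1
    agent.stop_episode_and_train(obs, reward, done)
    if i % 10 == 0:
        print('episode:', i, 'reward:', R)

# Test the trained agent: act greedily, with no exploration or training updates
for i in range(10):
    obs = env.reset()
    done = False
    R = 0
    while not done:
        action = agent.act(obs)
        obs, r, done, _ = env.step(action)
        R += r
    agent.stop_episode()
    print('test episode:', i, 'reward:', R)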
We completed the entire program in the Jupyter Notebook. Now we will work through one of the GitHub repos to understand Deep Q Learning with TensorFlow. See Figure 5-16.
Figure 5-16.
Cloning the GitHub repo
First you need to install the prerequisites as follows (see Figure 5-17):
pip install -U 'gym[all]' tqdm scipy
Figure 5-17.
Getting inside the folder
Then run the program and train it without using GPU support, as shown in Figure 5-18.
Figure 5-18.
Training the program without GPU support
The command is as follows:
$ python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False
The command uses the main.py Python file and runs the Breakout game simulation in CPU mode only. You can now open the terminal to get inside the Anaconda environment, as shown in Figure 5-19.
Figure 5-19.
Activating the environment
Now switch to Python mode, as shown in Figure 5-20:
(universe) abhi@ubuntu:∼$ python
Python 3.5.3 |Anaconda custom (64-bit)| (default, Mar  6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Figure 5-20.
Switching to Python mode
As you switch to Python mode, you first import the utilities:
import gym
import numpy as np
To store the learned value of each state-action pair in the FrozenLake simulation, you create the Q table as follows:
Q = np.zeros([env.observation_space.n,env.action_space.n])
After that, you declare the learning parameters and create a list to hold the total reward of each episode. The complete program is as follows:
import gym
import numpy as np
env = gym.make('FrozenLake-v0')
#Initialize table with all zeros
Q = np.zeros([env.observation_space.n,env.action_space.n])
# Set learning parameters
lr = .8
y = .95
num_episodes = 2000
#create lists to contain total rewards and steps per episode
#jList = []
rList = []
for i in range(num_episodes):
    #Reset environment and get first new observation
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    #The Q-Table learning algorithm
    while j < 99:
        j+=1
        #Choose an action by greedily (with noise) picking from Q table
        a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
        #Get new state and reward from environment
        s1,r,d,_ = env.step(a)
        #Update Q-Table with new knowledge
        Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])
        rAll += r
        s = s1
        if d == True:
            break
    #jList.append(j)
    rList.append(rAll)
print "Score over time: " +  str(sum(rList)/num_episodes)
print "Final Q-Table Values"
print Q
After going through all the steps, you can finally print the Q table. Enter the code line by line at the Python prompt, or save it as a script and run it.
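Once training finishes, the greedy policy is simply the best action in each state according to the learned Q table. As a quick, illustrative follow-up in the same Python session:
# Greedy policy: index of the best action for each of the 16 FrozenLake states
policy = np.argmax(Q, axis=1)
print("Greedy action per state:")
print(policy.reshape(4, 4))  # FrozenLake-v0 is a 4x4 grid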

Deep Q Learning: Using Keras and TensorFlow

We will now touch on Deep Q Learning with Keras. We will clone an important Reinforcement Learning library known as Keras-rl, which implements several variants of the Deep Q Learning algorithm. See Figure 5-21.
Figure 5-21.
Keras-rl representation

Installing Keras-rl

The command for installing Keras-rl is as follows (see Figure 5-22):
pip install keras-rl
Figure 5-22.
Installing Keras-rl
You also need to install h5py if it is not already installed, and then you clone the repo, as shown in Figure 5-23.
Figure 5-23.
Cloning the git repo

Training with Keras-rl

This section shows how to run a program. First, get inside the keras-rl folder and then its examples folder, as shown in Figure 5-24.
abhi@ubuntu:∼$ cd keras-rl
abhi@ubuntu:∼/keras-rl$ dir
assets  examples           LICENSE     pytest.ini  rl         setup.py
docs    ISSUE_TEMPLATE.md  mkdocs.yml  README.md   setup.cfg  tests
abhi@ubuntu:∼/keras-rl$ cd examples
abhi@ubuntu:∼/keras-rl/examples$ dir
cem_cartpole.py   dqn_atari.py     duel_dqn_cartpole.py  sarsa_cartpole.py
ddpg_pendulum.py  dqn_cartpole.py  naf_pendulum.py      visualize_log.py
abhi@ubuntu:∼/keras-rl/examples$
Figure 5-24.
Getting inside the Keras-rl directory
Now you can run one of the examples. Activate the Anaconda environment and run the script:
(universe) abhi@ubuntu:∼/keras-rl/examples$ python dqn_cartpole.py
See Figure 5-25.
Figure 5-25.
Using the TensorFlow backend
The simulation will now begin, as shown in Figure 5-26.
Figure 5-26.
Simulation happens
The simulation runs and trains the model using Deep Q Learning. As training progresses, the cart learns to keep the pole balanced upright; its stability increases with learning.
The entire process creates the following log:
(universe) abhi@ubuntu:∼/keras-rl/examples$ python dqn_cartpole.py
Using TensorFlow backend.
[2017-09-24 09:36:27,476] Making new env: CartPole-v0
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_1 (Flatten)          (None, 4)                 0
_________________________________________________________________
dense_1 (Dense)              (None, 16)                80
_________________________________________________________________
activation_1 (Activation)    (None, 16)                0
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272
_________________________________________________________________
activation_2 (Activation)    (None, 16)                0
_________________________________________________________________
dense_3 (Dense)              (None, 16)                272
_________________________________________________________________
activation_3 (Activation)    (None, 16)                0
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 34
_________________________________________________________________
activation_4 (Activation)    (None, 2)                 0
=================================================================
Total params: 658
Trainable params: 658
Non-trainable params: 0
_________________________________________________________________
None
2017-09-24 09:36:27.932219: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
...
   712/50000: episode: 38, duration: 0.243s, episode steps: 14, steps per second: 58, episode reward: 14.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.500 [0.000, 1.000], mean observation: 0.105 [-0.568, 0.957], loss: 0.291389, mean_absolute_error: 3.054634, mean_q: 5.816398
Each episode is one iteration of the simulation. The dqn_cartpole.py code is discussed next. You need to import the utilities first; they are very useful, as they provide built-in agents for applying Deep Q Learning.
First, declare the environment as follows:
ENV_NAME = 'CartPole-v0'
env = gym.make(ENV_NAME)
Since we want to implement Deep Q Learning, we build a small neural network to approximate the Q-function. For CartPole this is a simple fully connected (dense) network with ReLU activations, and we keep the model sequential.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
You can print the model details too, as follows:
print(model.summary())
Next, you configure and compile the agent, wiring the model together with a replay memory and a policy. The complete script is as follows:
import numpy as np
import gym
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
ENV_NAME = 'CartPole-v0'
# Get the environment and extract the number of actions.
env = gym.make(ENV_NAME)
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n
# Next, we build a very simple model.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())
# Finally, we configure and compile our agent. You can use every built-in Keras optimizer and
# even the metrics!
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
# Okay, now it's time to learn something! We visualize the training here for show, but this
# slows down training quite a lot. You can always safely abort the training prematurely using
# Ctrl + C.
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)
# After training is done, we save the final weights.
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)
# Finally, evaluate our algorithm for 5 episodes.
dqn.test(env, nb_episodes=5, visualize=True)
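If you later want to evaluate a trained agent without retraining it, one option (a sketch, assuming the weights file saved by the run above is present) is to rebuild the same model and agent and then load the saved weights before testing:
# Rebuild env, model, memory, policy, and dqn exactly as above, then:
dqn.load_weights('dqn_{}_weights.h5f'.format(ENV_NAME))
dqn.test(env, nb_episodes=5, visualize=True)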
To install Keras-rl from the cloned source and get all of its capabilities, run the setup.py file from within the keras-rl folder, as follows:
(universe) abhi@ubuntu:∼/keras-rl$ python setup.py install
You will see that all the dependencies are being installed, one by one:
running install
running bdist_egg
running egg_info
creating keras_rl.egg-info
writing requirements to keras_rl.egg-info/requires.txt
writing dependency_links to keras_rl.egg-info/dependency_links.txt
writing top-level names to keras_rl.egg-info/top_level.txt
writing keras_rl.egg-info/PKG-INFO
writing manifest file 'keras_rl.egg-info/SOURCES.txt'
reading manifest file 'keras_rl.egg-info/SOURCES.txt'
writing manifest file 'keras_rl.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/tests
copying tests/__init__.py -> build/lib/tests
creating build/lib/rl
copying rl/util.py -> build/lib/rl
copying rl/callbacks.py -> build/lib/rl
copying rl/keras_future.py -> build/lib/rl
copying rl/memory.py -> build/lib/rl
copying rl/random.py -> build/lib/rl
copying rl/core.py -> build/lib/rl
copying rl/__init__.py -> build/lib/rl
copying rl/policy.py -> build/lib/rl
creating build/lib/tests/rl
copying tests/rl/test_util.py -> build/lib/tests/rl
copying tests/rl/util.py -> build/lib/tests/rl
copying tests/rl/test_memory.py -> build/lib/tests/rl
copying tests/rl/test_core.py -> build/lib/tests/rl
copying tests/rl/__init__.py -> build/lib/tests/rl
creating build/lib/tests/rl/agents
copying tests/rl/agents/test_cem.py -> build/lib/tests/rl/agents
copying tests/rl/agents/__init__.py -> build/lib/tests/rl/agents
copying tests/rl/agents/test_ddpg.py -> build/lib/tests/rl/agents
copying tests/rl/agents/test_dqn.py -> build/lib/tests/rl/agents
creating build/lib/rl/agents
copying rl/agents/sarsa.py -> build/lib/rl/agents
copying rl/agents/ddpg.py -> build/lib/rl/agents
copying rl/agents/dqn.py -> build/lib/rl/agents
copying rl/agents/cem.py -> build/lib/rl/agents
copying rl/agents/__init__.py -> build/lib/rl/agents
Keras-rl is now set up and you can use the built-in functions to their fullest effect.
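A quick way to confirm that the installation worked is to import, from a Python prompt, a few of the modules used earlier in this chapter:
# If these imports succeed, Keras-rl is installed and ready to use
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
print("keras-rl is ready to use")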

Conclusion

This chapter introduced and defined Keras and explained how to use it with Reinforcement Learning. The chapter also explained how to use TensorFlow with Reinforcement Learning and discussed using ChainerRL. Chapter 6 covers Google DeepMind and the future of Reinforcement Learning.