
5. Reinforcement Learning with Keras, TensorFlow, and ChainerRL

Abhishek Nandy(1) and Manisha Biswas(2)
(1) Rm HIG L-2/4, Bldg Swaranika Co-Opt HSG, Kolkata, West Bengal, India
(2) North 24 Parganas, West Bengal, India
This chapter covers using Keras with Reinforcement Learning and shows how Keras can be used for Deep Q Learning as well.

What Is Keras?

Keras is an open source, high-level library for building neural networks. It provides the building blocks of a network, such as layers and activation functions, while delegating the numerical computation to a deep learning framework that runs as its backend.
Keras works with several deep learning frameworks. To switch from one framework to another, you modify the keras.json configuration file, which is located in the .keras directory of your home folder (~/.keras/keras.json).
The backend parameter needs to change as follows:
{
"backend" : "tensorflow"
}
You can change the parameter from tensorflow to another framework if you want. To use Theano or CNTK instead, set the backend parameter in the JSON file to theano or cntk.
The structure of a keras.json file looks like this:
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
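You can also override the configured backend for a single session by setting the KERAS_BACKEND environment variable before importing Keras, and then verify which backend is active from Python. A minimal sketch:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # must be set before importing keras

import keras
from keras import backend as K
print(K.backend())  # prints the active backend, e.g., "tensorflow"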
The relationship between Keras and its backend frameworks is shown in Figure 5-1.
Figure 5-1.
Keras and its modification with different frameworks

Using Keras for Reinforcement Learning

This section covers installing Keras and shows an example of Reinforcement Learning. You first need to install the dependencies.
The dependencies are as follows:
  • Python
  • Keras 1.0
  • Pygame
  • Scikit-image
Let’s start installing Keras 1.0. This example shows how to install Keras from the Anaconda environment:
conda install -c jaikumarm keras
It asks for permission to install the new packages. Choose yes to proceed, as shown in Figure 5-2.
Figure 5-2.
The updates to be installed
When the package installation is successful and completed, you’ll see the information shown in Figure 5-3.
Figure 5-3.
The package installation is complete
You can also install Keras a different way. This example shows how to install it using pip3.
First, update the package index as follows:
(universe) abhi@ubuntu:∼$ sudo apt-get update
Then install pip3 as follows:
sudo apt-get -y install python3-pip
Figure 5-4 shows the installation process.
Figure 5-4.
Installing pip3
After installing the dependencies, you need to install Keras (see Figure 5-5):
(universe) abhi@ubuntu:∼$ sudo pip3 install keras
Figure 5-5.
Installing Keras
Now we will check whether Keras is using the TensorFlow backend. From the terminal, with the Anaconda environment activated, switch to Python mode.
If importing Keras produces the following result, everything is working (see Figure 5-6).
(universe) abhi@ubuntu:∼$ python
Python 3.5.3 |Anaconda custom (64-bit)| (default, Mar  6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
Figure 5-6.
Keras with the TensorFlow backend

Using ChainerRL

This section covers ChainerRL and explains how to apply Reinforcement Learning using it. ChainerRL is a deep Reinforcement Learning library built on top of the Chainer framework. See Figure 5-7.
Figure 5-7.
ChainerRL

Installing ChainerRL

We will install ChainerRL first from the terminal window. Figure 5-8 shows the Anaconda environment.
Figure 5-8.
Activating the Anaconda environment
You can now install ChainerRL. To do so, type this command in the terminal:
pip install chainerrl
Figure 5-9 shows the result of the installation.
Figure 5-9.
Installing ChainerRL
Now you can git clone the repo. Use this command to do so:
git clone https://github.com/chainer/chainerrl.git
Figure 5-10 shows the result.
Figure 5-10.
Cloning ChainerRL
Then get inside the chainerrl folder, as shown in Figure 5-11.
Figure 5-11.
Inside the chainerrl folder

Pipeline for Using ChainerRL

Since the library is written in Python, the obvious language of choice is Python. Follow these steps to use ChainerRL:
  1. Import gym, numpy, and the supporting chainerrl libraries.
    import chainer
    import chainer.functions as F
    import chainer.links as L
    import chainerrl
    import gym
    import numpy as np
    To use OpenAI Gym, you model an environment (see Figure 5-12). The environment exposes two spaces:
    • Observation space
    • Action space
    The environment must also provide two methods, reset and step.
    Figure 5-12.
    How ChainerRL uses state transitions
     
  2. Create a simulation environment, such as CartPole-v0, from OpenAI Gym.
    env = gym.make('CartPole-v0')
    print('observation space:', env.observation_space)
    print('action space:', env.action_space)
    obs = env.reset()
    env.render()
    print('initial observation:', obs)
    action = env.action_space.sample()
    obs, r, done, info = env.step(action)
    print('next observation:', obs)
    print('reward:', r)
    print('done:', done)
    print('info:', info)
     
  3. Now define the Q-function that the agent will learn from its interactions with the environment. Here, it is the QFunction class, which subclasses chainer.Chain:
    class QFunction(chainer.Chain):
        def __init__(self, obs_size, n_actions, n_hidden_channels=50):
            super().__init__(
                l0=L.Linear(obs_size, n_hidden_channels),
                l1=L.Linear(n_hidden_channels, n_hidden_channels),
                l2=L.Linear(n_hidden_channels, n_actions))
        def __call__(self, x, test=False):
            """
            Args:
                x (ndarray or chainer.Variable): An observation
                test (bool): a flag indicating whether it is in test mode
            """
            h = F.tanh(self.l0(x))
            h = F.tanh(self.l1(h))
            return chainerrl.action_value.DiscreteActionValue(self.l2(h))
    obs_size = env.observation_space.shape[0]
    n_actions = env.action_space.n
    q_func = QFunction(obs_size, n_actions)
    With the Q-function in place, we apply Q-learning. We start by configuring the agent:
    gamma = 0.95
    # Use epsilon-greedy for exploration
    explorer = chainerrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.3, random_action_func=env.action_space.sample)
    # DQN uses Experience Replay.
    # Specify a replay buffer and its capacity.
    replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6)
    # Since observations from CartPole-v0 are numpy.float64 while
    # Chainer only accepts numpy.float32 by default, specify
    # a converter as a feature extractor function phi.
    phi = lambda x: x.astype(np.float32, copy=False)
    # Set up an optimizer to train the Q-function (it is passed to the agent below)
    optimizer = chainer.optimizers.Adam(eps=1e-2)
    optimizer.setup(q_func)
    # Now create an agent that will interact with the environment.
    agent = chainerrl.agents.DoubleDQN(
        q_func, optimizer, replay_buffer, gamma, explorer,
        replay_start_size=500, update_interval=1,
        target_update_interval=100, phi=phi)
     
  4. Start the Reinforcement Learning process. You first have to open Jupyter Notebook in the universe environment, as shown in Figure 5-13.
    Figure 5-13.
    Getting inside jupyter notebook
    abhi@ubuntu:∼$ source activate universe
    (universe) abhi@ubuntu:∼$ jupyter notebook
    Figure 5-14 shows the code running. A minimal sketch of the training loop that drives this step appears after this list.
    Figure 5-14.
    Running the code
     
  5. Now you test the agent, as shown in Figure 5-15.
    Figure 5-15.
    Testing the agents
     
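Steps 4 and 5 boil down to a training loop that alternates between acting and learning, followed by a short greedy evaluation phase. The following is a minimal sketch of that loop in the style of the ChainerRL quickstart; the number of episodes and the 200-step cap per episode are illustrative choices, not requirements.
n_episodes = 200
for i in range(1, n_episodes + 1):
    obs = env.reset()
    reward = 0
    done = False
    R = 0  # total reward collected in this episode
    t = 0  # time step within the episode
    while not done and t < 200:
        # act_and_train selects an action and updates the agent from the last transition
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
        R += reward
        t += 1
    agent.stop_episode_and_train(obs, reward, done)
    if i % 10 == 0:
        print('episode:', i, 'reward:', R)

# Test the trained agent: act greedily, with no exploration or training updates
for i in range(10):
    obs = env.reset()
    done = False
    R = 0
    while not done:
        action = agent.act(obs)
        obs, r, done, _ = env.step(action)
        R += r
    agent.stop_episode()
    print('test episode:', i, 'reward:', R)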
We completed the entire program in the Jupyter Notebook. Now we will work through one of the GitHub repos to understand Deep Q Learning with TensorFlow. See Figure 5-16.
Figure 5-16.
Cloning the GitHub repo
First you need to install the prerequisites as follows (see Figure 5-17):
pip install -U 'gym[all]' tqdm scipy
Figure 5-17.
Getting inside the folder
Then run the program and train it without using GPU support, as shown in Figure 5-18.
Figure 5-18.
Training the program without GPU support
The command is as follows:
$ python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False
The command uses the main.py Python file and runs the Breakout game simulation in CPU mode only. You can now open the terminal to get inside the Anaconda environment, as shown in Figure 5-19.
Figure 5-19.
Activating the environment
Now switch to Python mode, as shown in Figure 5-20:
(universe) abhi@ubuntu:∼$ python
Python 3.5.3 |Anaconda custom (64-bit)| (default, Mar  6 2017, 11:58:13)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Figure 5-20.
Switching to Python mode
As you switch to Python mode, you first import the utilities:
import gym
import numpy as np
To store the learned value of each state-action pair in the FrozenLake simulation, you create the Q table as follows:
Q = np.zeros([env.observation_space.n,env.action_space.n])
After that, you declare the learning parameters and create a list to hold the total reward of each episode. The complete program is as follows:
import gym
import numpy as np
env = gym.make('FrozenLake-v0')
#Initialize table with all zeros
Q = np.zeros([env.observation_space.n,env.action_space.n])
# Set learning parameters
lr = .8
y = .95
num_episodes = 2000
#create lists to contain total rewards and steps per episode
#jList = []
rList = []
for i in range(num_episodes):
    #Reset environment and get first new observation
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    #The Q-Table learning algorithm
    while j < 99:
        j+=1
        #Choose an action by greedily (with noise) picking from Q table
        a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
        #Get new state and reward from environment
        s1,r,d,_ = env.step(a)
        #Update Q-Table with new knowledge
        Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])
        rAll += r
        s = s1
        if d == True:
            break
    #jList.append(j)
    rList.append(rAll)
print "Score over time: " +  str(sum(rList)/num_episodes)
print "Final Q-Table Values"
print Q
After going through all the steps, you can finally print the Q table. Enter the code line by line at the Python prompt, or save it as a script and run it.
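Once training finishes, the greedy policy is simply the best action in each state according to the learned Q table. As a quick, illustrative follow-up in the same Python session:
# Greedy policy: index of the best action for each of the 16 FrozenLake states
policy = np.argmax(Q, axis=1)
print("Greedy action per state:")
print(policy.reshape(4, 4))  # FrozenLake-v0 is a 4x4 grid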

Deep Q Learning: Using Keras and TensorFlow

We will now touch on Deep Q Learning with Keras. We will clone an important Reinforcement Learning library known as Keras-rl, which implements several variants of the Deep Q Learning algorithm. See Figure 5-21.
Figure 5-21.
Keras-rl representation

Installing Keras-rl

The command for installing Keras-rl is as follows (see Figure 5-22):
pip install keras-rl
Figure 5-22.
Installing Keras-rl
You also need to install h5py if it is not already installed, and then you clone the repo, as shown in Figure 5-23.
Figure 5-23.
Cloning the git repo

Training with Keras-rl

This section shows how to run a program. First, get inside the keras-rl folder and then its examples folder, as shown in Figure 5-24.
abhi@ubuntu:∼$ cd keras-rl
abhi@ubuntu:∼/keras-rl$ dir
assets  examples           LICENSE     pytest.ini  rl         setup.py
docs    ISSUE_TEMPLATE.md  mkdocs.yml  README.md   setup.cfg  tests
abhi@ubuntu:∼/keras-rl$ cd examples
abhi@ubuntu:∼/keras-rl/examples$ dir
cem_cartpole.py   dqn_atari.py     duel_dqn_cartpole.py  sarsa_cartpole.py
ddpg_pendulum.py  dqn_cartpole.py  naf_pendulum.py      visualize_log.py
abhi@ubuntu:∼/keras-rl/examples$
Figure 5-24.
Getting inside the Keras-rl directory
Now you can run one of the examples. Activate the Anaconda environment and run the script:
(universe) abhi@ubuntu:∼/keras-rl/examples$ python dqn_cartpole.py
See Figure 5-25.
Figure 5-25.
Using the TensorFlow backend
The simulation will now begin, as shown in Figure 5-26.
Figure 5-26.
Simulation happens
The simulation runs and trains the model using Deep Q Learning. As training progresses, the cart learns to keep the pole balanced upright; its stability increases with learning.
The entire process creates the following log:
(universe) abhi@ubuntu:∼/keras-rl/examples$ python dqn_cartpole.py
Using TensorFlow backend.
[2017-09-24 09:36:27,476] Making new env: CartPole-v0
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_1 (Flatten)          (None, 4)                 0
_________________________________________________________________
dense_1 (Dense)              (None, 16)                80
_________________________________________________________________
activation_1 (Activation)    (None, 16)                0
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272
_________________________________________________________________
activation_2 (Activation)    (None, 16)                0
_________________________________________________________________
dense_3 (Dense)              (None, 16)                272
_________________________________________________________________
activation_3 (Activation)    (None, 16)                0
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 34
_________________________________________________________________
activation_4 (Activation)    (None, 2)                 0
=================================================================
Total params: 658
Trainable params: 658
Non-trainable params: 0
_________________________________________________________________
None
2017-09-24 09:36:27.932219: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
...
   712/50000: episode: 38, duration: 0.243s, episode steps: 14, steps per second: 58, episode reward: 14.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.500 [0.000, 1.000], mean observation: 0.105 [-0.568, 0.957], loss: 0.291389, mean_absolute_error: 3.054634, mean_q: 5.816398
Each episode is one iteration of the simulation. The dqn_cartpole.py code is discussed next. You need to import the utilities first; they are very useful, as they provide built-in agents for applying Deep Q Learning.
First, declare the environment as follows:
ENV_NAME = 'CartPole-v0'
env = gym.make(ENV_NAME)
Since we want to implement Deep Q Learning, we build a small neural network to approximate the Q-function. For CartPole this is a simple fully connected (dense) network with ReLU activations, and we keep the model sequential.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
You can print the model details too, as follows:
print(model.summary())
Next, you configure and compile the agent, wiring the model together with a replay memory and a policy. The complete script is as follows:
import numpy as np
import gym
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
ENV_NAME = 'CartPole-v0'
# Get the environment and extract the number of actions.
env = gym.make(ENV_NAME)
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n
# Next, we build a very simple model.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())
# Finally, we configure and compile our agent. You can use every built-in Keras optimizer and
# even the metrics!
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
# Okay, now it's time to learn something! We visualize the training here for show, but this
# slows down training quite a lot. You can always safely abort the training prematurely using
# Ctrl + C.
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)
# After training is done, we save the final weights.
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)
# Finally, evaluate our algorithm for 5 episodes.
dqn.test(env, nb_episodes=5, visualize=True)
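If you later want to evaluate a trained agent without retraining it, one option (a sketch, assuming the weights file saved by the run above is present) is to rebuild the same model and agent and then load the saved weights before testing:
# Rebuild env, model, memory, policy, and dqn exactly as above, then:
dqn.load_weights('dqn_{}_weights.h5f'.format(ENV_NAME))
dqn.test(env, nb_episodes=5, visualize=True)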
To install Keras-rl from the cloned source and get all of its capabilities, run the setup.py file from within the keras-rl folder, as follows:
(universe) abhi@ubuntu:∼/keras-rl$ python setup.py install
You will see that all the dependencies are being installed, one by one:
running install
running bdist_egg
running egg_info
creating keras_rl.egg-info
writing requirements to keras_rl.egg-info/requires.txt
writing dependency_links to keras_rl.egg-info/dependency_links.txt
writing top-level names to keras_rl.egg-info/top_level.txt
writing keras_rl.egg-info/PKG-INFO
writing manifest file 'keras_rl.egg-info/SOURCES.txt'
reading manifest file 'keras_rl.egg-info/SOURCES.txt'
writing manifest file 'keras_rl.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/tests
copying tests/__init__.py -> build/lib/tests
creating build/lib/rl
copying rl/util.py -> build/lib/rl
copying rl/callbacks.py -> build/lib/rl
copying rl/keras_future.py -> build/lib/rl
copying rl/memory.py -> build/lib/rl
copying rl/random.py -> build/lib/rl
copying rl/core.py -> build/lib/rl
copying rl/__init__.py -> build/lib/rl
copying rl/policy.py -> build/lib/rl
creating build/lib/tests/rl
copying tests/rl/test_util.py -> build/lib/tests/rl
copying tests/rl/util.py -> build/lib/tests/rl
copying tests/rl/test_memory.py -> build/lib/tests/rl
copying tests/rl/test_core.py -> build/lib/tests/rl
copying tests/rl/__init__.py -> build/lib/tests/rl
creating build/lib/tests/rl/agents
copying tests/rl/agents/test_cem.py -> build/lib/tests/rl/agents
copying tests/rl/agents/__init__.py -> build/lib/tests/rl/agents
copying tests/rl/agents/test_ddpg.py -> build/lib/tests/rl/agents
copying tests/rl/agents/test_dqn.py -> build/lib/tests/rl/agents
creating build/lib/rl/agents
copying rl/agents/sarsa.py -> build/lib/rl/agents
copying rl/agents/ddpg.py -> build/lib/rl/agents
copying rl/agents/dqn.py -> build/lib/rl/agents
copying rl/agents/cem.py -> build/lib/rl/agents
copying rl/agents/__init__.py -> build/lib/rl/agents
Keras-rl is now set up and you can use the built-in functions to their fullest effect.
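A quick way to confirm that the installation worked is to import, from a Python prompt, a few of the modules used earlier in this chapter:
# If these imports succeed, Keras-rl is installed and ready to use
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
print("keras-rl is ready to use")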

Conclusion

This chapter introduced and defined Keras and explained how to use it with Reinforcement Learning. The chapter also explained how to use TensorFlow with Reinforcement Learning and discussed using ChainerRL. Chapter 6 covers Google DeepMind and the future of Reinforcement Learning.