
3. OpenAI Basics

Abhishek Nandy (Rm HIG L-2/4, Bldg Swaranika Co-Opt HSG, Kolkata, West Bengal, India) and Manisha Biswas (North 24 Parganas, West Bengal, India)
This chapter introduces the world of OpenAI and its role in Reinforcement Learning.
First, we go through environments that are important to Reinforcement Learning. Two supportive platforms are useful for Reinforcement Learning: Google DeepMind and OpenAI, the latter backed by Elon Musk. The fully open sourced OpenAI is discussed in this chapter; Google DeepMind is discussed in Chapter 6.
The chapter first covers OpenAI basics and then describes the OpenAI Gym and OpenAI Universe environments. Next, it covers installing OpenAI Gym and OpenAI Universe on Ubuntu under the Anaconda distribution. Finally, it shows how to use OpenAI Gym and OpenAI Universe for Reinforcement Learning.

Getting to Know OpenAI

To start, access the OpenAI web site at https://openai.com/. The web site is shown in Figure 3-1.
Figure 3-1. The OpenAI web site
The OpenAI web site is full of content and resources for learning and research. Let's see schematically how OpenAI Gym and OpenAI Universe are connected. See Figure 3-2.
Figure 3-2. OpenAI Gym and OpenAI Universe
Figure 3-2 shows how OpenAI Gym and OpenAI Universe are connected, by using their icons.
The OpenAI Gym page of the web site is shown in Figure 3-3.
Figure 3-3. OpenAI Gym web site
OpenAI Gym is a toolkit for running simulated games and scenarios so that you can apply Reinforcement Learning algorithms to them. It supports teaching agents a wide range of activities, such as playing games and walking.
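Once Gym is installed (installation is covered in the next section), you can list every environment it registers. Here is a quick sketch; the exact count depends on which optional extras you install:
import gym
from gym import envs

env_ids = [spec.id for spec in envs.registry.all()]  # all registered environment ids
print(len(env_ids))  # total number of environments
print(env_ids[:5])   # a few example ids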
The OpenAI Universe web site is shown in Figure 3-4.
Figure 3-4. The OpenAI Universe web site
OpenAI Universe is a software platform that measures and trains an AI’s general intelligence across different kinds of games and applications.

Installing OpenAI Gym and OpenAI Universe

In this section, you learn how to install OpenAI Gym and OpenAI Universe on an Ubuntu 16.04 machine.
Go into the Anaconda environment to install OpenAI Gym from GitHub (see Figure 3-5). You can clone and install OpenAI Gym using these commands:
$ source activate universe
(universe) $ cd ~
(universe) $ git clone https://github.com/openai/gym.git
(universe) $ cd gym
(universe) $ pip install -e '.[all]'
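These commands assume a conda environment named universe already exists. If it doesn't, you can create and activate one first (a minimal sketch; the Python version here is an assumption, so pick one your setup supports):
$ conda create -n universe python=3.5
$ source activate universe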
Figure 3-5. Cloning OpenAI Gym
Now install OpenAI Universe as follows:
(universe) $ cd ~
(universe) $ git clone https://github.com/openai/universe.git
(universe) $ cd universe
(universe) $ pip install -e .
This installs the required packages. Figure 3-6 shows the cloning process for OpenAI Universe.
Figure 3-6. Cloning OpenAI Universe
All the important files are downloaded as part of the process, as shown in Figure 3-7.
Figure 3-7. Important steps of the installation process
The installation process continues, as shown in Figure 3-8.
Figure 3-8. More steps of the installation process
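Note that most Universe environments run inside Docker, so Docker must be installed and running before the simulations later in this chapter will work. The Universe README also lists a few system packages for Ubuntu 16.04, roughly the following (treat the exact package names as an assumption and check the README for your release):
$ sudo apt-get install golang libjpeg-turbo8-dev make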
In the next section, you learn how to start working in the OpenAI Gym and OpenAI Universe environments.

Working with OpenAI Gym and OpenAI Universe

The OpenAI cycle for a sample process is shown in Figure 3-9.
Figure 3-9. The basic OpenAI Gym structure
The process works this way. We are dealing with a simple Gym project; the language of choice is Python, but the focus is on the logic of how an environment is used.
1. Import the Gym library.
2. Create an instance of the simulation to run, using the make function.
3. Reset the simulation so it starts from a clean initial state.
4. Loop over time steps, rendering the environment at each one.
The output is a simulated run of the environment, ready for OpenAI Reinforcement Learning techniques.
Here is the Python program, using the cart-pole simulation as an example:
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
The program runs from the terminal; you can also run it in a Jupyter notebook. Jupyter notebook is a browser-based environment where you can run Python code interactively and very easily.
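If you also want to inspect what the environment returns at each step, here is a slightly fuller sketch. In this version of Gym, env.step() returns four values: an observation, a reward, a done flag, and an info dictionary:
import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()  # pick a random action
    observation, reward, done, info = env.step(action)
    if done:  # the pole fell over or the cart ran off the track
        observation = env.reset()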
To use the properties and file structure of OpenAI Universe, you need to be in the universe directory, as shown in Figure 3-10.
Figure 3-10. Inside the universe directory
To work with the Gym components, you need to get inside the gym directory, as shown in Figure 3-11.
Figure 3-11. Inside the gym directory
You then need to open the Jupyter notebook. Enter this command in the terminal (see Figure 3-12):
jupyter notebook
Figure 3-12. Using the jupyter notebook command
When you issue the command, the Jupyter notebook engine loads its essential components, so that everything related to the notebook is available, as shown in Figure 3-13.
Figure 3-13. The essential components of Jupyter notebooks
Once the Jupyter notebook is loaded, you will see that the interface has an option for working with Python files. The interface also shows which Python distribution you have. Figure 3-14 shows that Python 3 is installed in this case.
Figure 3-14. Opening a new Python file
You can now start working with the Gym interface and start importing Gym libraries, as shown in Figure 3-15.
Figure 3-15. Working with Gym inside the Jupyter notebook
The process continues until the program flow is completed. Figure 3-16 shows the process flow.
Figure 3-16. The flow of the program
After being reset, the environment returns an array, as shown in Figure 3-17. For CartPole-v0, its four values are the cart position, the cart velocity, the pole angle, and the pole's angular velocity.
Figure 3-17. The array being created
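You can reproduce this array with a quick check; the values differ from run to run because the start state is slightly randomized:
import gym

env = gym.make('CartPole-v0')
obs = env.reset()
print(obs)  # [cart position, cart velocity, pole angle, pole angular velocity]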
Figure 3-18 shows the simulation. Each step shifts the cart-pole, and the shift is reflected in the array's values.
Figure 3-18. The simulation in action

More Simulations

This section shows you how to try different simulations. There are many different environment types in OpenAI Gym. One of them is the algorithmic type, discussed next.
The algorithmic environments involve a variety of tasks, such as copying or reversing sequences. Run this code to load one of these environments in the Jupyter notebook (see Figure 3-19):
import gym
env = gym.make('Copy-v0')
env.reset()
env.render()
Figure 3-19. Including the environment in the Jupyter notebook
The output looks like Figure 3-20. The goal of this simulation is to copy symbols from an input sequence.
Figure 3-20. The output after running the render function
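To actually interact with Copy-v0, you sample actions from its action space and step the environment. Here is a minimal sketch; you can print env.action_space to confirm that each action is a tuple of a tape movement, a write flag, and a symbol to write:
import gym

env = gym.make('Copy-v0')
env.reset()
print(env.action_space)  # the (move, write, symbol) tuple space
action = env.action_space.sample()  # a random action from that space
observation, reward, done, info = env.step(action)
env.render()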
This section uses an example of the classic arcade games. First, activate the required Anaconda environment using the following command:
source activate universe
Then go to the appropriate directory, say gym:
cd gym
From the terminal, start the jupyter notebook using this command:
jupyter notebook
This enables you to start working with the Python option. Figure 3-21 shows the process using the classic arcade games.
Figure 3-21. Using classic arcade games
After using env.reset(), an array is generated, as shown in Figure 3-22.
Figure 3-22. The array being created
If you use env.render(), you’ll generate the output shown in Figure 3-23.
Figure 3-23. Rendering the output
This example simply simulates different kinds of game environments and sets them up for Reinforcement Learning.
Here is the code to simulate the Space Invaders game:
import gym
env = gym.make('SpaceInvaders-v0')
env.reset()
env.render()
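To watch a random agent play for a while, you can extend the same pattern used for the cart-pole example (a sketch; the Atari environments require the extras installed earlier with pip install -e '.[all]'):
import gym

env = gym.make('SpaceInvaders-v0')
env.reset()
for _ in range(1000):
    env.render()
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:  # one game ended, so start a new one
        env.reset()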
In the next section, you will learn how to work with OpenAI Universe.

OpenAI Universe

In this example, you will use the Jupyter notebook to simulate a game environment and then apply Reinforcement Learning to it. Go to the universe directory and start the Jupyter notebook.
import gym
import universe  # register the universe environments
env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)  # automatically creates a local docker container
observation_n = env.reset()
while True:
  action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]  # your agent here
  observation_n, reward_n, done_n, info = env.step(action_n)
  env.render()
Figure 3-24 shows the code needed to set up the environment for the DuskDrive game.
Figure 3-24. Setting up the environment for the DuskDrive game
Universe now pulls the Docker image and starts it remotely. It runs the game, and the agent starts playing it remotely. See Figure 3-25.
Figure 3-25. The game played by the agent
First, you import the gym library, which is the base on which OpenAI Universe is built. You also import universe, which registers all the Universe environments:
import gym
import universe  # register the universe environments
After that, you create an environment for loading the Flash game that will be simulated (in this case, the DuskDrive game).
env = gym.make('flashgames.DuskDrive-v0')
You call configure, which creates a Dockerized environment for running the simulation locally:
env.configure(remotes=1)
You then call env.reset() to instantiate the simulation environment; Universe does this asynchronously:
observation_n = env.reset()
You then define a KeyEvent action on the ArrowUp key to move the car in the simulated environment:
action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
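Other VNC-style actions follow the same event-tuple pattern. For instance, here is a sketch combining a key press with a mouse click (per the Universe VNC action space, a PointerEvent carries the x coordinate, the y coordinate, and the button mask):
# hold the up arrow and click at pixel (200, 300) in the same step
action_n = [[('KeyEvent', 'ArrowUp', True),
             ('PointerEvent', 200, 300, 1)] for ob in observation_n]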
To get rewards and check the status of the episodes, you use the following code and render accordingly:
observation_n, reward_n, done_n, info = env.step(action_n)
env.render()
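To verify that the agent is actually collecting reward, a small variation of the same loop prints the per-step reward (a sketch; reward_n is a list with one entry per remote environment):
while True:
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    if reward_n and reward_n[0] != 0:
        print('reward:', reward_n[0])
    env.render()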

Conclusion

This chapter explained the details of OpenAI. First, it described OpenAI in general and then described OpenAI Gym and OpenAI Universe.
We touched on installing OpenAI Gym and OpenAI Universe and then started coding for them using the Python language. Finally, we looked at some examples of both OpenAI Gym and OpenAI Universe.