In this section, we will code up the strategy we discussed earlier (the code file is available as Frozen_Lake_with_Q_Learning.ipynb on GitHub):
- Import the relevant packages:
import gym
from gym import envs
from gym.envs.registration import register
Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games such as Pong and Pinball.
More about Gym can be found at: https://gym.openai.com/.
- Register the environment:
register(
    id='FrozenLakeNotSlippery-v1',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False},
    max_episode_steps=100,
    reward_threshold=0.8196)
- Create the environment:
env = gym.make('FrozenLakeNotSlippery-v1')
- Inspect the created environment:
env.render()
The preceding step renders (prints) the environment:
env.observation_space
The preceding code returns the observation space of the environment, that is, the number of states. In our case, given that it is a 4 x 4 grid, we have a total of 16 states.
env.action_space.n
The preceding code returns the number of actions that can be taken in a state of the environment:
env.action_space.sample()
The preceding code samples an action from the possible set of actions:
env.step(action)
The preceding code takes a chosen action and returns the new state, the reward for the action, a flag indicating whether the game is done, and additional diagnostic information for the step:
env.reset()
The preceding code resets the environment so that the agent is back to the starting state.
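The 16 states above correspond to the cells of the 4 x 4 grid, numbered row by row (FrozenLake maps the cell in a given row and column to the single integer state row * 4 + col). A minimal sketch of this mapping, with helper names chosen here just for illustration:

```python
def to_state(row, col, ncols=4):
    """Map a (row, col) grid cell to the single integer state the environment uses."""
    return row * ncols + col

def to_cell(state, ncols=4):
    """Inverse mapping: state -> (row, col)."""
    return divmod(state, ncols)

print(to_state(0, 0))   # -> 0   (the starting cell S, top-left)
print(to_state(3, 3))   # -> 15  (the goal cell G, bottom-right)
print(to_cell(5))       # -> (1, 1)
```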
- Initialize the q-table:
import numpy as np
qtable = np.zeros((16,4))
We have initialized it to a shape of (16, 4) as there are 16 states and 4 possible actions in each state.
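To see how the table is indexed: row s of the q-table holds the estimated value of each of the 4 actions in state s, and np.argmax over that row gives the greedy action for the state. A quick sketch (the value assigned below is hypothetical, purely for illustration):

```python
import numpy as np

qtable = np.zeros((16, 4))      # one row per state, one column per action

qtable[5, 3] = 0.5              # hypothetical value, purely for illustration
print(qtable.shape)             # -> (16, 4)
print(np.argmax(qtable[5, :]))  # -> 3 (the greedy action in state 5)
```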
- Run multiple iterations of playing a game:
Initialize hyper-parameters:
total_episodes = 15000   # number of training episodes
learning_rate = 0.8
max_steps = 99           # maximum steps per episode
gamma = 0.95             # discount factor
epsilon = 1.0            # exploration rate
max_epsilon = 1.0
min_epsilon = 0.01
decay_rate = 0.005       # exponential decay rate for epsilon
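With these hyperparameters, epsilon decays exponentially from max_epsilon towards min_epsilon (note the negative exponent, which makes epsilon shrink rather than grow), so early episodes mostly explore and later episodes mostly exploit. A sketch of the schedule:

```python
import numpy as np

min_epsilon, max_epsilon, decay_rate = 0.01, 1.0, 0.005

def epsilon_at(episode):
    # Standard exponential decay from max_epsilon towards min_epsilon.
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)

print(round(epsilon_at(0), 3))      # -> 1.0   (pure exploration at the start)
print(round(epsilon_at(1000), 3))   # -> 0.017
print(round(epsilon_at(15000), 3))  # -> 0.01  (almost pure exploitation)
```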
Play multiple episodes of the game:
rewards = []
for episode in range(total_episodes):
    state = env.reset()
    step = 0
    done = False
    total_rewards = 0
In the code below, we define the action to be taken. If exp_exp_tradeoff (a random number generated between 0 and 1) is greater than epsilon, we exploit (take the best action according to the q-table); otherwise, we explore (take a random action):
    for step in range(max_steps):
        exp_exp_tradeoff = np.random.uniform(0, 1)
        ## Exploitation:
        if exp_exp_tradeoff > epsilon:
            action = np.argmax(qtable[state, :])
        else:
            ## Exploration
            action = env.action_space.sample()
In the code below, we fetch the new state, the reward, and a flag indicating whether the game is done, by taking the action in the given step:
        new_state, reward, done, _ = env.step(action)
In the code below, we update the q-table based on the action taken in a state. Additionally, we update the state to the new state obtained after taking the action in the current state:
        qtable[state, action] = qtable[state, action] + learning_rate * \
            (reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action])
        total_rewards += reward
        state = new_state
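The update is easier to see with concrete numbers. A sketch with made-up values: starting from a zero Q-value, receiving a reward of 1 for reaching the goal pushes qtable[state, action] towards 1 at the rate given by learning_rate (the state and action indices below are hypothetical):

```python
import numpy as np

learning_rate, gamma = 0.8, 0.95
qtable = np.zeros((16, 4))

# Suppose taking action 2 in state 14 reaches the goal: reward = 1,
# new_state = 15, and all Q-values of the new state are still 0.
state, action, reward, new_state = 14, 2, 1.0, 15
qtable[state, action] = qtable[state, action] + learning_rate * \
    (reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action])
print(qtable[14, 2])  # -> 0.8, i.e. 0 + 0.8 * (1 + 0.95 * 0 - 0)
```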
In the following code, once the game is over (done is True), we break out of the inner loop and proceed to a new episode. At the end of each episode, we also decay the exploration factor epsilon, which decides whether we go for exploration or exploitation:
        if done:
            break
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_rewards)
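Once training finishes, a quick way to gauge the agent is the fraction of episodes that ended at the goal (in this environment, the total reward of an episode is 1 on success and 0 otherwise). A sketch with a dummy rewards list standing in for the one built above:

```python
# Stand-in for the rewards list accumulated during training.
dummy_rewards = [0, 0, 1, 1, 1, 0, 1, 1]

# Fraction of successful episodes.
score = sum(dummy_rewards) / len(dummy_rewards)
print("Score over time:", score)  # -> 0.625
```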
- Once we have built the q-table, we now deploy the agent to maneuver in line with the optimal actions suggested by the q-table:
env.reset()
for episode in range(1):
    state = env.reset()
    step = 0
    done = False
    print("-----------------------")
    print("Episode", episode)
    for step in range(max_steps):
        env.render()
        action = np.argmax(qtable[state, :])
        print(action)
        new_state, reward, done, info = env.step(action)
        if done:
            #env.render()
            print("Number of Steps", step + 1)
            break
        state = new_state
The preceding code prints the optimal path that the agent traverses to reach the end goal.
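The greedy policy-following above can be sketched without the environment at all. The snippet below is a self-contained illustration on the 4 x 4 grid, assuming FrozenLake's action encoding (0 = left, 1 = down, 2 = right, 3 = up); the hand-crafted Q-table and the simplified transition function (which ignores holes) are purely for illustration, not a trained result:

```python
import numpy as np

def step(state, action, n=4):
    # Deterministic grid movement, clipped at the edges (holes ignored).
    row, col = divmod(state, n)
    if action == 0:   col = max(col - 1, 0)      # left
    elif action == 1: row = min(row + 1, n - 1)  # down
    elif action == 2: col = min(col + 1, n - 1)  # right
    elif action == 3: row = max(row - 1, 0)      # up
    return row * n + col

qtable = np.zeros((16, 4))
for s in [0, 4, 8]:        # first column: best action is down
    qtable[s, 1] = 1.0
for s in [12, 13, 14]:     # bottom row: best action is right, towards the goal
    qtable[s, 2] = 1.0

state, path = 0, [0]
while state != 15:         # state 15 is the goal cell
    state = step(state, np.argmax(qtable[state, :]))
    path.append(state)
print(path)  # -> [0, 4, 8, 12, 13, 14, 15]
```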