Getting ready

The environment we will be working with is as follows:

We start at the cell marked S, and our objective is to reach the cell where the reward is +1. To maximize our chances of obtaining the reward, we will use the Bellman equation, which calculates the value of each cell in the preceding grid as follows:

Value of current cell = reward of moving from the current cell to next cell + discount factor * value of next cell

Additionally, in the current problem, the reward for moving to any cell other than the cell with a reward of +1 is 0.

The discount factor can be thought of as the energy expended in moving from one cell to another. As a result, in the current problem setting, a cell that is far away from the rewarding cell will have a lower value than cells that are closer to it.
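To make the update concrete, here is a minimal sketch of the calculation described above; the discount factor of 0.9 and the neighbouring cell's value of 1 are assumptions for illustration only:

```python
# A minimal sketch of the value update described above.
# All numbers here are assumed for illustration.
discount_factor = 0.9   # assumed discount factor
reward = 0              # reward for moving into a non-terminal cell
next_cell_value = 1.0   # assumed value of the neighbouring cell

# Value of current cell = reward + discount factor * value of next cell
current_cell_value = reward + discount_factor * next_cell_value
print(current_cell_value)  # 0.9
```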

Once we have calculated the value of each cell, the agent moves to whichever of the cells it can reach has the highest value.
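For example, once values are available for the neighbouring cells, the greedy move can be picked as in the following sketch (the neighbour values shown here are hypothetical):

```python
# Hypothetical values of the cells the agent could move to from its
# current position; keys are actions, values are the cells' values.
neighbour_values = {'up': 0.81, 'right': 0.9, 'down': 0.0, 'left': 0.73}

# Pick the action that leads to the highest-valued cell.
best_action = max(neighbour_values, key=neighbour_values.get)
print(best_action)  # 'right'
```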

The strategy that we'll adopt to calculate the value of each cell is as follows:

  • Initialize an empty board.
  • Define the possible actions that an agent could take in a cell.
  • Define the state the agent ends up in for each action it takes from the current cell.
  • Calculate the value of the current state, which depends on the reward for moving to the next state, as well as the value of the next state.
  • Update the cell value of the current state based on the earlier calculation.
  • Additionally, store the action taken in the current state to move to the next state.
  • Note that, in the initial iterations, the values of cells that are far away from the end goal remain zero, while the values of cells that are adjacent to the end state rise.
  • As we iterate over the previous steps multiple times, the cell values converge, and we are then in a position to decide the optimal route for the agent to follow, as sketched in the code that follows this list.
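The following is a minimal sketch of this strategy on a hypothetical 3 x 4 grid; the grid size, the position of the terminal (+1) cell, and the discount factor are assumptions for illustration and may differ from the recipe's actual environment:

```python
import numpy as np

# Hypothetical grid: 3 rows x 4 columns, terminal (+1) cell assumed at
# the top-right corner; gamma is an assumed discount factor.
n_rows, n_cols = 3, 4
terminal = (0, 3)
gamma = 0.9

# Initialize an empty board of values and a board of best actions.
values = np.zeros((n_rows, n_cols))
policy = np.full((n_rows, n_cols), '', dtype=object)

# Possible actions and the change in position each one causes.
actions = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

for _ in range(50):  # iterate the sweeps multiple times
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) == terminal:
                continue  # the terminal cell's value is not updated
            best_value, best_action = float('-inf'), ''
            for action, (dr, dc) in actions.items():
                nr, nc = r + dr, c + dc
                if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                    continue  # this action would leave the grid
                # The reward is +1 only for moving into the terminal cell.
                reward = 1.0 if (nr, nc) == terminal else 0.0
                candidate = reward + gamma * values[nr, nc]
                if candidate > best_value:
                    best_value, best_action = candidate, action
            # Update the current cell's value and record the action taken.
            values[r, c] = best_value
            policy[r, c] = best_action

print(values)   # cell values after the sweeps
print(policy)   # best action recorded for each cell
```

After a few sweeps, the values of cells adjacent to the terminal cell rise first, and the values then propagate outward until every cell's best action points along the optimal route.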