Category 1 - value-based

The value function looks like the right-hand side of the image (the sum of discounted future rewards), where every state has some value. Let's say the state one step away from the goal has a value of -1, and the state two steps away has a value of -2. In a similar way, the starting point has a value of -16, and if the agent wanders into the wrong part of the grid, the value could be as low as -24. The agent moves across the grid by always heading toward the best available values on its way to the goal. For example, suppose the agent is at a state with a value of -15 and can move either north or south. It moves north, because the value there is higher, -14, rather than south, where the value is -16. In this way, the agent chooses its path across the grid until it reaches the goal.
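
As a concrete illustration of this greedy movement, here is a minimal sketch in Python; the grid coordinates and the value table are hypothetical, with numbers chosen only to mirror the -14/-15/-16 example above:

```python
# A minimal sketch: pick the neighbouring grid state with the highest value.
# The coordinates and the value table below are hypothetical; the numbers
# simply mirror the -14 / -15 / -16 example in the text.

values = {
    (0, 0): -16,  # starting point
    (0, 1): -15,
    (0, 2): -14,
    (1, 2): -13,
    # ... values for the remaining states of the grid
}

def greedy_step(state, values):
    """Move to the reachable neighbour with the highest value."""
    row, col = state
    neighbours = [(row - 1, col), (row + 1, col),
                  (row, col - 1), (row, col + 1)]
    reachable = [s for s in neighbours if s in values]
    return max(reachable, key=lambda s: values[s])

# From the state valued -15, the agent moves to the -14 neighbour, not the -16 one.
print(greedy_step((0, 1), values))  # -> (0, 2)
```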

  • Value Function: Only values are defined, one for every state (a sketch of how such values can be computed follows this list)
  • No Policy (Implicit): No explicit policy is stored; actions are chosen based on the values of the states
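
The values themselves are not given in advance; one common way to obtain them is to repeatedly back up the per-step reward of -1 from the goal. A rough sketch under that assumption follows; the grid size and goal location here are hypothetical:

```python
# A rough sketch of how such state values could arise: back up a step
# reward of -1 from the goal (no discounting, for simplicity). The 4x4
# grid and the goal position are hypothetical, not taken from the text.

GOAL = (3, 3)
states = [(r, c) for r in range(4) for c in range(4)]

def neighbours(state):
    r, c = state
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [s for s in candidates if s in states]

values = {s: float("-inf") for s in states}
values[GOAL] = 0.0

# Repeated backups: V(s) = -1 + max over neighbouring states s' of V(s')
for _ in range(len(states)):
    for s in states:
        if s == GOAL:
            continue
        values[s] = -1.0 + max(values[n] for n in neighbours(s))

print(values[(0, 0)])  # the far corner ends up at -6.0 on this small grid
```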