Category 2 - Policy-based

The arrows in the following image represent the direction of the next move that the agent chooses in each state. For example, the agent first moves east and then north, following the arrows until it reaches the goal. This is known as a mapping from states to actions; once we have this mapping, the agent simply needs to read it and behave accordingly (a minimal sketch of such a mapping follows the list below).

  • Policy: The policy, that is, the set of arrows, is what gets adjusted to achieve the maximum possible future reward. As the name suggests, only the policy is stored and optimized to maximize rewards.
  • No value function: No value estimates are stored for the states.
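
The following is a minimal sketch, not taken from the book, of what such a stored policy can look like in code: a lookup table from states to actions on a small grid world. The grid coordinates, action names, and the `follow_policy` helper are illustrative assumptions, not part of the original example.

```python
# A tabular policy on a hypothetical 2x3 grid; the goal sits at (0, 2).
GOAL = (0, 2)

# The policy is just a lookup table from state -> action, i.e. the "arrows".
policy = {
    (1, 0): "east",
    (1, 1): "east",
    (1, 2): "north",
    (0, 0): "east",
    (0, 1): "east",
}

MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}

def follow_policy(state, policy, goal=GOAL, max_steps=20):
    """Read the stored policy and act on it until the goal is reached."""
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        dr, dc = MOVES[policy[state]]
        state = (state[0] + dr, state[1] + dc)
        path.append(state)
    return path

print(follow_policy((1, 0), policy))
# [(1, 0), (1, 1), (1, 2), (0, 2)]  -> east, east, then north to the goal
```

Note that nothing here estimates how good a state is; the agent only consults the stored arrows, which is what distinguishes this category from value-based methods.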