Monte Carlo prediction

As we know, Monte Carlo (MC) prediction estimates the state-value function for a given policy. The value of a state is the expected return, that is, the expected cumulative discounted future reward, starting from that state. MC methods estimate this value simply by averaging the returns observed after visits to that state. As more returns are observed, the average converges to the expected value by the law of large numbers. In fact, this principle underlies all Monte Carlo methods. The first-visit Monte Carlo policy evaluation algorithm consists of the following steps:

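  1. Initialize:
    • π ← the policy to be evaluated
    • V ← an arbitrary state-value function
    • Returns(s) ← an empty list, for every state s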
  2. Repeat forever:
    • Generate an episode using π
    • For each state s appearing in the episode:
      • G ← return following the first occurrence of s
      • Append G to Returns(s)
      • V(s) ← average(Returns(s))
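
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the five-state random-walk environment, the function names generate_episode, random_policy and first_visit_mc_prediction, and the parameters num_episodes, gamma and seed are all hypothetical and chosen only for this example. The "repeat forever" loop is replaced by a fixed number of episodes.

```python
import random
from collections import defaultdict


def generate_episode(policy, start_state=2, n_states=5, rng=random):
    """Hypothetical toy environment: a random walk on states 0..n_states-1.

    States 0 and n_states-1 are terminal. Stepping into the right-hand
    terminal state pays a reward of 1; every other step pays 0.
    Returns a list of (state, reward) pairs, where reward is the reward
    received on leaving that state.
    """
    state = start_state
    episode = []
    while 0 < state < n_states - 1:
        action = policy(state, rng)  # -1 (left) or +1 (right)
        next_state = state + action
        reward = 1.0 if next_state == n_states - 1 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode


def random_policy(state, rng):
    """Assumed policy to evaluate: move left or right with equal probability."""
    return rng.choice((-1, 1))


def first_visit_mc_prediction(policy, num_episodes, gamma=1.0, seed=0):
    """Estimate V(s) by averaging first-visit returns over sampled episodes."""
    rng = random.Random(seed)
    returns = defaultdict(list)  # Returns(s): observed returns per state
    V = defaultdict(float)       # V(s): current value estimate

    for _ in range(num_episodes):
        episode = generate_episode(policy, rng=rng)

        # Walk the episode backwards, accumulating the discounted return G.
        # A state's entry is overwritten on each earlier occurrence, so the
        # value left at the end is the return following its first occurrence.
        G = 0.0
        first_visit_return = {}
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_visit_return[state] = G

        for state, G_first in first_visit_return.items():
            returns[state].append(G_first)
            V[state] = sum(returns[state]) / len(returns[state])

    return dict(V)


if __name__ == "__main__":
    estimates = first_visit_mc_prediction(random_policy, num_episodes=5000)
    for s in sorted(estimates):
        print(s, round(estimates[s], 3))
```

For this toy walk with gamma = 1, the true values of states 1, 2 and 3 are 0.25, 0.5 and 0.75, so the printed estimates should move toward those numbers as num_episodes grows. Storing every return in a list mirrors the Returns(s) step directly; an incremental running mean would give the same averages with less memory.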