How it works...

In step 1, we created the cliff walking environment using the makeEnvironment() function from the reinforcelearn library. This environment belongs to the gridworld class. In step 2, we wrote a custom function that queries the cliff walking environment to collect sample observation data. The step() method of the env object takes an action as its input argument and returns a list containing the state, reward, and done elements. In the last step, once the observation sequence data had been generated, we used the ReinforcementLearning() function to have the agent learn an optimal policy from this data.
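To make the flow of data concrete, the following is a minimal base R sketch of the same three stages: an environment step that returns state, reward, and done, a sampler that records observations, and a learner that derives a policy from them. It deliberately does not call reinforcelearn or ReinforcementLearning. The names cliff_step(), sample_observations(), and learn_q() are hypothetical, the 4 x 12 grid layout and the action encoding are assumptions, and plain tabular Q-learning over the recorded data is used here only as a stand-in for whatever update the ReinforcementLearning() function applies internally.

```r
# Hypothetical stand-in for the cliff walking environment (assumed layout):
# a 4 x 12 grid, states 0..47 numbered row by row, start = 36, goal = 47,
# cliff = 37..46. Assumed actions: 0 = left, 1 = right, 2 = up, 3 = down.
cliff_step <- function(state, action) {
  row <- state %/% 12
  col <- state %% 12
  if (action == 0) col <- max(col - 1, 0)
  if (action == 1) col <- min(col + 1, 11)
  if (action == 2) row <- max(row - 1, 0)
  if (action == 3) row <- min(row + 1, 3)
  new_state <- row * 12 + col
  if (new_state >= 37 && new_state <= 46) {
    # Stepping onto the cliff: large penalty and a jump back to the start.
    return(list(state = 36, reward = -100, done = FALSE))
  }
  list(state = new_state, reward = -1, done = new_state == 47)
}

# Query the environment with random actions and record each transition.
sample_observations <- function(n, seed = 1) {
  set.seed(seed)
  s <- integer(n); a <- integer(n); r <- numeric(n); s_new <- integer(n)
  state <- 36
  for (i in seq_len(n)) {
    action <- sample(0:3, 1)
    out <- cliff_step(state, action)
    s[i] <- state; a[i] <- action; r[i] <- out$reward; s_new[i] <- out$state
    # Start a new episode once the goal has been reached.
    state <- if (out$done) 36 else out$state
  }
  data.frame(State = s, Action = a, Reward = r, NextState = s_new)
}

# Tabular Q-learning replayed over the recorded transitions.
learn_q <- function(obs, alpha = 0.1, gamma = 0.9, sweeps = 20) {
  q <- matrix(0, nrow = 48, ncol = 4)
  for (k in seq_len(sweeps)) {
    for (i in seq_len(nrow(obs))) {
      si <- obs$State[i] + 1      # shift 0-based ids to 1-based indices
      ai <- obs$Action[i] + 1
      s2 <- obs$NextState[i] + 1
      target <- obs$Reward[i] + gamma * max(q[s2, ])
      q[si, ai] <- q[si, ai] + alpha * (target - q[si, ai])
    }
  }
  q
}

obs <- sample_observations(2000)
q <- learn_q(obs)
policy <- apply(q, 1, which.max) - 1  # greedy action per state, 0-based
```

Here, obs plays the role of the observation sequence data, and policy holds one greedy action per state. States that the random walk never visited keep all-zero Q-values, so their entries in policy are arbitrary; collecting more observations reduces the number of such states.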
