There's more...

In many RL problems, exploring actions in order to form an optimal policy can be costly. Experience replay is a technique that lets the agent reuse past experiences: instead of requiring new interaction with the environment, it replays previously observed state transitions, which speeds up convergence. Experience replay takes sample sequences consisting of states, actions, and rewards as input data. From these transitions, the agent learns a state-action value function and an optimal policy for all the states in the input data. The resulting policy can also be used for validation purposes or to improve the current policy iteratively. To implement experience replay in R, you pass an existing RL model as an argument to the ReinforcementLearning() function.

Let's get 100 new data samples from the cliff walking environment:

# Sample 100 new observation sequences from the cliff walking environment
new_observations <- sequences(100, env)
# Convert the state and action columns to character
cols.name <- c("State", "Action", "NextState")
new_observations[cols.name] <- sapply(new_observations[cols.name], as.character)
# Verify the column classes and inspect the first few records
sapply(new_observations, class)
head(new_observations)

The following screenshot shows a few records from the new observation data:

Now, we pass our existing RL model, which we created in the How to do it... section of this recipe, as an argument to the ReinforcementLearning() function, together with the new observations, to update the existing policy.
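The following is a minimal sketch of the update call, assuming the agent trained earlier is stored in a variable named model; the reward column name (Reward) and the control parameter values are assumptions and should match the settings used in the How to do it... section:

# Assumed to be loaded earlier in the recipe
library(ReinforcementLearning)
# Replay the new observations through the previously trained agent,
# which is passed in via the model argument (variable name assumed)
model_updated <- ReinforcementLearning(new_observations,
                                       s = "State",
                                       a = "Action",
                                       r = "Reward",
                                       s_new = "NextState",
                                       learningRule = "experienceReplay",
                                       control = list(alpha = 0.1, gamma = 0.5, epsilon = 0.1),
                                       model = model)
# Print the updated state-action values and the derived policy
print(model_updated)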

The following screenshot shows the Q-value table for each state-action pair, after implementing experience replay:

In the preceding screenshot, we can see that the updated policy yielded a higher overall reward compared to the previous policy.
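As a quick check, you can also compare the two agents from the console instead of relying on the screenshots; this assumes the original agent is stored in model and the updated one in model_updated, as in the sketch above:

# Total reward of the original policy
summary(model)
# Total reward of the policy after experience replay
summary(model_updated)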
