How to do it...

This section shows you how to set up model-based RL:

  1. Run policy iteration using the state-action value function with the discount factor γ = 0.9:
mdp_policy <- mdp_policy_iteration(P = TPMs, R = Rewards, discount = 0.9)
  2. Get the best (optimal) policy P*, as shown in the following figure; the green arrows show the path traversed from S1 to S15:
mdp_policy$policy                 # optimal action index for each state
names(TPMs)[mdp_policy$policy]   # corresponding action name for each state
Optimal policy from model-based policy iteration, with the optimal path from S1 to S15
  3. Get the optimal value function V* for each state and plot it, as shown in the following figure:
mdp_policy$V
names(mdp_policy$V) <- paste0("S", 1:16)
barplot(mdp_policy$V, col = "blue", xlab = "States", ylab = "Optimal value",
        main = "Value function of the optimal policy", width = 0.5)
Value function of the optimal policy
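The steps above can be sketched end to end on a self-contained example. The grid-world `TPMs` and `Rewards` objects are built in this recipe's setup, so the sketch below substitutes MDPtoolbox's small built-in forest-management problem (`mdp_example_forest()`) in their place; the solver and plotting calls are the same ones used in the steps, but the forest example is an assumption standing in for the 4x4 grid world.

```r
# Self-contained sketch of the recipe's flow, assuming the MDPtoolbox package.
# The grid-world TPMs/Rewards from the setup are replaced here by the
# package's built-in 3-state forest-management example.
library(MDPtoolbox)

ex <- mdp_example_forest()    # list with $P (transition probabilities) and $R (rewards)
sol <- mdp_policy_iteration(P = ex$P, R = ex$R, discount = 0.9)

sol$policy                                       # optimal action index per state
names(sol$V) <- paste0("S", seq_along(sol$V))    # label the states S1, S2, ...
barplot(sol$V, col = "blue", xlab = "States", ylab = "Optimal value",
        main = "Value function of the optimal policy")
```

Swapping `ex$P` and `ex$R` for the recipe's `TPMs` and `Rewards` reproduces the steps above exactly; `mdp_check(P, R)` can be run first to confirm the MDP is well-formed.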