This section shows you how to set up model-based RL:
- Run policy iteration using the state-action value function with the discount factor γ = 0.9:
mdp_policy <- mdp_policy_iteration(P = TPMs, R = Rewards, discount = 0.9)
- Get the optimal policy P*, as shown in the following figure. The green arrows show the direction of traversal from S1 to S15:
mdp_policy$policy
names(TPMs)[mdp_policy$policy]
Optimal policy from model-based policy iteration, with the optimal path from S1 to S15
- Get the optimal value function V* for each state and plot it as shown in the following figure:
mdp_policy$V
names(mdp_policy$V) <- paste0("S", 1:16)
barplot(mdp_policy$V, col = "blue", xlab = "States", ylab = "Optimal value",
        main = "Value function of the optimal policy", width = 0.5)
Value function of the optimal policy
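To make the preceding steps concrete, here is a minimal sketch of the policy iteration algorithm that MDPtoolbox's mdp_policy_iteration performs internally, written in Python for illustration. The tiny two-state MDP, its transition and reward arrays, and the function name policy_iteration are all illustrative assumptions, not part of the MDPtoolbox API:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Sketch of policy iteration for a small MDP (illustrative only).

    P: array of shape (A, S, S), transition matrices per action
    R: array of shape (S, A), rewards per state-action pair
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        P_pi = P[policy, np.arange(n_states), :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q(s, a)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V  # policy is stable, hence optimal
        policy = new_policy

# Hypothetical two-state MDP: action 1 moves to state 1, which pays reward 1
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1: go to state 1
R = np.array([[0.0, 0.0],   # state 0 pays nothing
              [1.0, 1.0]])  # state 1 pays 1 under either action
policy, V = policy_iteration(P, R, gamma=0.9)
```

With gamma = 0.9, the stable policy chooses action 1 in state 0, and the values satisfy V(1) = 1/(1 - 0.9) = 10 and V(0) = 0.9 * V(1) = 9, mirroring how mdp_policy$policy and mdp_policy$V are produced in the R steps above.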