The MDPtoolbox package also provides an implementation of the value iteration algorithm, which we can use to solve the same MDP:
library(MDPtoolbox)
mdp_value_iteration(P = actions, R = rewards, discount = 0.2)
The following screenshot displays the optimal policy details:
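The optimal policy returned by the value iteration method is {2, 3, 1, 1}. In other words, the agent takes action 2 in state 1, action 3 in state 2, and action 1 in states 3 and 4. The value function returned at the last iteration is close to the one we obtained with the policy iteration method.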
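To make it clearer what happens on each iteration, here is a minimal from-scratch sketch of value iteration in base R. It is not the MDPtoolbox source. It repeatedly applies the Bellman optimality backup V(s) <- max over a of [R(s, a) + discount * sum over s' of P(s' | s, a) * V(s')] until V stops changing, and then reads off the greedy policy. The function name `value_iteration_sketch` and the two-state toy MDP are hypothetical. The input layout (a list of S x S transition matrices, one per action, and an S x A reward matrix) is assumed to mirror what MDPtoolbox accepts. The stopping rule is a simplified one and may differ from the package's.

```r
# Hypothetical sketch of value iteration; not the MDPtoolbox implementation.
# Assumed inputs: P = list of S x S transition matrices (one per action,
# rows summing to 1), R = S x A matrix of immediate rewards.
value_iteration_sketch <- function(P, R, discount, epsilon = 1e-6, max_iter = 1000) {
  n_states  <- nrow(R)
  n_actions <- ncol(R)
  V <- numeric(n_states)                      # start from V = 0 in every state
  for (iter in seq_len(max_iter)) {
    # Q[s, a] = R[s, a] + discount * sum_s' P_a[s, s'] * V[s']
    Q <- matrix(0, n_states, n_actions)
    for (a in seq_len(n_actions)) {
      Q[, a] <- R[, a] + discount * as.vector(P[[a]] %*% V)
    }
    V_new <- apply(Q, 1, max)                 # Bellman optimality backup
    converged <- max(abs(V_new - V)) < epsilon  # simplified stopping rule
    V <- V_new
    if (converged) break
  }
  # Greedy policy: best action per state from the last Bellman backup
  list(V = V, policy = apply(Q, 1, which.max), iter = iter)
}

# Hypothetical 2-state, 2-action MDP:
# action 1 keeps the agent in its current state, action 2 swaps states.
P <- list(diag(2), matrix(c(0, 1, 1, 0), nrow = 2))
# Rewards: in state 1, moving pays 1; in state 2, staying pays 1.
R <- matrix(c(0, 1, 1, 0), nrow = 2)

res <- value_iteration_sketch(P, R, discount = 0.2)
res$policy  # move out of state 1 (action 2), stay in state 2 (action 1)
res$V       # approximately 1.25 for both states
```

With a discount of 0.2, future rewards are weighted lightly, so the values converge within a few dozen sweeps. Here both states end up worth 1 / (1 - 0.2) = 1.25, because the optimal policy collects a reward of 1 at every step.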