How to do it...

Let's programmatically create an environment for our problem:

We start by defining the set of states and the set of actions:

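# Load the ReinforcementLearning package (install it first with install.packages("ReinforcementLearning") if needed)
library(ReinforcementLearning)

# Define the state and action sets
states <- c("1", "2", "3", "4")
actions <- c("up", "down", "left", "right")
cat("The states are:", states, "\n")
cat("The actions are:", actions, "\n")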

Next, we define a function that implements a custom environment for our example. Given the current state and an action, it returns the next state and the reward:

gridExampleEnvironment <- function(state, action) {
  # By default, the agent stays in its current state
  next_state <- state
  # Allowed moves; every other state-action pair leaves the agent where it is
  if (state == "1" && action == "down") next_state <- "4"
  if (state == "1" && action == "right") next_state <- "2"
  if (state == "2" && action == "left") next_state <- "1"
  if (state == "2" && action == "right") next_state <- "3"
  if (state == "3" && action == "left") next_state <- "2"
  if (state == "3" && action == "up") next_state <- "4"
  # +100 for reaching state "4" from another state, -1 for every other step
  if (next_state == "4" && state != "4") {
    reward <- 100
  } else {
    reward <- -1
  }
  out <- list("NextState" = next_state, "Reward" = reward)
  return(out)
}

print(gridExampleEnvironment)
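The if() chain above hard-codes each transition. As an illustration only, the same environment can be expressed as a lookup table, which makes the full transition and reward structure easy to print and check. The names transitions, gridTableEnvironment, and grid are hypothetical and not part of the ReinforcementLearning package; the sketch uses base R only:

```r
# Hypothetical alternative to gridExampleEnvironment(): the same transitions
# stored in a lookup table instead of a chain of if() statements.
transitions <- data.frame(
  State     = c("1",    "1",     "2",    "2",     "3",    "3"),
  Action    = c("down", "right", "left", "right", "left", "up"),
  NextState = c("4",    "2",     "1",    "3",     "2",    "4"),
  stringsAsFactors = FALSE
)

gridTableEnvironment <- function(state, action) {
  # Look up the row for this state-action pair; unlisted pairs leave the agent in place
  hit <- transitions$State == state & transitions$Action == action
  next_state <- if (any(hit)) transitions$NextState[hit][1] else state
  # +100 for entering state "4" from another state, -1 otherwise
  reward <- if (next_state == "4" && state != "4") 100 else -1
  list("NextState" = next_state, "Reward" = reward)
}

# Enumerate every state-action pair and record its outcome
grid <- expand.grid(State = c("1", "2", "3", "4"),
                    Action = c("up", "down", "left", "right"),
                    stringsAsFactors = FALSE)
grid$NextState <- mapply(function(s, a) gridTableEnvironment(s, a)$NextState,
                         grid$State, grid$Action, USE.NAMES = FALSE)
grid$Reward <- mapply(function(s, a) gridTableEnvironment(s, a)$Reward,
                      grid$State, grid$Action, USE.NAMES = FALSE)
print(grid)
```

Printing the 16-row table shows at a glance that only two state-action pairs, (1, down) and (3, up), earn the +100 reward.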

The following screenshot provides a description of the environment:

Now, we generate some sample experience data in the form of state transition tuples (State, Action, Reward, NextState) using the sampleExperience() function. This function takes the number of samples to draw (N), the environment, and the sets of states and actions as input arguments:

# Draw 1,000 random state transitions from the environment
sequences <- sampleExperience(N = 1000, env = gridExampleEnvironment, states = states, actions = actions)
head(sequences,6)
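To make the sampling step concrete, here is a simplified, hypothetical stand-in for sampleExperience(). It assumes that, with random action selection, the function draws states and actions uniformly at random with replacement and queries the environment once per pair; it is not the package's actual implementation, and sample_experience_sketch() and toy_env() are illustrative names:

```r
# Hypothetical, simplified stand-in for sampleExperience(); not the package's code.
sample_experience_sketch <- function(N, env, states, actions) {
  # Draw N states and N actions uniformly at random, with replacement
  s <- sample(states, N, replace = TRUE)
  a <- sample(actions, N, replace = TRUE)
  # Query the environment once per (state, action) pair
  outcomes <- lapply(seq_len(N), function(i) env(s[i], a[i]))
  data.frame(
    State     = s,
    Action    = a,
    Reward    = vapply(outcomes, function(o) o$Reward, numeric(1)),
    NextState = vapply(outcomes, function(o) o$NextState, character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical two-state toy environment: "right" from "a" reaches "b"
toy_env <- function(state, action) {
  next_state <- if (state == "a" && action == "right") "b" else state
  reward <- if (next_state == "b" && state != "b") 10 else -1
  list(NextState = next_state, Reward = reward)
}

set.seed(42)
toy_data <- sample_experience_sketch(5, toy_env, c("a", "b"), c("left", "right"))
print(toy_data)
```

The result has the same four columns that the ReinforcementLearning() call in the next step refers to by name.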

The following screenshot shows the first few records of the sample data:

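We can now pass the sample experience from the previous step to the ReinforcementLearning() function, which learns state-action values and a policy from it. The arguments s, a, r, and s_new name the columns holding the current state, the action, the reward, and the next state: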

solver_rl <- ReinforcementLearning(sequences, s = "State", a = "Action", r = "Reward", s_new = "NextState")

print(solver_rl)
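ReinforcementLearning() is based on Q-learning over the sampled tuples. The following base-R sketch shows the kind of update involved, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', .) - Q(s, a)), applied to each tuple in turn; it then reads off the greedy policy and sums the Reward column. It is a simplified approximation, not the package's code: q_learning_sketch() and the small demo data set are hypothetical, and alpha = 0.1 and gamma = 0.1 are assumed to match the package defaults (check ?ReinforcementLearning for your version):

```r
# Hypothetical batch Q-learning sketch over (State, Action, Reward, NextState) tuples
q_learning_sketch <- function(data, states, actions, alpha = 0.1, gamma = 0.1, iter = 1) {
  # Q-table initialised to zero: one row per state, one column per action
  Q <- matrix(0, nrow = length(states), ncol = length(actions),
              dimnames = list(states, actions))
  for (k in seq_len(iter)) {
    for (i in seq_len(nrow(data))) {
      s  <- data$State[i]
      a  <- data$Action[i]
      r  <- data$Reward[i]
      s2 <- data$NextState[i]
      # Move Q(s, a) a fraction alpha toward the one-step target
      target <- r + gamma * max(Q[s2, ])
      Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
    }
  }
  # Greedy policy: highest-valued action per state (first column wins ties)
  policy <- colnames(Q)[apply(Q, 1, which.max)]
  names(policy) <- rownames(Q)
  list(Q = Q, Policy = policy, Reward = sum(data$Reward))
}

# Tiny hand-written experience set for the chapter's grid (illustrative data only)
demo <- data.frame(
  State     = c("1",    "3",  "2",     "1",     "2"),
  Action    = c("down", "up", "right", "right", "left"),
  Reward    = c(100,    100,  -1,      -1,      -1),
  NextState = c("4",    "4",  "3",     "2",     "1"),
  stringsAsFactors = FALSE
)
fit <- q_learning_sketch(demo, c("1", "2", "3", "4"),
                         c("up", "down", "left", "right"), iter = 10)
print(round(fit$Q, 2))
print(fit$Policy)
```

Making several passes (iter) over the same data lets the +100 reward propagate back to earlier states; the package's ReinforcementLearning() also accepts an iter argument for repeated passes.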

The following screenshot shows the state-action table, along with the policy and the overall reward for our problem. The labels X1, X2, X3, and X4 represent states 1, 2, 3, and 4, respectively:

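Here, we can see that the overall reward at the last iteration is 11423. Because sampleExperience() draws its samples at random, this figure will vary from run to run; call set.seed() before sampling if you need reproducible results.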
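A quick sanity check on that number, assuming the overall reward is the sum of the Reward column and that states and actions are sampled uniformly: of the 16 state-action pairs, only (1, down) and (3, up) earn +100, and every other sampled step earns -1. With N samples, k of which enter state 4 from another state:

```latex
\begin{aligned}
R &= 100k - (N - k) = 101k - N \\
11423 &= 101k - 1000 \;\Rightarrow\; k = \tfrac{12423}{101} = 123 \\
\mathbb{E}[k] &= N \cdot \tfrac{2}{16} = 125, \qquad \mathbb{E}[R] = 101 \cdot 125 - 1000 = 11625
\end{aligned}
```

So the reported total corresponds to 123 rewarded transitions, close to the 125 expected under uniform sampling.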
