Strategies for solving MABP

Based on how the exploration is done, the strategies to solve the MABP can be classified into the following types:

  • No exploration
  • Exploration at random
  • Smart exploration with a preference for uncertainty

Let's delve into the details of some of the algorithms that fall under each of the strategy types.

Let's consider one very naive approach that involves playing just one slot machine for a long time. Here, we do no exploration at all: we pick one arm at random and pull it repeatedly, hoping to maximize the long-term reward. You must be wondering how this works! Let's explore.

In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times. According to this law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

By the law of large numbers, if we play just one machine for a large number of rounds, the average reward we observe will eventually approach that machine's true reward probability.
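To make this concrete, here is a minimal sketch in Python. It assumes the slot machine pays a reward of 1 with a fixed probability and 0 otherwise (a Bernoulli arm). The function name `estimate_reward_probability`, the value 0.3, and the round counts are illustrative assumptions, not values from the text.

```python
import random


def estimate_reward_probability(true_prob, n_rounds, seed=0):
    """Pull one Bernoulli arm n_rounds times and return the average reward.

    true_prob is the (normally unknown) chance that a pull pays out 1.
    It is passed in here only so that we can simulate the machine.
    """
    rng = random.Random(seed)
    total_reward = 0
    for _ in range(n_rounds):
        # A pull pays 1 with probability true_prob, otherwise 0.
        if rng.random() < true_prob:
            total_reward += 1
    return total_reward / n_rounds


# Hypothetical machine with a 30% payout chance.
for n in (10, 100, 10_000, 100_000):
    print(n, estimate_reward_probability(0.3, n))
```

With only a handful of rounds the estimate can be far from 0.3, and it should settle closer to 0.3 as the number of rounds grows.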

However, there are some problems with this strategy. First and foremost, we do not know how many rounds count as a large number. Second, it is very resource intensive to play the same slot machine a large number of times. And, most importantly, there is no guarantee that we will obtain the best long-term reward with this strategy, because the arm we picked at random may not be the one with the highest reward probability.
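The last problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: three Bernoulli arms with made-up payout probabilities (0.2, 0.5, and 0.8), and a hypothetical helper `average_reward_no_exploration` that repeats the "pick one arm at random and keep pulling it" strategy many times.

```python
import random


def average_reward_no_exploration(probs, n_rounds, n_runs, seed=0):
    """Average per-round reward of the no-exploration strategy.

    In each run, one arm is chosen uniformly at random and then pulled
    n_rounds times. probs holds the assumed payout probability of each arm.
    """
    rng = random.Random(seed)
    total_reward = 0
    for _ in range(n_runs):
        arm = rng.randrange(len(probs))  # chosen once, never revisited
        for _ in range(n_rounds):
            if rng.random() < probs[arm]:
                total_reward += 1
    return total_reward / (n_runs * n_rounds)


# Hypothetical payout probabilities for three slot machines.
probs = [0.2, 0.5, 0.8]
avg = average_reward_no_exploration(probs, n_rounds=200, n_runs=2000)
print("no exploration, average reward per round:", avg)
print("best arm, expected reward per round:", max(probs))
```

Averaged over many runs, the strategy earns roughly the mean payout probability of all the arms rather than that of the best arm, and any single run may be stuck with the worst arm for its whole duration.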
