Q-network

Actually, DeepMind started making a name for themselves before Go. They were using what is called a Q-network to solve Atari games: a collection of simple games in which the player chooses from a small, discrete set of moves (at most 18 on the Atari 2600) at each step.

With these networks, the goal is to estimate a long-term reward function (such as the number of points scored) and the move that will maximize it. By playing enough games, the network progressively learns how to play better and better. The reward function obeys the following recurrence:

Q(s,a) = r + γ · max_a'( Q(s',a') )

Here r is the immediate reward, γ is a discount factor (future gains are not as important as the immediate reward), s is the current state of the game, a is the action we could take, and s' and a' are the next state and the actions available from it.
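The recurrence above can be sketched with a tiny tabular version of Q-learning. The grid-like toy environment, learning rate, and training loop below are illustrative assumptions, not DeepMind's Atari setup; a deep Q-network simply replaces the lookup table with a neural network when the state space is too large.

```python
import random
from collections import defaultdict

GAMMA = 0.9       # discount factor γ: future gains matter less
ALPHA = 0.5       # learning rate for the incremental update
ACTIONS = [0, 1]  # e.g. move left / move right

# Q(s, a) stored as a plain table, defaulting to 0.
Q = defaultdict(float)

def update(s, a, r, s_next):
    """Move Q(s, a) toward the target r + γ · max_a' Q(s', a')."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Toy chain: states 0..4, reward 1 only for reaching state 4.
for _ in range(500):
    s = 0
    while s != 4:
        a = random.choice(ACTIONS)
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 4 else 0.0
        update(s, a, r, s_next)
        s = s_next

# After training, "right" (action 1) should dominate in every state.
print(all(Q[(s, 1)] > Q[(s, 0)] for s in range(4)))
```

After enough episodes the table settles to the fixed point of the recurrence, and reading off the argmax per state recovers the optimal policy.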

Of course, as the network continuously learns, it also continuously forgets, so it must be fed past experience as well. To use a metaphor, it would otherwise end up running without being able to walk, which is quite useless.
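Feeding past experience back in is usually done with an experience replay buffer: transitions are stored as they occur and training draws random mini-batches from the whole history. The buffer capacity and batch size below are illustrative choices.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        # Oldest transitions are evicted once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive
        # frames and keeps old experience in the training mix.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.push(t, 0, 0.0, t + 1)  # dummy transitions for illustration
batch = buf.sample(32)
print(len(batch))  # 32
```

Each training step then applies the Q-update to a sampled batch instead of only the latest transition, so the network keeps rehearsing what it learned earlier.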
