© Abhishek Nandy and Manisha Biswas  2018
Abhishek Nandy and Manisha Biswas, Reinforcement Learning, https://doi.org/10.1007/978-1-4842-3285-9_6

6. Google’s DeepMind and the Future of Reinforcement Learning

Abhishek Nandy (1) and Manisha Biswas (2)
(1) Rm HIG L-2/4, Bldg Swaranika Co-Opt HSG, Kolkata, West Bengal, India
(2) North 24 Parganas, West Bengal, India
 
This chapter discusses Google DeepMind and Google AlphaGo and then moves on to the future of Reinforcement Learning and compares what’s happening with man versus machine.

Google DeepMind

Google DeepMind (see Figure 6-1) was formed to take AI to the next level. Its aim is to research and develop programs that can solve complex problems without being taught the steps for doing so.
Figure 6-1. Google DeepMind logo
The link to visit the DeepMind web site is https://deepmind.com/ .
This web site (see Figure 6-2) describes the team's current and future work. Publications and research options are also available on the site.
Figure 6-2. The DeepMind web site
You will see that the web site has lots of topics to search and discover.

Google AlphaGo

This section takes a look at AlphaGo (see Figure 6-3), which is one of the best solutions from the Google DeepMind team.
Figure 6-3. The Google AlphaGo logo

What Is AlphaGo?

AlphaGo is the Google program that plays the game Go, which is a traditional abstract strategy board game for two players. The object of the game is to occupy more territory than your opponent. Figure 6-4 shows the Go game board.
Despite its simple rules, Go has more possible board positions than there are atoms in the observable universe!
Figure 6-4. The Go board (Image courtesy of Jaro Larnos, https://www.flickr.com/photos/jlarnos/, used under a CC-BY 2.0 license)
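To get a feel for the scale of Go's state space, the following Python sketch computes a simple upper bound. It assumes the standard 19 x 19 board (361 points, each of which is empty, black, or white) and the commonly cited rough estimate of about 10^80 atoms in the observable universe. Neither figure comes from this chapter, and the bound also counts positions that are illegal under the rules, so treat it as an order-of-magnitude illustration only.

```python
# Rough upper bound on the number of Go board configurations.
# Assumption: a standard 19 x 19 board, which has 361 points.
BOARD_POINTS = 19 * 19

# Each point is empty, black, or white, so 3**361 is an upper bound.
# It over-counts, because some of these configurations are illegal.
upper_bound = 3 ** BOARD_POINTS

# The number of decimal digits minus one gives the order of magnitude.
go_exponent = len(str(upper_bound)) - 1

# Assumption: a commonly cited rough estimate of about 10**80 atoms.
ATOMS_EXPONENT = 80

print(f"3**361 is on the order of 10**{go_exponent}")
print(f"That is about 10**{go_exponent - ATOMS_EXPONENT} times the atom estimate")
```

Python integers have arbitrary precision, so the bound is computed exactly rather than approximated with floating-point numbers.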
The concept of the Go game and its underlying mathematical terms are illustrated in Figure 6-5.
Figure 6-5. Concept of the Go game
AlphaGo is the first computer program to defeat a professional human Go player, the first program to defeat a Go world champion, and arguably the best Go player in history.
Figure 6-6 illustrates the AlphaGo approach.
Figure 6-6. Deep Q approach

Monte Carlo Search

Monte Carlo Search (MCS) is based on the AI tree traversal approach. It uses a distinct set of behaviors for moving through the tree.
MCS first selects the states it moves through by following the declared policy. Beyond a certain depth, the policy no longer determines which state to visit, so MCS expands from that state by choosing randomly among the possible actions. In this way, MCS-based simulation is used to reach the possible states and collect their rewards. When you simulate a random path, you also obtain Q values for the state transitions along that path. These Q values are then backed up toward the top of the tree. The entire process is shown in Figure 6-7.
Figure 6-7. The Monte Carlo Search tree process
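The four phases described above (selection, expansion, random simulation, and backup) can be sketched in Python on a toy problem. Everything in this example is an assumption made for illustration: the single-player counting game (add 1, 2, or 3 until the total reaches or passes 10, with a reward of 1 only for landing exactly on 10), the names `Node`, `search`, and `best_action`, and the UCB1 formula used as the tree policy, which this chapter does not specify.

```python
import math
import random

TARGET = 10          # hypothetical toy goal: land exactly on 10
ACTIONS = (1, 2, 3)  # amounts that can be added at each step


def is_terminal(state):
    return state >= TARGET


def reward(state):
    return 1.0 if state == TARGET else 0.0


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # action -> child Node
        self.visits = 0
        self.total = 0.0    # sum of rewards backed up through this node

    def q(self):
        # Mean backed-up reward, used as the Q estimate for this node.
        return self.total / self.visits if self.visits else 0.0


def select_child(node, c=1.4):
    # Tree policy (assumed here to be UCB1): Q estimate plus exploration bonus.
    return max(
        node.children.values(),
        key=lambda ch: ch.q() + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(state, rng):
    # Simulation: take random actions until the episode ends.
    while not is_terminal(state):
        state += rng.choice(ACTIONS)
    return reward(state)


def search(root_state, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the tree policy through fully expanded nodes.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = select_child(node)
        # 2. Expansion: add one untried action, chosen at random.
        if not is_terminal(node.state):
            untried = [a for a in ACTIONS if a not in node.children]
            action = rng.choice(untried)
            node.children[action] = Node(node.state + action, parent=node)
            node = node.children[action]
        # 3. Simulation: estimate the new node's value with a random rollout.
        value = rollout(node.state, rng)
        # 4. Backup: propagate the result to every ancestor up to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return root


def best_action(root_state, iterations=500, seed=0):
    root = search(root_state, iterations, seed)
    # Recommend the most visited action at the root.
    return max(root.children, key=lambda a: root.children[a].visits)


print(best_action(9))
```

Starting from a total of 9, only adding 1 earns a reward, so the search concentrates its visits on that action. Choosing the most visited action at the root, rather than the one with the highest Q estimate, is a common design choice because visit counts are less sensitive to a few lucky rollouts.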
AlphaGo relies on two components: a tree search procedure and convolutional networks that guide the tree search procedure.
In total, three convolutional networks of two different kinds are trained: two policy networks and one value network.
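One way to picture how networks can guide a tree search is a scoring rule that combines a value estimate with a policy prior. The Python sketch below is written in the style of the PUCT rule associated with the published AlphaGo work, but it is only an illustration under stated assumptions: the function names `guided_score` and `pick_move`, the constant `c_puct`, and all of the numbers are hypothetical stand-ins for real network outputs.

```python
import math


def guided_score(q_value, prior, parent_visits, child_visits, c_puct=1.0):
    # q_value: current value estimate for the move (stand-in for value information).
    # prior: probability the policy assigns to the move (stand-in for a policy network).
    # The bonus shrinks as the move is visited more often.
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + bonus


def pick_move(stats, c_puct=1.0):
    # stats maps each move to a dict with keys "q", "prior", and "visits".
    parent_visits = sum(s["visits"] for s in stats.values())
    return max(
        stats,
        key=lambda m: guided_score(
            stats[m]["q"], stats[m]["prior"], parent_visits, stats[m]["visits"], c_puct
        ),
    )


# Made-up statistics for three candidate moves.
example = {
    "A": {"q": 0.5, "prior": 0.6, "visits": 10},
    "B": {"q": 0.5, "prior": 0.1, "visits": 10},
    "C": {"q": 0.2, "prior": 0.3, "visits": 0},
}
print(pick_move(example))
```

In this made-up example, moves A and B have the same value estimate, so the higher prior ranks A above B, while the unvisited move C receives the largest bonus and is selected for exploration next.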

Man vs. Machines

With the advent of Reinforcement Learning, more jobs are being automated, and many low-level jobs are now done by machines.
Now the focus is on how Reinforcement Learning can solve different problems and improve the well-being of the earth.
For example, Reinforcement Learning can be used in the healthcare field. Instead of using the same age-old tools for body scans, we can train robots and medical equipment to scan body parts for different diagnostic purposes much more quickly and with greater accuracy. With repeated training, decisions to perform more complex measurements and scans can be left to the machines too.

Positive Aspects of AI

Cognitive modeling is applied when we gather the information and resources through which the system learns. This is called the cognitive way. Technological singularity is approached by enhancing cognitive modeling devices so that they interact and pursue more unified goals.
A good strong AI solution is selfless and places the interest of others above all else. A good AI solution always works for the team. By adding human empathy, as seen with brainwaves, we can create good AI solutions that appear to be compassionate.
Applying a topological view to the world of AI helps streamline activities and allows each topology to master a specific, unique task.

Negative Aspects of AI

There can be negative aspects too. For example, what if a machine learns so fast that it starts talking to other machines and creates an AI of its own? In that case, it would be difficult for humans to predict the end game. We need to take these scenarios into consideration. Perhaps every AI solution needs a secret killswitch, as illustrated in Figure 6-8.
Figure 6-8. Insert a killswitch just in case
Here are the steps to this basic process:
  1. We start a program.
  2. We apply Machine Learning to it.
  3. The program learns very quickly.
  4. We have to incorporate a killswitch into the process so that we can allow the program to be rolled back if necessary.
  5. When we see an anomaly or any abrupt behavior, we call the killswitch to roll the program back to the start.
There is a good chance that machines may learn this way, especially if they work in tandem. At some transition point, they might start interacting in a way that creates an AI of their own. We have to be able to avoid collisions of two or more Reinforcement Learning programs during the transition phase.
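The killswitch steps listed in this section can be sketched in Python as a snapshot taken at the start plus a rollback that fires on abrupt behavior. This is a minimal illustration, not a real safety mechanism: the `KillSwitch` class, the `train` function, the dictionary used as program state, and the `max_jump` threshold that defines an anomaly are all hypothetical.

```python
import copy


class KillSwitch:
    def __init__(self, initial_state, max_jump=10.0):
        # Keep a private copy of the starting state to roll back to.
        self._snapshot = copy.deepcopy(initial_state)
        self.max_jump = max_jump  # assumed definition of "abrupt behavior"
        self.triggered = False

    def is_anomalous(self, previous_score, new_score):
        # Flag any single update that changes the score by more than max_jump.
        return abs(new_score - previous_score) > self.max_jump

    def roll_back(self):
        # Return a fresh copy of the starting state and record the event.
        self.triggered = True
        return copy.deepcopy(self._snapshot)


def train(updates, initial_state, max_jump=10.0):
    switch = KillSwitch(initial_state, max_jump)
    state = copy.deepcopy(initial_state)
    for delta in updates:
        previous = state["score"]
        state["score"] += delta  # stand-in for one learning update
        state["steps"] += 1
        if switch.is_anomalous(previous, state["score"]):
            # Abrupt change detected: roll back to the start and stop.
            state = switch.roll_back()
            break
    return state, switch.triggered


start = {"score": 0.0, "steps": 0}
print(train([1.0, 2.0, 3.0], start))
print(train([1.0, 2.0, 50.0, 1.0], start))
```

The first run finishes normally, while the jump of 50 in the second run trips the switch and returns the program to its starting state. Copying the snapshot with `copy.deepcopy` keeps later updates from silently modifying the saved state.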

Conclusion

We touched on a lot of concepts in this book, especially related to Reinforcement Learning. The book is an overview of how Reinforcement Learning works and the ideas you need to understand to get started.
  • We simplified the RL concepts with the help of the Python programming language.
  • We introduced OpenAI Gym and OpenAI Universe.
  • We introduced a lot of algorithms and touched on Keras and TensorFlow.
We hope you have liked the book. Thanks again!