List of Listings

Chapter 3. Implementing your first Go bot

Listing 3.1. Using an enum to represent players

Listing 3.2. Using tuples to represent points of a Go board

Listing 3.3. Setting moves: plays, passes, or resigns

Listing 3.4. Encoding strings of stones with set

Listing 3.5. Creating a Go Board instance

Listing 3.6. Checking neighboring points for liberties

Listing 3.7. Utility methods for placing and removing stones

Listing 3.8. Continuing our definition of place_stone

Listing 3.9. Continuing our definition of place_stone

Listing 3.10. Encoding game state for a game of Go

Listing 3.11. Deciding when a game of Go is over

Listing 3.12. Continuing our definition of GameState to enforce the self-capture rule

Listing 3.13. Does the current game state violate the ko rule?

Listing 3.14. Is this move valid for the given game state?

Listing 3.15. Is the given point on the board an eye?

Listing 3.16. Your central interface for Go agents

Listing 3.17. A random Go bot, playing at about 30 kyu strength

Listing 3.18. Utility functions for bot vs. bot games

Listing 3.19. A script to let a bot play against itself

Listing 3.20. Generating Zobrist hashes

Listing 3.21. GoString instances with immutable sets of stones and liberties

Listing 3.22. Instantiating the Go board with a _hash value for the empty board

Listing 3.23. Placing a stone means applying the hash of that stone

Listing 3.24. Removing a stone means unapplying the hash value of the stone

Listing 3.25. Returning the current Zobrist hash of the board

Listing 3.26. Initializing game state with Zobrist hashes

Listing 3.27. Fast checking of game states for ko with Zobrist hashes

Listing 3.28. Transforming human input into coordinates for your Go board

Listing 3.29. Setting up a script so you can play your own bot

Chapter 4. Playing games with tree search

Listing 4.1. A function that finds a move that immediately wins the game

Listing 4.2. A function that avoids giving the opponent a winning move

Listing 4.3. A function that finds a two-move sequence that guarantees a win

Listing 4.4. An enum to represent the outcome of a game

Listing 4.5. A game-playing agent that implements minimax search

Listing 4.6. First step of the minimax search algorithm

Listing 4.7. Implementing minimax search

Listing 4.8. A highly simplified board evaluation heuristic for Go

Listing 4.9. Depth-pruned minimax search

Listing 4.10. Checking whether you can stop evaluating a branch

Listing 4.11. Full implementation of alpha-beta pruning

Listing 4.12. A data structure to represent an MCTS tree

Listing 4.13. Methods to update a node in an MCTS tree

Listing 4.14. Helper methods to access useful MCTS tree properties

Listing 4.15. The MCTS algorithm

Listing 4.16. Selecting a move after completing your MCTS rollouts

Listing 4.17. Selecting a branch to explore with the UCT formula

Chapter 5. Getting started with neural networks

Listing 5.1. One-hot encoding of MNIST labels

Listing 5.2. Reshaping MNIST data and loading training and test data

Listing 5.3. Computing the average value for images representing the same digit

Listing 5.4. Computing and displaying the average 8 in your training set

Listing 5.5. Computing how close a digit is to your weights by using the dot product

Listing 5.6. Simple implementation of sigmoid function for double values and vectors

Listing 5.7. Computing predictions from weights and bias with dot product and sigmoid

Listing 5.8. Evaluating predictions of your model with a decision threshold

Listing 5.9. Calculating prediction accuracy for three data sets

Listing 5.10. Mean squared error loss function and its derivative

Listing 5.11. Base layer implementation

Listing 5.12. Connecting layers through successors and predecessors

Listing 5.13. Forward and backward passes in a layer of a sequential neural network

Listing 5.14. Implementation of the derivative of the sigmoid function

Listing 5.15. Sigmoid activation layer

Listing 5.16. Dense layer weight initialization

Listing 5.17. Dense layer forward pass

Listing 5.18. Dense layer backward pass

Listing 5.19. Dense layer weight update mechanism

Listing 5.20. Sequential neural network initialization

Listing 5.21. Adding layers sequentially

Listing 5.22. Train method on a sequential network

Listing 5.23. Training a sequential neural network on a batch of data

Listing 5.24. Updating rule and feed-forward and backward passes for your network

Listing 5.25. Evaluation

Listing 5.26. Instantiating a neural network

Listing 5.27. Running a neural network instance on training data

Chapter 6. Designing a neural network for Go data

Listing 6.1. Abstract Encoder class to encode Go game state

Listing 6.2. Referencing Go board encoders by name

Listing 6.3. Encoding game state with a simple one-plane Go board encoder

Listing 6.4. Encoding and decoding points with your one-plane Go board encoder

Listing 6.5. Imports for generating encoded Monte Carlo tree-search game data

Listing 6.6. Generating MCTS games for this chapter

Listing 6.7. Main application for generating MCTS games for this chapter

Listing 6.8. Importing models, layers, and data sets from Keras

Listing 6.9. Loading and preprocessing MNIST data with Keras

Listing 6.10. Building a simple sequential model with Keras

Listing 6.11. Compiling a Keras deep-learning model

Listing 6.12. Training and evaluating a Keras model

Listing 6.13. Loading and preprocessing previously stored Go game data

Listing 6.14. Running a Keras multilayer perceptron on generated Go data

Listing 6.15. Evaluating the model on a known board position

Listing 6.16. Loading and preprocessing Go data for convolutional neural networks

Listing 6.17. Building a simple convolutional neural network for Go data with Keras

Listing 6.18. Adding a max pooling layer of pool size (2, 2) to a Keras model

Listing 6.19. Defining the softmax activation function in Python

Listing 6.20. Adding a max pooling layer of pool size (2, 2) to a Keras model

Listing 6.21. Compiling a Keras model with categorical cross-entropy

Listing 6.22. Importing and adding a Dropout layer to a Keras model

Listing 6.23. Adding a rectified linear activation to a Dense layer

Listing 6.24. Loading and preprocessing Go data for convolutional neural networks

Listing 6.25. Building a convolutional network for Go data with dropout and ReLUs

Listing 6.26. Evaluating your enhanced convolutional network

Chapter 7. Learning from data: a deep-learning bot

Listing 7.1. Creating an index of zip files containing Go data from KGS

Listing 7.2. Replaying moves from an SGF file with your Go framework

Listing 7.3. Python libraries needed for data and file processing

Listing 7.4. Imports for data processing from the dlgo module

Listing 7.5. Initializing a Go data processor with an encoder and a local data directory

Listing 7.6. load_go_data loads, processes, and stores data

Listing 7.7. Processing Go records stored in zip files into encoded features and labels

Listing 7.8. Calculating the total number of moves available in the current zip file

Listing 7.9. Retrieving handicap stones and applying them to an empty Go board

Listing 7.10. Persisting features and labels locally in small chunks

Listing 7.11. Consolidating individual NumPy arrays of features and labels into one set

Listing 7.12. Loading training data from 100 game records

Listing 7.13. The signature of a Go data generator

Listing 7.14. Private method to generate and yield the next batch of Go data

Listing 7.15. Calling the generate method to obtain a generator for model training

Listing 7.16. The parallel version of load_go_data can optionally return a generator

Listing 7.17. Loading training data from 100 game records

Listing 7.18. Specifying layers for a small convolutional network for Go move prediction

Listing 7.19. Core imports for building a neural network for Go data

Listing 7.20. Creating training and test generators

Listing 7.21. Defining a Keras model from your small layer architecture

Listing 7.22. Fitting and evaluating Keras models with generators

Listing 7.23. Initializing a simple seven-plane encoder

Listing 7.24. Encoding game state with a SevenPlaneEncoder

Listing 7.25. Implementing all other Encoder methods for your seven-plane encoder

Listing 7.26. Initializing SGD in Keras with momentum and learning rate decay

Listing 7.27. Using the Adagrad optimizer for Keras models

Listing 7.28. Using the Adadelta optimizer for Keras models

Chapter 8. Deploying bots in the wild

Listing 8.1. Initializing an agent with a Keras model and a Go board encoder

Listing 8.2. Encoding board state and predicting move probabilities with a model

Listing 8.3. Scaling, clipping, and renormalizing your move probability distribution

Listing 8.4. Trying to apply moves from a ranked candidate list

Listing 8.5. Serializing a deep-learning agent

Listing 8.6. Deserializing a DeepLearningAgent from an HDF5 file

Listing 8.7. Registering a random agent and starting a web application with it

Listing 8.8. Loading features and labels from Go data with a processor

Listing 8.9. Building and running a large Go move-predicting model with Adadelta

Listing 8.10. Creating and persisting a DeepLearningAgent

Listing 8.11. Loading a bot back into memory and serving it in a web application

Listing 8.12. Python implementation of a GTP command

Listing 8.13. Parsing a GTP command from plain text

Listing 8.14. Converting between GTP coordinates and your internal Point type

Listing 8.15. A termination strategy tells your bot when to end a game

Listing 8.16. Passing whenever an opponent passes

Listing 8.17. Wrapping an agent with a termination strategy

Listing 8.18. Imports for your local bot runner

Listing 8.19. Initializing a runner to clash two bot opponents

Listing 8.20. Sending a GTP command and receiving a response

Listing 8.21. Setting up the board, letting the opponents play the game, and persisting it

Listing 8.22. A game ends when an opponent signals to stop it

Listing 8.23. Asking your bot to generate and play a move that’s translated into GTP

Listing 8.24. Your opponent plays moves by responding to genmove

Listing 8.25. Letting one of your bots loose on Pachi

Listing 8.26. Encoding and serializing a GTP response

Listing 8.27. Python imports for your GTP frontend

Listing 8.28. Initializing a GTPFrontend, which defines GTP event handlers

Listing 8.29. The frontend parses from the input stream until the game ends

Listing 8.30. A few of the most important event responses for your GTP frontend

Listing 8.31. Starting your GTP interface from the command line

Chapter 9. Learning by practice: reinforcement learning

Listing 9.1. Calculating return on an action

Listing 9.2. Calculating discounted returns

Listing 9.3. An example of sampling from a probability distribution

Listing 9.4. Sampling from a probability distribution with NumPy

Listing 9.5. Repeatedly sampling from a probability distribution with NumPy

Listing 9.6. Clipping a probability distribution

Listing 9.7. The constructor for the PolicyAgent class

Listing 9.8. Constructing a new learning agent

Listing 9.9. Serializing a PolicyAgent to disk

Listing 9.10. An example of using the serialize function

Listing 9.11. Loading a policy agent from a file

Listing 9.12. Selecting a move with a neural network

Listing 9.13. Constructor for an experience buffer

Listing 9.14. Saving an experience buffer to disk

Listing 9.15. Restoring an ExperienceBuffer from an HDF5 file

Listing 9.16. An object to track decisions within a single episode

Listing 9.17. Integrating an ExperienceCollector with a PolicyAgent

Listing 9.18. Simulating a game between two agents

Listing 9.19. Initialization for generating a batch of experience

Listing 9.20. Playing a batch of games

Listing 9.21. Saving a batch of experience data

Chapter 10. Reinforcement learning with policy gradients

Listing 10.1. Randomly drawing a number from 1 to 5

Listing 10.2. Simulating a game of Add It Up

Listing 10.3. Sample outputs of listing 10.2

Listing 10.4. A policy learning implementation for the simple game Add It Up

Listing 10.5. Encoding experience data as a target vector

Listing 10.6. Training an agent from experience data with policy gradient learning

Listing 10.7. Training on previously saved experience data

Listing 10.8. Script for comparing the strength of two agents

Chapter 11. Reinforcement learning with value methods

Listing 11.1. Pseudocode for an ϵ-greedy policy

Listing 11.2. Defining a model with the Keras sequential API

Listing 11.3. Defining an identical model with the Keras functional API

Listing 11.4. A two-input action-value network

Listing 11.5. Constructor and utility methods for a Q-learning agent

Listing 11.6. Selecting moves for a Q-learning agent

Listing 11.7. Selecting moves for a Q-learning agent

Listing 11.8. Training the Q-learning agent from its experience

Chapter 12. Reinforcement learning with actor-critic methods

Listing 12.1. Updating ExperienceCollector to track advantages

Listing 12.2. Updating ExperienceCollector to store estimated values

Listing 12.3. Calculating advantage at the end of an episode

Listing 12.4. Adding advantage to the ExperienceBuffer structure

Listing 12.5. A two-output network with a policy output and a value output

Listing 12.6. Selecting a move for an actor-critic agent

Listing 12.7. Selecting a move for an actor-critic agent

Chapter 13. AlphaGo: Bringing it all together

Listing 13.1. Initializing a neural network for both policy and value networks in AlphaGo

Listing 13.2. Creating AlphaGo’s strong policy network in Keras

Listing 13.3. Building AlphaGo’s value network in Keras

Listing 13.4. Signature and initialization of your AlphaGo board encoder

Listing 13.5. Loading data for the first step of training AlphaGo’s policy network

Listing 13.6. Creating an AlphaGo policy network with Keras

Listing 13.7. Training and persisting a policy network

Listing 13.8. Loading the trained policy network twice to create two self-play opponents

Listing 13.9. Generating self-play data for your PolicyAgent to learn from

Listing 13.10. Initializing an AlphaGo value network

Listing 13.11. Training a value network from experience data

Listing 13.12. Doing rollouts with the fast policy network

Listing 13.13. A simple view on a node in an AlphaGo tree

Listing 13.14. Defining an AlphaGo tree node in Python

Listing 13.15. Selecting an AlphaGo child by maximizing Q-value

Listing 13.16. Updating visit counts, Q-value, and utility of an AlphaGo node

Listing 13.17. Initializing an AlphaGoMCTS Go playing agent

Listing 13.18. The main method in AlphaGo’s tree-search process

Listing 13.19. Selecting the most visited node and updating the tree’s root node

Listing 13.20. Computing normalized strong policy values for legal moves on the board

Listing 13.21. Playing until the rollout_limit is reached

Listing 13.22. Initializing an AlphaGo agent with three deep neural networks

Chapter 14. AlphaGo Zero: Integrating tree search with reinforcement learning

Listing 14.1. Modifying the board encoder to include passing

Listing 14.2. A structure to track branch statistics

Listing 14.3. A node in an AGZ-style search tree

Listing 14.4. Helpers to read branch information from a tree node

Listing 14.5. Choosing a child branch

Listing 14.6. Walking down the search tree

Listing 14.7. Creating a new node in the search tree

Listing 14.8. Expanding the search tree and updating all node statistics

Listing 14.9. Selecting the move with the highest visit count

Listing 14.10. Simulating a self-play game

Listing 14.11. A specialized experience collector for AGZ-style learning

Listing 14.12. Passing along the decision to the experience collector

Listing 14.13. Training the combined network

Listing 14.14. A single cycle of the reinforcement-learning process

Listing 14.15. Using np.random.dirichlet to sample from a Dirichlet distribution

Listing 14.16. Drawing from a Dirichlet distribution when α is close to zero

Listing 14.17. Adding batch normalization to a Keras network

Appendix E. Submitting a bot to the Online Go Server

Listing E.1. run_gtp_aws.py to run a bot on AWS that connects to OGS
