Chapter 3. Implementing your first Go bot
Listing 3.1. Using an enum to represent players
Listing 3.2. Using tuples to represent points of a Go board
Listing 3.3. Setting moves: plays, passes, or resigns
Listing 3.4. Encoding strings of stones with set
Listing 3.5. Creating a Go Board instance
Listing 3.6. Checking neighboring points for liberties
Listing 3.7. Utility methods for placing and removing stones
Listing 3.8. Continuing our definition of place_stone
Listing 3.9. Continuing our definition of place_stone
Listing 3.10. Encoding game state for a game of Go
Listing 3.11. Deciding when a game of Go is over
Listing 3.12. Continuing our definition of GameState to enforce the self-capture rule
Listing 3.13. Does the current game state violate the ko rule?
Listing 3.14. Is this move valid for the given game state?
Listing 3.15. Is the given point on the board an eye?
Listing 3.16. Your central interface for Go agents
Listing 3.17. A random Go bot, playing at about 30 kyu strength
Listing 3.18. Utility functions for bot vs. bot games
Listing 3.19. A script to let a bot play against itself
Listing 3.20. Generating Zobrist hashes
Listing 3.21. GoString instances with immutable sets of stones and liberties
Listing 3.22. Instantiating the Go board with a _hash value for the empty board
Listing 3.23. Placing a stone means applying the hash of that stone
Listing 3.24. Removing a stone means unapplying the hash value of the stone
Listing 3.25. Returning the current Zobrist hash of the board
Listing 3.26. Initializing game state with Zobrist hashes
Listing 3.27. Fast checking of game states for ko with Zobrist hashes
Listing 3.28. Transforming human input into coordinates for your Go board
Listing 3.29. Setting up a script so you can play your own bot
Chapter 4. Playing games with tree search
Listing 4.1. A function that finds a move that immediately wins the game
Listing 4.2. A function that avoids giving the opponent a winning move
Listing 4.3. A function that finds a two-move sequence that guarantees a win
Listing 4.4. An enum to represent the outcome of a game
Listing 4.5. A game-playing agent that implements minimax search
Listing 4.6. First step of the minimax search algorithm
Listing 4.7. Implementing minimax search
Listing 4.8. A highly simplified board evaluation heuristic for Go
Listing 4.9. Depth-pruned minimax search
Listing 4.10. Checking whether you can stop evaluating a branch
Listing 4.11. Full implementation of alpha-beta pruning
Listing 4.12. A data structure to represent an MCTS tree
Listing 4.13. Methods to update a node in an MCTS tree
Listing 4.14. Helper methods to access useful MCTS tree properties
Listing 4.15. The MCTS algorithm
Listing 4.16. Selecting a move after completing your MCTS rollouts
Listing 4.17. Selecting a branch to explore with the UCT formula
Chapter 5. Getting started with neural networks
Listing 5.1. One-hot encoding of MNIST labels
Listing 5.2. Reshaping MNIST data and loading training and test data
Listing 5.3. Computing the average value for images representing the same digit
Listing 5.4. Computing and displaying the average 8 in your training set
Listing 5.5. Computing how close a digit is to your weights by using the dot product
Listing 5.6. Simple implementation of sigmoid function for double values and vectors
Listing 5.7. Computing predictions from weights and bias with dot product and sigmoid
Listing 5.8. Evaluating predictions of your model with a decision threshold
Listing 5.9. Calculating prediction accuracy for three data sets
Listing 5.10. Mean squared error loss function and its derivative
Listing 5.11. Base layer implementation
Listing 5.12. Connecting layers through successors and predecessors
Listing 5.13. Forward and backward passes in a layer of a sequential neural network
Listing 5.14. Implementation of the derivative of the sigmoid function
Listing 5.15. Sigmoid activation layer
Listing 5.16. Dense layer weight initialization
Listing 5.17. Dense layer forward pass
Listing 5.18. Dense layer backward pass
Listing 5.19. Dense layer weight update mechanism
Listing 5.20. Sequential neural network initialization
Listing 5.21. Adding layers sequentially
Listing 5.22. Train method on a sequential network
Listing 5.23. Training a sequential neural network on a batch of data
Listing 5.24. Updating rule and feed-forward and backward passes for your network
Listing 5.26. Instantiating a neural network
Listing 5.27. Running a neural network instance on training data
Chapter 6. Designing a neural network for Go data
Listing 6.1. Abstract Encoder class to encode Go game state
Listing 6.2. Referencing Go board encoders by name
Listing 6.3. Encoding game state with a simple one-plane Go board encoder
Listing 6.4. Encoding and decoding points with your one-plane Go board encoder
Listing 6.5. Imports for generating encoded Monte Carlo tree-search game data
Listing 6.6. Generating MCTS games for this chapter
Listing 6.7. Main application for generating MCTS games for this chapter
Listing 6.8. Importing models, layers, and data sets from Keras
Listing 6.9. Loading and preprocessing MNIST data with Keras
Listing 6.10. Building a simple sequential model with Keras
Listing 6.11. Compiling a Keras deep-learning model
Listing 6.12. Training and evaluating a Keras model
Listing 6.13. Loading and preprocessing previously stored Go game data
Listing 6.14. Running a Keras multilayer perceptron on generated Go data
Listing 6.15. Evaluating the model on a known board position
Listing 6.16. Loading and preprocessing Go data for convolutional neural networks
Listing 6.17. Building a simple convolutional neural network for Go data with Keras
Listing 6.18. Adding a max pooling layer of pool size (2, 2) to a Keras model
Listing 6.19. Defining the softmax activation function in Python
Listing 6.20. Adding a softmax activation to the last layer of a Keras model
Listing 6.21. Compiling a Keras model with categorical cross-entropy
Listing 6.22. Importing and adding a Dropout layer to a Keras model
Listing 6.23. Adding a rectified linear activation to a Dense layer
Listing 6.24. Loading and preprocessing Go data for convolutional neural networks
Listing 6.25. Building a convolutional network for Go data with dropout and ReLUs
Listing 6.26. Evaluating your enhanced convolutional network
Chapter 7. Learning from data: a deep-learning bot
Listing 7.1. Creating an index of zip files containing Go data from KGS
Listing 7.2. Replaying moves from an SGF file with your Go framework
Listing 7.3. Python libraries needed for data and file processing
Listing 7.4. Imports for data processing from the dlgo module
Listing 7.5. Initializing a Go data processor with an encoder and a local data directory
Listing 7.6. load_go_data loads, processes, and stores data
Listing 7.7. Processing Go records stored in zip files into encoded features and labels
Listing 7.8. Calculating the total number of moves available in the current zip file
Listing 7.9. Retrieving handicap stones and applying them to an empty Go board
Listing 7.10. Persisting features and labels locally in small chunks
Listing 7.11. Consolidating individual NumPy arrays of features and labels into one set
Listing 7.12. Loading training data from 100 game records
Listing 7.13. The signature of a Go data generator
Listing 7.14. Private method to generate and yield the next batch of Go data
Listing 7.15. Calling the generate method to obtain a generator for model training
Listing 7.16. The parallel version of load_go_data can optionally return a generator
Listing 7.17. Loading training data from 100 game records
Listing 7.18. Specifying layers for a small convolutional network for Go move prediction
Listing 7.19. Core imports for building a neural network for Go data
Listing 7.20. Creating training and test generators
Listing 7.21. Defining a Keras model from your small layer architecture
Listing 7.22. Fitting and evaluating Keras models with generators
Listing 7.23. Initializing a simple seven-plane encoder
Listing 7.24. Encoding game state with a SevenPlaneEncoder
Listing 7.25. Implementing all other Encoder methods for your seven-plane encoder
Listing 7.26. Initializing SGD in Keras with momentum and learning rate decay
Chapter 8. Deploying bots in the wild
Listing 8.1. Initializing an agent with a Keras model and a Go board encoder
Listing 8.2. Encoding board state and predicting move probabilities with a model
Listing 8.3. Scaling, clipping, and renormalizing your move probability distribution
Listing 8.4. Trying to apply moves from a ranked candidate list
Listing 8.5. Serializing a deep-learning agent
Listing 8.6. Deserializing a DeepLearningAgent from an HDF5 file
Listing 8.7. Registering a random agent and starting a web application with it
Listing 8.8. Loading features and labels from Go data with a processor
Listing 8.9. Building and running a large Go move-predicting model with Adadelta
Listing 8.10. Creating and persisting a DeepLearningAgent
Listing 8.11. Loading a bot back into memory and serving it in a web application
Listing 8.12. Python implementation of a GTP command
Listing 8.13. Parsing a GTP Command from plain text
Listing 8.14. Converting between GTP coordinates and your internal Point type
Listing 8.15. A termination strategy tells your bot when to end a game
Listing 8.16. Passing whenever an opponent passes
Listing 8.17. Wrapping an agent with a termination strategy
Listing 8.18. Imports for your local bot runner
Listing 8.19. Initializing a runner to pit two bot opponents against each other
Listing 8.20. Sending a GTP command and receiving a response
Listing 8.21. Setting up the board, letting the opponents play the game, and persisting it
Listing 8.22. A game ends when an opponent signals to stop it
Listing 8.23. Asking your bot to generate and play a move that’s translated into GTP
Listing 8.24. Your opponent plays moves by responding to genmove
Listing 8.25. Letting one of your bots loose on Pachi
Listing 8.26. Encoding and serializing a GTP response
Listing 8.27. Python imports for your GTP frontend
Listing 8.28. Initializing a GTPFrontend, which defines GTP event handlers
Listing 8.29. The frontend parses from the input stream until the game ends
Listing 8.30. A few of the most important event responses for your GTP frontend
Listing 8.31. Starting your GTP interface from the command line
Chapter 9. Learning by practice: reinforcement learning
Listing 9.1. Calculating return on an action
Listing 9.2. Calculating discounted returns
Listing 9.3. An example of sampling from a probability distribution
Listing 9.4. Sampling from a probability distribution with NumPy
Listing 9.5. Repeatedly sampling from a probability distribution with NumPy
Listing 9.6. Clipping a probability distribution
Listing 9.7. The constructor for the PolicyAgent class
Listing 9.8. Constructing a new learning agent
Listing 9.9. Serializing a PolicyAgent to disk
Listing 9.10. An example of using the serialize function
Listing 9.11. Loading a policy agent from a file
Listing 9.12. Selecting a move with a neural network
Listing 9.13. Constructor for an experience buffer
Listing 9.14. Saving an experience buffer to disk
Listing 9.15. Restoring an ExperienceBuffer from an HDF5 file
Listing 9.16. An object to track decisions within a single episode
Listing 9.17. Integrating an ExperienceCollector with a PolicyAgent
Listing 9.18. Simulating a game between two agents
Listing 9.19. Initialization for generating a batch of experience
Chapter 10. Reinforcement learning with policy gradients
Listing 10.1. Randomly drawing a number from 1 to 5
Listing 10.2. Simulating a game of Add It Up
Listing 10.3. Sample outputs of listing 10.2
Listing 10.4. A policy learning implementation for the simple game Add It Up
Listing 10.5. Encoding experience data as a target vector
Listing 10.6. Training an agent from experience data with policy gradient learning
Listing 10.7. Training on previously saved experience data
Listing 10.8. Script for comparing the strength of two agents
Chapter 11. Reinforcement learning with value methods
Listing 11.1. Pseudocode for an ϵ-greedy policy
Listing 11.2. Defining a model with the Keras sequential API
Listing 11.3. Defining an identical model with the Keras functional API
Listing 11.4. A two-input action-value network
Listing 11.5. Constructor and utility methods for a Q-learning agent
Listing 11.6. Selecting moves for a Q-learning agent
Listing 11.7. Selecting moves for a Q-learning agent
Listing 11.8. Training the Q-learning agent from its experience
Chapter 12. Reinforcement learning with actor-critic methods
Listing 12.1. Updating ExperienceCollector to track advantages
Listing 12.2. Updating ExperienceCollector to store estimated values
Listing 12.3. Calculating advantage at the end of an episode
Listing 12.4. Adding advantage to the ExperienceBuffer structure
Listing 12.5. A two-output network with a policy output and a value output
Chapter 13. AlphaGo: Bringing it all together
Listing 13.1. Initializing a neural network for both policy and value networks in AlphaGo
Listing 13.2. Creating AlphaGo’s strong policy network in Keras
Listing 13.3. Building AlphaGo’s value network in Keras
Listing 13.4. Signature and initialization of your AlphaGo board encoder
Listing 13.5. Loading data for the first step of training AlphaGo’s policy network
Listing 13.6. Creating an AlphaGo policy network with Keras
Listing 13.7. Training and persisting a policy network
Listing 13.8. Loading the trained policy network twice to create two self-play opponents
Listing 13.9. Generating self-play data for your PolicyAgent to learn from
Listing 13.10. Initializing an AlphaGo value network
Listing 13.11. Training a value network from experience data
Listing 13.12. Doing rollouts with the fast policy network
Listing 13.13. A simple view of a node in an AlphaGo tree
Listing 13.14. Defining an AlphaGo tree node in Python
Listing 13.15. Selecting an AlphaGo child by maximizing Q-value
Listing 13.16. Updating visit counts, Q-value, and utility of an AlphaGo node
Listing 13.17. Initializing an AlphaGoMCTS Go playing agent
Listing 13.18. The main method in AlphaGo’s tree-search process
Listing 13.19. Selecting the most visited node and updating the tree’s root node
Listing 13.20. Computing normalized strong policy values for legal moves on the board
Listing 13.21. Playing until the rollout_limit is reached
Listing 13.22. Initializing an AlphaGo agent with three deep neural networks
Chapter 14. AlphaGo Zero: Integrating tree search with reinforcement learning
Listing 14.1. Modifying the board encoder to include passing
Listing 14.2. A structure to track branch statistics
Listing 14.3. A node in an AGZ-style search tree
Listing 14.4. Helpers to read branch information from a tree node
Listing 14.5. Choosing a child branch
Listing 14.6. Walking down the search tree
Listing 14.7. Creating a new node in the search tree
Listing 14.8. Expanding the search tree and updating all node statistics
Listing 14.9. Selecting the move with the highest visit count
Listing 14.10. Simulating a self-play game
Listing 14.11. A specialized experience collector for AGZ-style learning
Listing 14.12. Passing along the decision to the experience collector
Listing 14.13. Training the combined network
Listing 14.14. A single cycle of the reinforcement-learning process
Listing 14.15. Using np.random.dirichlet to sample from a Dirichlet distribution
Listing 14.16. Drawing from a Dirichlet distribution when α is close to zero
Listing 14.17. Adding batch normalization to a Keras network
Appendix E. Submitting a bot to the Online Go Server
Listing E.1. run_gtp_aws.py to run a bot on AWS that connects to OGS