Chapter 3. Implementing your first Go bot

This chapter covers

  • Implementing a Go board by using Python
  • Placing sequences of stones and simulating a game
  • Encoding Go rules for this board to ensure legal moves are played
  • Building a simple bot that can play against a copy of itself
  • Playing a full game against your bot

In this chapter, you’ll build a flexible library that provides data structures to represent Go games and algorithms that enforce the Go rules. As you saw in the preceding chapter, the rules of Go are simple, but in order to implement them on a computer, you have to consider all the edge cases carefully. If you’re a novice to the game of Go or need a refresher of the rules, make sure you’ve read chapter 2. This chapter is technical and requires a good working knowledge of the Go rules to fully appreciate the details.

Representing the Go rules is immensely important, because it’s the foundation for creating smart bots. Your bot needs to understand legal and illegal moves before you can teach it good and bad moves.

At the end of this chapter, you’ll have implemented your first Go bot. This bot is still weak but has all the knowledge about the game of Go it needs to evolve into much stronger versions in the following chapters.

You’ll start by formally introducing the board and fundamental concepts used to play a game of Go with a computer: what is a player, a stone, or a move? Next, you’ll be concerned with game-play aspects. How can a computer quickly check which stones need to be captured or when the ko rule applies? When and how does a game come to an end? We’ll answer all these questions throughout this chapter.

3.1. Representing a game of Go in Python

The game of Go is played on a square board. Usually, beginners start playing on a 9 × 9 or 13 × 13 board, and advanced and pro players play on a 19 × 19 board. But in principle, Go can be played on a board of any size. Implementing a square grid for the game is fairly simple, but you’ll need to take care of a lot of intricacies down the line.

You represent a Go game in Python by building a module we’ll call dlgo step-by-step. Throughout the chapter, you’ll be asked to create files and implement classes and functions that will eventually lead to your first bot. All the code from this and later chapters can be found on GitHub at http://mng.bz/gYPe.

Although you should definitely clone this repository for reference, we strongly encourage you to follow along by creating the files from scratch to see how the library builds up piece by piece. The master branch of our GitHub repository contains all the code used in this book (and more). From this chapter on, there’s additionally a specific Git branch for each chapter that has only the code you need for the given chapter. For instance, the code for this chapter can be found in branch chapter_3. The next chapters follow the same naming convention. Note that we’ve included extensive tests for most of the code found here and in later chapters in the GitHub repository.

To build a Python library to represent Go, you need a data model that’s flexible enough to support a few use cases:

  • Track the progress of a game you’re playing against a human opponent.
  • Track the progress of a game between two bots. This might seem to be exactly the same as the preceding point, but as it turns out, a few subtle differences exist. Most notably, naive bots have a hard time recognizing when the game is over. Playing two simple bots against each other is an important technique used in later chapters, so it’s worth calling out here.
  • Compare many prospective sequences from the same board position.
  • Import game records and generate training data from them.

We start with a few simple concepts, such as what a player or move is. These concepts lay the foundations for tackling all of the preceding tasks in later chapters.

First, create a new folder, dlgo, and place an empty __init__.py file into it to initiate it as a Python module. Also, create two additional files called gotypes.py and goboard_slow.py in which you’ll put all board- and game-play functionality. Your folder structure at this point should look as follows:

dlgo
    __init__.py
    gotypes.py
    goboard_slow.py

Black and white players take turns in Go, and you use enum to represent the different-colored stones. A Player is either black or white. After a player places a stone, you can switch the color by calling the other method on a Player instance. Put this Player class into gotypes.py.

Listing 3.1. Using an enum to represent players
import enum


class Player(enum.Enum):
    black = 1
    white = 2

    @property
    def other(self):
        return Player.black if self == Player.white else Player.white

As noted in the front matter, we’re using Python 3 for this book. One of the reasons is that many modern aspects of the language, such as enums in gotypes.py, are part of the standard library in Python 3.

Next, to represent coordinates on the board, tuples are an obvious choice. The following Point class also goes into gotypes.py.

Listing 3.2. Using tuples to represent points of a Go board
from collections import namedtuple


class Point(namedtuple('Point', 'row col')):
    def neighbors(self):
        return [
            Point(self.row - 1, self.col),
            Point(self.row + 1, self.col),
            Point(self.row, self.col - 1),
            Point(self.row, self.col + 1),
        ]

A namedtuple lets you access the coordinates as point.row and point.col instead of point[0] and point[1], which makes for much better readability.

You also need a structure to represent the actions a player can take on a turn. Normally, a turn involves placing a stone on the board, but a player can also pass or resign at any time. Following American Go Association (AGA) conventions, we use the term move to mean any of those three actions, whereas a play refers to placing a stone. In the Move class, you therefore encode all three types of move (play, pass, resign) and make sure a move has precisely one of these types. For actual plays, you need to pass a Point to be placed. You add this Move class to the file goboard_slow.py.

Listing 3.3. Setting moves: plays, passes, or resigns
import copy
from dlgo.gotypes import Player

class Move():                                                       1
    def __init__(self, point=None, is_pass=False, is_resign=False):
        assert (point is not None) ^ is_pass ^ is_resign
        self.point = point
        self.is_play = (self.point is not None)
        self.is_pass = is_pass
        self.is_resign = is_resign

    @classmethod
    def play(cls, point):                                           2
        return Move(point=point)

    @classmethod
    def pass_turn(cls):                                             3
        return Move(is_pass=True)

    @classmethod
    def resign(cls):                                                4
        return Move(is_resign=True)

  • 1 Any action a player can play on a turn—is_play, is_pass, or is_resign—will be set.
  • 2 This move places a stone on the board.
  • 3 This move passes.
  • 4 This move resigns the current game.

In what follows, clients generally won’t call the Move constructor directly. Instead, you usually call Move.play, Move.pass_turn, or Move.resign to construct an instance of a move.

Note that so far, the Player, Point, and Move classes are all plain data types. Although they’re fundamental to representing the board, they don’t contain any game logic. This is done on purpose, and you’ll benefit from separating game-play concerns like this.

Next, you implement classes that can update the game state by using the preceding three classes:

  • The Board class is responsible for the logic of placing and capturing stones.
  • The GameState class includes all the stones of the board, as well as tracking whose turn it is and what the previous state was.

3.1.1. Implementing the Go board

Before turning to GameState, let’s first implement the Board class. Your first idea might be to create a 19 × 19 array tracking the state of each point in the board, and that’s a good starting point. Now, think about the algorithm for checking when you need to remove stones from the board. Recall that the number of liberties of a single stone is defined by the number of empty points in its direct neighborhood. If all four neighboring points are occupied by an enemy stone, the stone has no liberties left and is captured. For larger groups of connected stones, the situation is more difficult to check. For instance, after placing a black stone, you have to check all neighboring white stones to see whether black captured any stones that you have to remove. Specifically, you have to check the following:

  1. You see whether any neighbors have any liberties left.
  2. You check whether any of the neighbors’ neighbors have any liberties left.
  3. You examine the neighbors’ neighbors’ neighbors, and so forth.

This procedure could require hundreds of steps to finish. Imagine a long chain snaking through the opponent’s territory on a board with 200 moves played already. To speed this up, you can explicitly track all directly connected stones as a unit.

3.1.2. Tracking connected groups of stones in Go: strings

You saw in the preceding section that viewing stones in isolation can lead to an increased computational complexity. Instead, you’ll keep track of groups of connected stones of the same color and their liberties at the same time. Doing so is much more efficient when implementing game logic.

You call a group of connected stones of the same color a string of stones, or simply a string, as shown in figure 3.1. You can build this structure efficiently with the Python set type as in the following GoString implementation, which you also put into goboard_slow.py.

Listing 3.4. Encoding strings of stones with set
class GoString():                                        1
    def __init__(self, color, stones, liberties):
        self.color = color
        self.stones = set(stones)
        self.liberties = set(liberties)

    def remove_liberty(self, point):
        self.liberties.remove(point)

    def add_liberty(self, point):
        self.liberties.add(point)

    def merged_with(self, go_string):                    2
        assert go_string.color == self.color
        combined_stones = self.stones | go_string.stones
        return GoString(
            self.color,
            combined_stones,
            (self.liberties | go_string.liberties) - combined_stones)

    @property
    def num_liberties(self):
        return len(self.liberties)

    def __eq__(self, other):
        return isinstance(other, GoString) and 
            self.color == other.color and 
            self.stones == other.stones and 
            self.liberties == other.liberties

  • 1 Go strings are a chain of connected stones of the same color.
  • 2 Returns a new Go string containing all stones in both strings
Figure 3.1. In this Go game, black has three strings of stones, and white has two. The large white string has six liberties, and the single white stone has only three.

Note that GoString directly tracks its own liberties, and you can access the number of liberties at any point by calling num_liberties, which is much more efficient than the preceding naive approach starting from individual stones.

Also, you have the ability to add and remove liberties from the given string by using remove_liberty and add_liberty. Liberties of a string will usually decrease when opponents play next to this string, and increase when this or another group captures opposing stones adjacent to this string.

Furthermore, note the merged_with method of GoString, which is called when a player connects two of its groups by placing a stone.

3.1.3. Placing and capturing stones on a Go board

After having discussed stones and strings of stones, the natural next step is to discuss how to place stones on the board. Using the GoString class from listing 3.4, the algorithm for placing a stone looks like this:

  1. Merge any adjacent strings of the same color.
  2. Reduce liberties of any adjacent strings of the opposite color.
  3. If any opposite-color strings now have zero liberties, remove them.

Also, if the newly created string has zero liberties, you reject the move. This leads naturally to the following implementation of the Go Board class, which you also place at goboard_slow.py. You allow boards to have any number of rows or columns by instantiating them with num_rows and num_cols appropriately. To keep track of the board state internally, you use the private variable _grid, a dictionary you use to store strings of stones. First off, let’s initiate a Go board instance by specifying its size.

Listing 3.5. Creating a Go Board instance
class Board():                                1
    def __init__(self, num_rows, num_cols):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self._grid = {}

  • 1 A board is initialized as an empty grid with the specified number of rows and columns.

Next, we discuss the Board method to place stones. In place_stone, you first have to inspect all neighboring stones of a given point for liberties.

Listing 3.6. Checking neighboring points for liberties
    def place_stone(self, player, point):
        assert self.is_on_grid(point)
        assert self._grid.get(point) is None
        adjacent_same_color = []
        adjacent_opposite_color = []
        liberties = []
        for neighbor in point.neighbors():           1
            if not self.is_on_grid(neighbor):
                continue
            neighbor_string = self._grid.get(neighbor)
            if neighbor_string is None:
                liberties.append(neighbor)
            elif neighbor_string.color == player:
                if neighbor_string not in adjacent_same_color:
                    adjacent_same_color.append(neighbor_string)
            else:
                if neighbor_string not in adjacent_opposite_color:
                    adjacent_opposite_color.append(neighbor_string)
        new_string = GoString(player, [point], liberties)

  • 1 First, you examine direct neighbors of this point.

Note that the first two lines in listing 3.6 use utility methods to check whether the point is within bounds for the given board and that the point hasn’t been played yet. These two methods are defined as follows.

Listing 3.7. Utility methods for placing and removing stones
    def is_on_grid(self, point):
        return 1 <= point.row <= self.num_rows and 
            1 <= point.col <= self.num_cols

    def get(self, point):                 1
        string = self._grid.get(point)
        if string is None:
            return None
        return string.color

    def get_go_string(self, point):       2
        string = self._grid.get(point)
        if string is None:
            return None
        return string

  • 1 Returns the content of a point on the board: a Player if a stone is on that point, or else None
  • 2 Returns the entire string of stones at a point: a GoString if a stone is on that point, or else None

Note that you also define get_go_string to return the string of stones associated with a given point. This functionality can be helpful in general, but it’s particularly valuable to prevent self-capture, which we’ll discuss in more detail in section 3.2.

Continuing with the definition of place_stone in listing 3.6, right after defining new_string, you follow the outlined three-step approach shown next.

Listing 3.8. Continuing our definition of place_stone
        for same_color_string in adjacent_same_color:               1
            new_string = new_string.merged_with(same_color_string)
        for new_string_point in new_string.stones:
            self._grid[new_string_point] = new_string
        for other_color_string in adjacent_opposite_color:          2
            other_color_string.remove_liberty(point)
        for other_color_string in adjacent_opposite_color:          3
            if other_color_string.num_liberties == 0:
                self._remove_string(other_color_string)

  • 1 Merge any adjacent strings of the same color.
  • 2 Reduce liberties of any adjacent strings of the opposite color.
  • 3 If any opposite-color strings now have zero liberties, remove them.

Now, the only thing missing in our definition of a Go board is how to remove a string of stones, as required in remove_string in the last line of listing 3.8. This is fairly simple, as shown in listing 3.9, but you have to keep in mind that other stones might gain liberties when removing an enemy string. For instance, in figure 3.2, you can see that black can capture a white stone, thereby gaining one additional liberty for each black string of stones.

Listing 3.9. Continuing our definition of place_stone
    def _remove_string(self, string):
        for point in string.stones:
            for neighbor in point.neighbors():                1
                neighbor_string = self._grid.get(neighbor)
                if neighbor_string is None:
                    continue
                if neighbor_string is not string:
                    neighbor_string.add_liberty(point)
            self._grid[point] = None

  • 1 Removing a string can create liberties for other strings.
Figure 3.2. Black can capture a white stone, thereby regaining a liberty for each of the Go strings adjacent to the captured stone.

This definition concludes our Board implementation.

3.2. Capturing game state and checking for illegal moves

Now that you’ve implemented the rules for placing and capturing stones on a Board, let’s move on to playing games by capturing the current state of a game in a GameState class. Roughly speaking, game state knows about the board position, the next player, the previous game state, and the last move that has been played. What follows is just the beginning of the definition. You’ll add more functionality to GameState throughout this section. Again, you put this into goboard_slow.py.

Listing 3.10. Encoding game state for a game of Go
class GameState():
    def __init__(self, board, next_player, previous, move):
        self.board = board
        self.next_player = next_player
        self.previous_state = previous
        self.last_move = move

    def apply_move(self, move):                       1
        if move.is_play:
            next_board = copy.deepcopy(self.board)
            next_board.place_stone(self.next_player, move.point)
        else:
            next_board = self.board
        return GameState(next_board, self.next_player.other, self, move)

    @classmethod
    def new_game(cls, board_size):
        if isinstance(board_size, int):
            board_size = (board_size, board_size)
        board = Board(*board_size)
        return GameState(board, Player.black, None, None)

  • 1 Returns the new GameState after applying the move

At this point, you can already decide when a game is over by adding the following code to your GameState class.

Listing 3.11. Deciding when a game of Go is over
    def is_over(self):
        if self.last_move is None:
            return False
        if self.last_move.is_resign:
            return True
        second_last_move = self.previous_state.last_move
        if second_last_move is None:
            return False
        return self.last_move.is_pass and second_last_move.is_pass

Now that you’ve implemented how to apply a move to the current game state, using apply_move, you should also write code to identify which moves are legal. Humans may accidentally attempt an illegal move. Bots might attempt illegal moves because they don’t know any better. You need to check three rules:

  • Confirm that the point you want to play is empty.
  • Check that the move isn’t a self-capture.
  • Confirm that the move doesn’t violate the ko rule.

Although the first point is trivial to implement, the other two deserve separate treatment, because they’re rather tricky to handle properly.

3.2.1. Self-capture

When a string of your stones has only one liberty left and you play at the point that removes that liberty, we call that a self-capture. For instance, in figure 3.3, the black stones are doomed.

Figure 3.3. In this Go board state, the three black stones have one liberty left; namely, the marked point. You enforce the self-capture rule, so black isn’t allowed to play there. White, on the other hand, can capture the three black stones by playing on the marked point.

White can capture them at any time by playing on the marked point, and black has no way to prevent it. But what if black played on the marked point? The entire group would have no liberties and would then be captured. Most rule sets forbid such a play, although a few exceptions exist. Most notably, self-capture is allowed in the quadrennial Ing Cup, which is one of the biggest prizes in international Go.

You’ll enforce the self-capture rule in your code. This is consistent with the most popular rule sets, and it also reduces the number of moves your bots need to consider. It’s possible to contrive a position where self-capture is the best move, but such situations are basically unheard of in serious games.

If you alter the surrounding stones in figure 3.3 slightly, you end up with a completely different situation, shown in figure 3.4.

Figure 3.4. In this situation, the marked point is a capture for black, not suicide, because black will capture two white stones and thereby regain two liberties.

Note that in figure 3.4, and in general, you must remove opponent stones first before checking whether the newly played stone has any liberties. In all rule sets, this next move is a valid capture, not a self-capture, because black will regain two liberties by capturing the two white stones.

Note that the Board class does permit self-capture, but in GameState you’ll enforce the rule by applying the move to a copy of the board and checking the number of liberties afterward.

Listing 3.12. Continuing our definition of GameState to enforce the self-capture rule
    def is_move_self_capture(self, player, move):
        if not move.is_play:
            return False
        next_board = copy.deepcopy(self.board)
        next_board.place_stone(player, move.point)
        new_string = next_board.get_go_string(move.point)
        return new_string.num_liberties == 0

3.2.2. Ko

Having checked for self-capture, you can now move on to implement the ko rule. Chapter 2 briefly covered ko and its importance in the game of Go. Roughly speaking, the ko rule applies if a move would return the board to the exact previous position. This doesn’t imply that a player can’t hit back immediately, as the following sequence of diagrams shows. In figure 3.5, white has just played the isolated stone on the bottom. Black’s two stones now have only a single liberty left—but so does this white stone.

Figure 3.5. White wants to capture the two black stones in this situation, but white’s stone has only one liberty left.

Black can now try to save its two stones by capturing this white stone, as shown in figure 3.6.

Figure 3.6. Continuing, black tries to rescue its two stones by capturing the isolated white stone.

But white can immediately play at the same point that was played in figure 3.5, as shown in figure 3.7.

You can see that white can immediately recapture the three black stones, and the ko rule doesn’t apply, because the overall board positions in figures 3.5 and 3.7 are different. This play is known as a snapback. In simple situations, the ko rule boils down to not being able to immediately recapture a stone. But snapbacks are common, and the existence of such positions shows that you have to be careful when implementing ko.

Figure 3.7. In this situation, white can just snap back (retake the three black stones) without violating the ko rule.

You can specify the ko rule in many ways, but those ways are practically equivalent except in rare situations. The rule you’ll enforce in your code is that a player may not play a stone that would re-create a previous game state, where the game state includes both the stones on the board and the player whose turn is next. This particular formulation is known as the situational superko rule.

Because each GameState instance keeps a pointer to the previous state, you can enforce the ko rule by walking back up the tree and checking the new state against the whole history. You do so by adding the following method to your GameState implementation.

Listing 3.13. Does the current game state violate the ko rule?
    @property
    def situation(self):
        return (self.next_player, self.board)

    def does_move_violate_ko(self, player, move):
        if not move.is_play:
            return False
        next_board = copy.deepcopy(self.board)
        next_board.place_stone(player, move.point)
        next_situation = (player.other, next_board)
        past_state = self.previous_state
        while past_state is not None:
            if past_state.situation == next_situation:
                return True
            past_state = past_state.previous_state
        return False

This implementation is simple and correct, but it’s relatively slow. For each move, you create a deep copy of the board state and have to compare this state against all previous states, which adds up over time. In section 3.5, you’ll encounter an interesting technique to speed up this step.

To wrap up your GameState definition, you can now decide whether a move is valid by using knowledge from section 3.2 about both ko and self-capture.

Listing 3.14. Is this move valid for the given game state?
    def is_valid_move(self, move):
        if self.is_over():
            return False
        if move.is_pass or move.is_resign:
            return True
        return (
            self.board.get(move.point) is None and
            not self.is_move_self_capture(self.next_player, move) and
            not self.does_move_violate_ko(self.next_player, move))

3.3. Ending a game

A key concept in computer Go is self-play. In self-play, you usually start with a weak Go-playing agent, have it play against itself, and use the game results to build a stronger bot. In chapter 4, you’ll use self-play to evaluate board positions. In chapters 9 through 12, you’ll use self-play to evaluate individual moves and the algorithms that selected them.

To take advantage of this technique, you need to make sure your self-play games end. Human games end when neither player can gain an advantage with their next move. This is a tricky concept even for humans. Beginners often end games by playing hopeless moves in the opponent’s territory, or watching their opponent cut into what they believed to be solid territory. For computers, it’s even more difficult. If our bot continues playing as long as legal moves remain available, it’ll eventually fill up its own liberties and lose all of its stones.

You could think up some heuristics to help the bot finish the game in a sensible manner. For example:

  • Don’t play in a region that’s completely surrounded by stones of the same color.
  • Don’t play a stone that would have only one liberty.
  • Always capture an opposing stone with only one liberty.

Unfortunately, all of those rules are too strict. If our bots followed these rules, strong opponents would take advantage to kill groups that should live, rescue groups that should die, or simply gain a better position. Generally, our handcrafted rules should restrict the bot’s options as little as possible, so that more sophisticated algorithms are free to learn the advanced tactics.

To solve this problem, you can look to the history of the game. In ancient times, the winner was simply the player with the most stones on the board. Players would end the game by filling every point they could, leaving only eyes for their groups empty. This could make the end of a game drag out for a long time, so players came up with a way to speed it up. If black clearly controlled a region of the board (any white stone played there would eventually get captured), there was no need for black to fill that region with stones. The players would just agree to count that area for black. This is where the concept of territory came from, and over centuries the rules evolved so that territory was counted explicitly.

Scoring in this method avoids the question of what is territory and what isn’t, but you still have to prevent your bot from killing its own stones; see figure 3.8.

Figure 3.8. White has two eyes in the corner, at A and B, and shouldn’t place a stone at either point. Otherwise, black can capture the whole group. Your naive bot won’t be allowed to fill its own eye.

You’ll hardcode a rule that prevents the bot from filling in its own eyes, under the strictest possible definition. For our purposes, an eye is an empty point where all adjacent points and at least three out of four diagonally adjacent points are filled with friendly stones.

Note

Experienced Go players may notice that the preceding definition of eye will miss a valid eye in some cases. We’ll accept those errors to keep the implementation simple.

You have to create a special case for eyes on the edge of the board; in that case, all the diagonally adjacent points must contain friendly stones.

You create a new submodule of dlgo called agent (by creating a new folder named agent and an empty __init__.py file within that folder) and place the following is_point_an_eye function into a file called helpers.py.

Listing 3.15. Is the given point on the board an eye?
from dlgo.gotypes import Point


def is_point_an_eye(board, point, color):
    if board.get(point) is not None:                       1
        return False
    for neighbor in point.neighbors():                     2
        if board.is_on_grid(neighbor):
            neighbor_color = board.get(neighbor)
            if neighbor_color != color:
                return False

    friendly_corners = 0                                   3
    off_board_corners = 0
    corners = [
        Point(point.row - 1, point.col - 1),
        Point(point.row - 1, point.col + 1),
        Point(point.row + 1, point.col - 1),
        Point(point.row + 1, point.col + 1),
    ]
    for corner in corners:
        if board.is_on_grid(corner):
            corner_color = board.get(corner)
            if corner_color == color:
                friendly_corners += 1
        else:
            off_board_corners += 1
    if off_board_corners > 0:
        return off_board_corners + friendly_corners == 4   4
    return friendly_corners >= 3                           5

  • 1 An eye is an empty point.
  • 2 All adjacent points must contain friendly stones.
  • 3 We must control three out of four corners if the point is in the middle of the board; on the edge, you must control all corners.
  • 4 Point is on the edge or corner.
  • 5 Point is in the middle.

You aren’t explicitly concerned with determining the result of a game yet in this chapter, but counting points at the end of a game is definitely an important topic. Different tournaments and Go federations enforce slightly different rule sets. Throughout the book, you’ll have your bots follow the AGA rules for area counting, also known as Chinese counting. Although Japanese rules are more popular in casual play, the AGA rules are a little easier for computers, and the rule differences rarely affect the outcome of a game.

3.4. Creating your first bot: the weakest Go AI imaginable

Having finished the implementation of the Go board and encoded game state, you can build your first Go-playing bot. This bot will be a weak player, but it’ll lay the foundation for all of your improved bots to come. First, you define the interface that all of your bots will follow. You put the definition of an agent into base.py in the agent module.

Listing 3.16. Your central interface for Go agents
class Agent:
    def __init__(self):
        pass

    def select_move(self, game_state):
        raise NotImplementedError()

That’s it, just this one method. All a bot does is select a move given the current game state. Of course, internally this may require other complex tasks such as evaluating the current position, but to play a game, that’s all our bot will ever need.

Our first implementation will be as naive as possible: it’ll randomly select any valid move that doesn’t fill in one of its own eyes. If no such move exists, it’ll pass. You place this random bot into naive.py under agents. Recall from chapter 2 that student ranks in Go usually range from 30 kyu to 1 kyu. By that scale, your random bot plays at the 30 kyu level, an absolute beginner.

Listing 3.17. A random Go bot, playing at about 30 kyu strength
import random
from dlgo.agent.base import Agent
from dlgo.agent.helpers import is_point_an_eye
from dlgo.goboard_slow import Move
from dlgo.gotypes import Point


class RandomBot(Agent):
    def select_move(self, game_state):
        """Choose a random valid move that preserves our own eyes."""
        candidates = []
        for r in range(1, game_state.board.num_rows + 1):
            for c in range(1, game_state.board.num_cols + 1):
                candidate = Point(row=r, col=c)
                if game_state.is_valid_move(Move.play(candidate)) and 
                        not is_point_an_eye(game_state.board,
                                            candidate,
                                            game_state.next_player):
                    candidates.append(candidate)
        if not candidates:
            return Move.pass_turn()
        return Move.play(random.choice(candidates))

At this point, your module structure should look as follows (make sure to place an empty __init__.py in the folder to initialize the submodule):

dlgo
  ...
  agent
    __init__.py
    helpers.py
    base.py
    naive.py

Finally, you can set up a driver program that plays a full game between two instances of your random bot. First, you define convenient helper functions, such as printing the full board or an individual move on the console.

Go board coordinates can be specified in many ways, but in Europe it’s most common to label the columns with letters of the alphabet, starting with A, and the rows with increasing numbers, starting at 1. In these coordinates, on a standard 19 × 19 board, the lower-left corner would be A1, and the top-right corner T19. Note that by convention the letter I is omitted to avoid confusion with 1.

You define a string variable COLS = 'ABCDEFGHJKLMNOPQRST', whose characters stand for the columns of the Go board. To display the board on the command line, you encode an empty field with a point (.), a black stone with an x, and a white one with an o. The following code goes into a new file we call utils.py in the dlgo package. You create a print_move function that prints the next move to the command line, and a print_board function that prints the current board with all its stones. You put this code in a file called bot_v_bot.py outside the dlgo module.

Listing 3.18. Utility functions for bot vs. bot games
from dlgo import gotypes

COLS = 'ABCDEFGHJKLMNOPQRST'
STONE_TO_CHAR = {
    None: ' . ',
    gotypes.Player.black: ' x ',
    gotypes.Player.white: ' o ',
}


def print_move(player, move):
    if move.is_pass:
        move_str = 'passes'
    elif move.is_resign:
        move_str = 'resigns'
    else:
        move_str = '%s%d' % (COLS[move.point.col - 1], move.point.row)
    print('%s %s' % (player, move_str))


def print_board(board):
    for row in range(board.num_rows, 0, -1):
        bump = " " if row <= 9 else ""
        line = []
        for col in range(1, board.num_cols + 1):
            stone = board.get(gotypes.Point(row=row, col=col))
            line.append(STONE_TO_CHAR[stone])
        print('%s%d %s' % (bump, row, ''.join(line)))
    print('    ' + '  '.join(COLS[:board.num_cols]))

You can set up a script that initiates two random bots that play each other on a 9 × 9 board until they decide the game is over.

Listing 3.19. A script to let a bot play against itself
from dlgo import agent
from dlgo import goboard
from dlgo import gotypes
from dlgo.utils import print_board, print_move
import time


def main():
    board_size = 9
    game = goboard.GameState.new_game(board_size)
    bots = {
        gotypes.Player.black: agent.naive.RandomBot(),
        gotypes.Player.white: agent.naive.RandomBot(),
    }
    while not game.is_over():
        time.sleep(0.3)                                  1

        print(chr(27) + "[2J")                           2
        print_board(game.board)
        bot_move = bots[game.next_player].select_move(game)
        print_move(game.next_player, bot_move)
        game = game.apply_move(bot_move)


if __name__ == '__main__':
    main()

  • 1 You set a sleep timer to 0.3 seconds so that bot moves aren’t printed too fast to observe.
  • 2 Before each move, you clear the screen. This way, the board is always printed to the same position on the command line.

You can start a bot game on the command line by running the following:

python bot_v_bot.py

You should see a lot of moves printed on the screen, and the game will end in both players passing. Recall that you encoded black stones as x, white stones as o, and empty points as a point (.). Here’s an example of the last white move in a generated game:

9 o.ooooooo
8 ooooxxoxx
7 oooox.xxx
6 o.ooxxxxx
5 ooooxxxxx
4 ooooxxxxx
3 o.ooox.xx
2 ooooxxxxx
1 o.oooxxx.
  ABCDEFGHJ
Player.white passes

This bot is not only weak, but also a frustrating opponent: it’ll keep stubbornly placing stones until the whole board is filled in, even if its position is hopeless. Moreover, no matter how often you let these bots play against each other, no learning is involved. This random bot will remain at its current level forever.

Throughout the rest of the book, you’ll slowly improve on both of those weaknesses, building up an ever more interesting and powerful Go engine.

3.5. Speeding up game play with Zobrist hashing

Before wrapping up the chapter by describing how to play against your random bot, let’s quickly address a speed issue in your current implementation by introducing an important technique. If you’re not interested in ways to speed up your implementation, it’s safe to skip to section 3.6 right away.

Recall from section 3.2 that to check for situational superko, you need to go through the entire history of positions of that game to see whether the current position has been there before. This is computationally expensive. To avoid this problem, you alter your setup slightly: instead of storing past board positions altogether, you simply store much smaller hash values of it.

Hashing techniques are omnipresent in computer science. One in particular is widely used in other games, such as chess: Zobrist hashing (named after computer scientist Albert Zobrist, who built one of the first Go bots in the early 1970s). In Zobrist hashing, you assign a hash value to each possible Go move on the board. For best results, you should choose each hash value randomly. In Go, each move is either black or white, so on a 19 × 19 board, a full Zobrist hash table consists of 2 × 19 × 19 = 722 hash values. You use this small number of 722 hashes representing individual moves to encode the most complex of board positions. Figure 3.9 shows how it works.

Figure 3.9. Using Zobrist hashes to encode moves and efficiently store game state

What’s interesting about the procedure shown in figure 3.9 is that a full board state can be encoded by a single hash value. You start with the hash value of the empty board, which you can choose to be 0 for simplicity. The first move has a hash value, and you can apply this move to the board by carrying out an XOR operation of the board with the move hash. We call this operation applying the hash. Following this convention, for each new move, you can apply its current hash to the board. This allows you to track the current board state as one hash value.

Note that you can reverse any move by applying its hash again (a convenient feature of the XOR operation). We call this unapplying the hash value. This is important, because with this property, you can easily remove stones from the board when they’re captured. For instance, if a single black stone at C3 on the board is captured, you can remove it from the current board state hash by applying the hash value of C3. Of course, in this scenario, the hash value of the white stone capturing C3 has to be applied as well. If a white move captures multiple black stones, you unapply all of their hashes.

Given that you’ve chosen your hash values sufficiently large and general, so that no hash collisions occur (two different game states never result in the same hash value), you can encode any board position like this. In practice, you don’t check for hash collisions, but simply assume there are none.

To implement Zobrist hashing for your Go board implementation, you first need to generate the hashes. You generate 64-bit random integers by using Python’s random library for each of the 3 × 19 × 19 possible point states. Note that in Python the symbol ^ carries out an XOR operation. For the empty board, you choose a value of 0.

Listing 3.20. Generating Zobrist hashes
import random

from dlgo.gotypes import Player, Point


def to_python(player_state):
    if player_state is None:
        return 'None'
    if player_state == Player.black:
        return Player.black
    return Player.white


MAX63 = 0x7fffffffffffffff

table = {}
empty_board = 0
for row in range(1, 20):
    for col in range(1, 20):
        for state in (Player.black, Player.white):
            code = random.randint(0, MAX63)
            table[Point(row, col), state] = code

print('from .gotypes import Player, Point')
print('')
print("__all__ = ['HASH_CODE', 'EMPTY_BOARD']")
print('')
print('HASH_CODE = {')
for (pt, state), hash_code in table.items():
    print('    (%r, %s): %r,' % (pt, to_python(state), hash_code))
print('}')
print('')
print('EMPTY_BOARD = %d' % (empty_board,))

Running this script prints the desired hashes on the command line. Executing the preceding code generates Python code printed to the command line. Place this output in the file zobrist.py in the dlgo module.

Now that you have the hashes available, all you need to do is to replace your old state-tracking mechanism by storing hashes instead. Create a copy of goboard_slow.py called goboard.py, in which you’ll make all the necessary changes for the rest of this section. Alternatively, you can follow the code in goboard.py from our GitHub repository. You start with a slight modification to make GoString and both stones and liberties immutable, meaning they can’t by modified after they’re created. You can do this by using Python’s frozenset instead of set. A frozenset doesn’t have methods to add or remove items, so you need to create a new set instead of modifying an existing one.

Listing 3.21. GoString instances with immutable sets of stones and liberties
class GoString:
    def __init__(self, color, stones, liberties):
        self.color = color
        self.stones = frozenset(stones)
        self.liberties = frozenset(liberties)                       1

    def without_liberty(self, point):                               2
        new_liberties = self.liberties - set([point])
        return GoString(self.color, self.stones, new_liberties)

    def with_liberty(self, point):                                  3
        new_liberties = self.liberties | set([point])
        return GoString(self.color, self.stones, new_liberties)

  • 1 Stones and liberties are now immutable frozenset instances.
  • 2 The without_liberty method replaces the previous remove_liberty method...
  • 3 ... and with_liberty replaces add_liberty.

In GoString, you replace two methods to account for immutable state and leave the other helper methods, such as merged_with or num_liberties, untouched.

Next, you update relevant parts of the Go board. Remember to place all the code for the rest of this section in goboard.py, your copy of goboard_slow.py.

Listing 3.22. Instantiating the Go board with a _hash value for the empty board
from dlgo import zobrist

class Board:
    def __init__(self, num_rows, num_cols):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self._grid = {}
        self._hash = zobrist.EMPTY_BOARD

Next, in your place_stone method, whenever a new stone is placed, you apply the hash of the respective color. Make sure to apply these changes to the goboard.py file as well, just as with every other piece of code in this section.

Listing 3.23. Placing a stone means applying the hash of that stone
        new_string = GoString(player, [point], liberties)              1

        for same_color_string in adjacent_same_color:                  2
            new_string = new_string.merged_with(same_color_string)
        for new_string_point in new_string.stones:
            self._grid[new_string_point] = new_string

        self._hash ^= zobrist.HASH_CODE[point, player]                 3

        for other_color_string in adjacent_opposite_color:
            replacement = other_color_string.without_liberty(point)    4
            if replacement.num_liberties:

     self._replace_string(other_color_string.without_liberty(point))
            else:
                self._remove_string(other_color_string)                5

  • 1 Until this line, place_stone remains the same.
  • 2 You merge any adjacent strings of the same color.
  • 3 Next, you apply the hash code for this point and player.
  • 4 Then you reduce liberties of any adjacent strings of the opposite color.
  • 5 If any opposite-color strings now have zero liberties, remove them.

To remove a stone, you apply its hash to the board once again.

Listing 3.24. Removing a stone means unapplying the hash value of the stone
    def _replace_string(self, new_string):                         1
        for point in new_string.stones:
            self._grid[point] = new_string

    def _remove_string(self, string):
        for point in string.stones:
            for neighbor in point.neighbors():                     2
                neighbor_string = self._grid.get(neighbor)
                if neighbor_string is None:
                    continue
                if neighbor_string is not string:
                    self._replace_string(neighbor_string.with_liberty(point))
            self._grid[point] = None

            self._hash ^= zobrist.HASH_CODE[point, string.color]   3

  • 1 This new helper method updates your Go board grid.
  • 2 Removing a string can create liberties for other strings.
  • 3 With Zobrist hashing, you need to unapply the hash for this move.

The last thing you add to the Board class is a utility method to return the current Zobrist hash.

Listing 3.25. Returning the current Zobrist hash of the board
    def zobrist_hash(self):
        return self._hash

Now that you’ve encoded your Go board with Zobrist hashing values, let’s see how to improve GameState with it.

Before, the previous game state was set like this: self.previous_state = previous, which we argued was too expensive, because you had to cycle through all past states to check for ko. Instead, you want to store Zobrist hashes, and you do this by using the new variable previous_states, as shown in the next code listing.

Listing 3.26. Initializing game state with Zobrist hashes
class GameState:
    def __init__(self, board, next_player, previous, move):
        self.board = board
        self.next_player = next_player
        self.previous_state = previous
        if self.previous_state is None:
            self.previous_states = frozenset()
        else:
            self.previous_states = frozenset(
                previous.previous_states |
                {(previous.next_player, previous.board.zobrist_hash())})
        self.last_move = move

If the board is empty, self.previous_states is an empty immutable frozenset. Otherwise, you augment the states by a pair: the color of the next player and the Zobrist hash of the previous game state.

Having set up all this, you can finally drastically improve on your does_move_violate_ko implementation.

Listing 3.27. Fast checking of game states for ko with Zobrist hashes
    def does_move_violate_ko(self, player, move):
        if not move.is_play:
            return False
        next_board = copy.deepcopy(self.board)
        next_board.place_stone(player, move.point)
        next_situation = (player.other, next_board.zobrist_hash())
        return next_situation in self.previous_states

Checking previous board states by the single line next_situation in self.previous_states is an order of magnitude faster than the explicit loop over board states you had before.

This interesting hashing trick will enable much faster self-play in later chapters, leading to much quicker improvements in game play.

Further speeding up your Go board implementation

We gave an in-depth treatment of the original goboard_slow.py implementation and showed how to speed it up with Zobrist hashing to arrive at goboard.py. In the GitHub repository, you’ll see yet another Go board implementation called goboard_fast.py, in which you can further speed up game play. These speed improvements are extremely valuable in later chapters, but come as a trade-off for readability.

If you’re interested in how to make your Go board even quicker, look at goboard_fast.py and the comments found there. Most of the optimizations are tricks to avoid constructing and copying Python objects.

3.6. Playing against your bot

Having created a weak bot that plays against itself, you may wonder whether you can test your knowledge from chapter 2 to play against it yourself. This is indeed possible and doesn’t require a lot of changes compared to the setup for bot versus bot.

You need one more utility function that you put into utils.py to help you read coordinates from human input.

Listing 3.28. Transforming human input into coordinates for your Go board
def point_from_coords(coords):
    col = COLS.index(coords[0]) + 1
    row = int(coords[1:])
    return gotypes.Point(row=row, col=col)

This function transforms inputs like C3 or E7 into Go board coordinates. With this in mind, you can now set up your program for a 9 × 9 game in human_v_bot.py like this.

Listing 3.29. Setting up a script so you can play your own bot
from dlgo import agent
from dlgo import goboard_slow as goboard
from dlgo import gotypes
from dlgo.utils import print_board, print_move, point_from_coords
from six.moves import input


def main():
    board_size = 9
    game = goboard.GameState.new_game(board_size)
    bot = agent.RandomBot()

    while not game.is_over():
        print(chr(27) + "[2J")
        print_board(game.board)
        if game.next_player == gotypes.Player.black:
            human_move = input('-- ')
            point = point_from_coords(human_move.strip())
            move = goboard.Move.play(point)
        else:
            move = bot.select_move(game)
        print_move(game.next_player, move)
        game = game.apply_move(move)


if __name__ == '__main__':
    main()

You, the human player, will play as black. Your random bot takes white. Start the script with the following:

python human_v_bot.py

You’ll be prompted to type in a move and confirm with enter. For instance, if you choose to play G3 as your first move, the bot’s response may look as follows:

Player.white D8
9 .........
8 ...o.....
7 .........
6 .........
5 .........
4 .........
3 ......x..
2 .........
1 .........
  ABCDEFGHJ

If you wish, you can continue and play a full game against this bot. But because the bot plays random moves, it isn’t interesting to do so yet.

Note that in terms of following the Go rules, this bot is complete already. It knows everything about the game of Go it’ll ever need to know. This fact is important, because it frees you to completely focus on the algorithms that improve game play from now on. This bot represents the baseline you start from. In the following chapters, you’ll introduce more-interesting techniques to create stronger bots.

3.7. Summary

  • The two players of a game of Go are best encoded by using an enum.
  • A point on the Go board is characterized by its immediate neighborhood.
  • A move in Go is either playing, passing, or resigning.
  • Strings of stones are connected groups of stones of the same color. Strings are important to efficiently check for captured stones after placing a stone.
  • The Go Board has all the logic of placing and capturing stones.
  • In contrast, GameState keeps track of whose turn it is, the stones currently on the board, and the history of previous states.
  • Ko can be implemented by using the situational superko rule.
  • Zobrist hashing is an important technique to efficiently encode game-play history and speed up checking for ko.
  • A Go-playing agent can be defined with one method: select_move.
  • Your random bot can play against itself, other bots, and human opponents.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset