Chapter 8. Deploying bots in the wild

This chapter covers

Building an end-to-end application to train and run a Go bot
Running a frontend to play against your bot
Letting your bot play against other bots locally
Deploying your bot on an online Go server

By now, you know how to build and train a strong deep-learning model for Go move prediction—but how do you integrate this into an application that plays games against opponents? Training a neural network is just one part of building an end-to-end application, whether you’re playing yourself or letting your bot compete against other bots. The trained model has to be integrated into an engine that can be played against.

In this chapter, you’ll build a simple Go model server and two frontends. First, we provide you with an HTTP frontend that you can use to play against your bot. Then, we introduce you to the Go Text Protocol (GTP), a widely used protocol that Go bots use to exchange information, so your bot can play against other bots like GNU Go or Pachi, two freely available Go programs based on GTP. Finally, we show you how to deploy your Go bot on Amazon Web Services (AWS) and connect it to the Online Go Server (OGS). Doing so will allow your bots to play ranked games in a real environment, compete against other bots and human players worldwide, and even enter tournaments. To do all this, we’ll show you how to tackle the following tasks:

Building a move-prediction agent— The neural networks you trained in chapters 6 and 7 need to be integrated into a framework that allows you to use them in game play. In section 8.1, we’ll pick up the idea of agents from chapter 3 (in which you created a randomly playing agent) as a basis to serve a deep-learning bot.
Providing a graphical interface— As humans, to conveniently play against a Go bot, we need some sort of (graphical) interface. Although so far we’ve been happy with command-line interfaces, in section 8.2 we’ll equip you with a fun-to-play frontend for your bot.
Deploying a bot in the cloud— If you don’t have a powerful GPU in your computer, you won’t get far training strong Go bots. Luckily, most big cloud providers offer GPU instances on demand. But even if you have a strong-enough GPU for training, you still may want to host your previously trained model on a server. In section 8.3, we’ll show you how this can be done and refer you to appendix D for more details on how to set everything up in AWS.
Talking to other bots— Humans use graphical and other interfaces to interact with each other. For bots, it’s customary to communicate through a standardized protocol. In section 8.4, we’ll introduce you to the common Go Text Protocol (GTP). This is an essential component for the following two points:
- Playing against other bots— You’ll then build a GTP frontend for your bot to let it play against other programs in section 8.5. We’ll show you how to let your bot play against two other Go programs locally, to see how well your creation does.
- Deploying a bot on an online Go server— In section 8.6, we’ll finally show you how to deploy a bot on an online Go platform so that registered users and other bots can compete against your bot. This way, your bot can even enter ranked games and enter tournaments, all of which we’ll show you in this last section. Because most of this material is technical, you’ll find most of the details in appendix E.

8.1. Creating a move-prediction agent from a deep neural network

Now that you have all the building blocks in place to build a strong neural network for Go data, let’s integrate such networks into an agent that will serve them. Recall from chapter 3 the concept of Agent. We defined it as a class that can select the next move for the current game state, by implementing a select_move method. Let’s write a DeepLearningAgent by using Keras models and our Go board Encoder concept (put this code into predict.py in the agent module in dlgo).

Listing 8.1. Initializing an agent with a Keras model and a Go board encoder

import numpy as np

from dlgo.agent.base import Agent
from dlgo.agent.helpers import is_point_an_eye
from dlgo import encoders
from dlgo import goboard
from dlgo import kerasutil

class DeepLearningAgent(Agent):
    def __init__(self, model, encoder):
        Agent.__init__(self)
        self.model = model
        self.encoder = encoder

You’ll use the encoder to transform the board state into features, and you’ll use the model to predict the next move. In fact, you’ll use the model to compute a whole probability distribution of possible moves that you’ll later sample from.

Listing 8.2. Encoding board state and predicting move probabilities with a model

    def predict(self, game_state):
        encoded_state = self.encoder.encode(game_state)
        input_tensor = np.array([encoded_state])
        return self.model.predict(input_tensor)[0]

    def select_move(self, game_state):
        num_moves = self.encoder.board_width * self.encoder.board_height
        move_probs = self.predict(game_state)

Next, you alter the probability distribution stored in move_probs a little. First, you compute the cube of all values to drastically increase the distance between more-likely and less-likely moves. You want the best possible moves to be picked much more often. Then you use a trick called clipping that prevents move probabilities from being too close to either 0 or 1. This is done by defining a small positive value, ϵ = 0.000001, and setting values smaller than ϵ to ϵ, and values larger than 1 – ϵ to 1 – ϵ. Afterward, you normalize the resulting values to end up with a probability distribution once again.

Listing 8.3. Scaling, clipping, and renormalizing your move probability distribution

        move_probs = move_probs ** 3                        # 1
        eps = 1e-6
        move_probs = np.clip(move_probs, eps, 1 - eps)      # 2
        move_probs = move_probs / np.sum(move_probs)        # 3

1 Increases the distance between the more likely and least likely moves
2 Prevents move probabilities from getting stuck at 0 or 1
3 Renormalizes to get another probability distribution
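
To get a feel for what this transformation does to a distribution, here is a small sketch on made-up numbers. It uses plain Python lists instead of NumPy arrays so it runs without dependencies; the function name sharpen and the four-entry input are illustrative assumptions, not part of dlgo.

```python
def sharpen(probs, power=3, eps=1e-6):
    """Raise to a power, clip into [eps, 1 - eps], and renormalize."""
    powered = [p ** power for p in probs]
    clipped = [min(max(p, eps), 1 - eps) for p in powered]
    total = sum(clipped)
    return [p / total for p in clipped]


# Made-up network output over four board points (not real model output).
probs = [0.5, 0.3, 0.2, 0.0]
sharpened = sharpen(probs)

# The most likely move gains weight (from 0.5 to roughly 0.78), while the
# zero-probability move is lifted to a tiny positive value by the clipping.
print([round(p, 4) for p in sharpened])
```

Because every entry stays strictly positive after clipping, each point keeps a small chance of appearing somewhere in the sampled ranking.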

You do this transformation because you want to sample moves from this distribution, according to their probabilities. Instead of sampling moves, another viable strategy would be to always take the most likely move (taking the maximum over the distribution). The benefit of the way you’re doing it is that sometimes other moves get chosen, which might be especially useful when there isn’t one single move that sticks out from the rest.

Listing 8.4. Trying to apply moves from a ranked candidate list

        candidates = np.arange(num_moves)                                # 1
        ranked_moves = np.random.choice(
            candidates, num_moves, replace=False, p=move_probs)          # 2
        for point_idx in ranked_moves:
            point = self.encoder.decode_point_index(point_idx)
            if game_state.is_valid_move(goboard.Move.play(point)) and \
                    not is_point_an_eye(game_state.board, point,
                                        game_state.next_player):         # 3
                return goboard.Move.play(point)
        return goboard.Move.pass_turn()                                  # 4

1 Turns the probabilities into a ranked list of moves
2 Samples potential candidates
3 Starting from the top, finds a valid move that doesn’t reduce eye-space
4 If no legal and non-self-destructive moves are left, passes
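
Conceptually, sampling every index without replacement produces a random ordering in which likely moves tend to come first. The following dependency-free sketch imitates that idea with the standard library; ranked_sample and first_acceptable are hypothetical helper names, and the acceptance test is a stand-in for the validity and eye checks in the listing.

```python
import random


def ranked_sample(probs, rng):
    """Order all indices by repeated weighted draws without replacement."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    ranked = []
    while remaining:
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        ranked.append(remaining.pop(pick))
        weights.pop(pick)
    return ranked


def first_acceptable(ranked, is_acceptable):
    """Return the first index passing the check, or None (stands in for a pass)."""
    for idx in ranked:
        if is_acceptable(idx):
            return idx
    return None


rng = random.Random(0)                      # seeded only for repeatability
probs = [0.78, 0.17, 0.05, 0.00001]         # made-up, strictly positive weights
ranked = ranked_sample(probs, rng)

occupied = {0}                              # pretend point 0 is not playable
move = first_acceptable(ranked, lambda idx: idx not in occupied)
print(ranked, move)
```

If nothing in the ranking is acceptable, the helper returns None, which mirrors the agent falling back to passing.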

For convenience, you also want to persist a DeepLearningAgent, so you can pick it up at a later point. The prototypical situation in practice is this: you train a deep-learning model and create an agent, which you then persist. At a later point, this agent gets deserialized and served, so human players or other bots can play against it. To do the serialization step, you hijack the serialization format of Keras. When you persist a Keras model, it gets stored in HDF5, an efficient serialization format. HDF5 files contain flexible groups that are used to store meta-information and data. For any Keras model, you can call model.save("model_path.h5") to persist the full model, meaning the neural network architecture and all weights, to the local file model_path.h5. The only thing you need to do before persisting a Keras model like this is to install the Python library h5py; for instance, with pip install h5py.

To store a complete agent, you can add an additional group for information about your Go board encoder.

Listing 8.5. Serializing a deep-learning agent

    def serialize(self, h5file):
        h5file.create_group('encoder')
        h5file['encoder'].attrs['name'] = self.encoder.name()
        h5file['encoder'].attrs['board_width'] = self.encoder.board_width
        h5file['encoder'].attrs['board_height'] = self.encoder.board_height
        h5file.create_group('model')
        kerasutil.save_model_to_hdf5_group(self.model, h5file['model'])

Finally, after you serialize a model, you also need to know how to load it from an HDF5 file.

Listing 8.6. Deserializing a `DeepLearningAgent` from an HDF5 file

def load_prediction_agent(h5file):
    model = kerasutil.load_model_from_hdf5_group(h5file['model'])
    encoder_name = h5file['encoder'].attrs['name']
    if not isinstance(encoder_name, str):
        encoder_name = encoder_name.decode('ascii')
    board_width = h5file['encoder'].attrs['board_width']
    board_height = h5file['encoder'].attrs['board_height']
    encoder = encoders.get_encoder_by_name(
        encoder_name, (board_width, board_height))
    return DeepLearningAgent(model, encoder)

This completes our definition of a deep-learning agent. As a next step, you have to make sure this agent connects and interacts with an environment. You do this by embedding DeepLearningAgent into a web application that human players can play against in their browser.

8.2. Serving your Go bot to a web frontend

In chapters 6 and 7, you designed and trained a neural network that predicts what move a human would play in a Go game. In section 8.1, you turned that model for move prediction into a DeepLearningAgent that does move selection. The next step is to play your bot! Back in chapter 3, you built a bare-bones interface in which you could type in moves on your keyboard, and your benighted RandomBot would print its reply to the console. Now that you’ve built a more sophisticated bot, it deserves a nicer frontend to communicate moves with a human player.

In this section, you’ll connect the DeepLearningAgent to a Python web application, so you can play against it in your web browser. You’ll use the lightweight Flask library to serve such an agent via HTTP. On the browser side, you’ll use a JavaScript library called jgoboard to render a Go board that humans can use. The code can be found in our repository on GitHub, in the httpfrontend module in dlgo. We don’t explicitly discuss this code here, because we don’t want to distract from the main topic, building a Go AI, by digressing into web development techniques in other languages (such as HTML or JavaScript). Instead, we’ll give you an overview of what the application does and how to use it in an end-to-end example. Figure 8.1 provides an overview of the application you’re going to build in this chapter.

Figure 8.1. Building a web frontend for your Go bot. The httpfrontend module starts a Flask web server that decodes HTTP requests and passes them to one or more Go-playing agents. In the browser, a client based on the jgoboard library communicates with the server over HTTP.

If you look into the structure of httpfrontend, you find a file called server.py that has a single, well-documented method, get_web_app, that you can use to return a web application to run. Here’s an example of how to use get_web_app to load a random bot and serve it.

Listing 8.7. Registering a random agent and starting a web application with it

from dlgo.agent.naive import RandomBot
from dlgo.httpfrontend.server import get_web_app

random_agent = RandomBot()
web_app = get_web_app({'random': random_agent})
web_app.run()

When you run this example, a web application will start on localhost (127.0.0.1), listening on port 5000, which is the default port used in Flask applications. The RandomBot you just registered as random corresponds to an HTML file in the static folder in httpfrontend: play_random_99.html. This file renders a Go board and also defines the rules of human-bot game play. The human opponent starts with the black stones; the bot takes white. Whenever a human move has been played, the route /select-move/random is triggered to receive the next move from the bot. After the bot move has been received, it’s applied to the board, and it’s the human’s move once again. To play against this bot, navigate to http://127.0.0.1:5000/static/play_random_99.html in your browser. You should see a playable demo, as shown in figure 8.2.

Figure 8.2. Running a Python web application to play against a Go bot in your browser

You’ll add more and more bots in the next chapters, but for now note that another frontend is available under play_predict_19.html. This web frontend talks to a bot called predict and can be used to play 19 × 19 games. Therefore, if you train a Keras neural network model on Go data and use a Go board encoder, you can first create an instance agent = DeepLearningAgent(model, encoder) and then register it in a web application web_app = get_web_app({'predict': agent}) that you can then start with web_app.run().

8.2.1. An end-to-end Go bot example

Figure 8.3 shows an end-to-end example covering the whole process (the same flow we introduced in the beginning of chapter 7). You start with the imports you need and load Go data into features X and labels y by using an encoder and a Go data processor, as shown in listing 8.8.

Listing 8.8. Loading features and labels from Go data with a processor

import h5py

from keras.models import Sequential
from keras.layers import Dense

from dlgo.agent.predict import DeepLearningAgent, load_prediction_agent
from dlgo.data.parallel_processor import GoDataProcessor
from dlgo.encoders.sevenplane import SevenPlaneEncoder
from dlgo.httpfrontend.server import get_web_app
from dlgo.networks import large

go_board_rows, go_board_cols = 19, 19
nb_classes = go_board_rows * go_board_cols
encoder = SevenPlaneEncoder((go_board_rows, go_board_cols))
processor = GoDataProcessor(encoder=encoder.name())

X, y = processor.load_go_data(num_samples=100)

Figure 8.3. The training process for a deep-learning Go bot

Equipped with features and labels, you can build a deep convolutional neural network and train it on this data. This time, you choose the large network from dlgo.networks and use Adadelta as the optimizer.

Listing 8.9. Building and running a large Go move-predicting model with Adadelta

input_shape = (encoder.num_planes, go_board_rows, go_board_cols)
model = Sequential()
network_layers = large.layers(input_shape)
for layer in network_layers:
    model.add(layer)
model.add(Dense(nb_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta',
     metrics=['accuracy'])

model.fit(X, y, batch_size=128, epochs=20, verbose=1)

After the model has finished training, you can create a Go bot from it and save this bot in HDF5 format.

Listing 8.10. Creating and persisting a `DeepLearningAgent`

deep_learning_bot = DeepLearningAgent(model, encoder)
# serialize expects an open HDF5 file object, not a path
with h5py.File("../agents/deep_bot.h5", "w") as outfile:
    deep_learning_bot.serialize(outfile)

Finally, you can load the bot from file and serve it in a web application.

Listing 8.11. Loading a bot back into memory and serving it in a web application

model_file = h5py.File("../agents/deep_bot.h5", "r")
bot_from_file = load_prediction_agent(model_file)

web_app = get_web_app({'predict': bot_from_file})
web_app.run()

Of course, if you’ve already trained a strong bot, you can skip all but the last part. For instance, you could load one of the models stored in checkpoints in chapter 7 and see how they perform as opponents in action by changing the model_file accordingly.

8.3. Training and deploying a Go bot in the cloud

Until this point, all development took place on your local machine at home. If you’re fortunate enough to have a modern GPU in your computer, training the deep neural networks we developed in chapters 5–7 isn’t a concern for you. If you don’t have a powerful GPU or can’t spare any compute time on it, it’s usually a good option to rent compute time on a GPU in the cloud.

If you disregard training for now and assume you have a strong bot already, serving this bot is another situation in which cloud providers can come in handy. In section 8.2, you ran a bot via a web application hosted from localhost. If you want to share your bot with friends or make it public, that’s not exactly ideal. You don’t want to keep your computer running night and day, nor do you want to give the public access to your machine. By hosting your bot in the cloud, you separate development from deployment and can simply share a URL with anyone who’s interested in playing your bot.

Because this topic is important, but somewhat special and only indirectly related to machine learning, we entirely outsourced it to appendix D. Reading and applying the techniques from this appendix is entirely optional, but recommended. In appendix D, you’ll learn how to get started with one particular cloud provider, Amazon Web Services (AWS). You’ll learn the following skills in the appendix:

Creating an account with AWS
Flexibly setting up, running, and terminating virtual server instances
Creating an AWS instance suitable for deep-learning model training on a cloud GPU at reasonable cost
Deploying your Go bot served over HTTP on an (almost) free server

On top of learning these useful skills, appendix D is also a prerequisite for deploying a full-blown Go bot that connects to an online Go server, a topic we cover later in section 8.6.

8.4. Talking to other bots: the Go Text Protocol

In section 8.2, you saw how to integrate your bot framework into a web frontend. For this to work, you handled communication between the bot and human player with the Hypertext Transfer Protocol (HTTP), one of the core protocols running the web. To avoid distraction, we purposefully left out all the details, but having a standardized protocol in place is necessary to pull this off. Humans and bots don’t share a common language to exchange Go moves, but a protocol can act as a bridge.

The Go Text Protocol (GTP) is the de facto standard used by Go servers around the world to connect humans and bots on their platforms. Many offline Go programs are based on GTP as well. This section introduces you to GTP by example; you’ll implement part of the protocol in Python and use this implementation to let your bots play against other Go programs.

In appendix C, we explain how to install GNU Go and Pachi, two common Go programs available for practically all operating systems. We recommend installing both, so please make sure to have both programs on your system. You don’t need any frontends, just the plain command-line tools. If you have GNU Go installed, you can start it in GTP mode by running the following:

gnugo --mode gtp

Using this mode, you can now explore how GTP works. GTP is a text-based protocol, so you can type commands into your terminal and hit Enter. For instance, to set up a 9 × 9 board, you can type boardsize 9. This will trigger GNU Go to return a response and acknowledge that the command has been executed correctly. Every successful GTP command triggers a response starting with the symbol =, whereas failed commands lead to a ?. To check the current board state, you can issue the command showboard, which will print out an empty 9 × 9 board, as expected.

In actual game play, two commands are the most important: genmove and play. The first command, genmove, is used to ask a GTP bot to generate the next move. The GTP bot will usually also apply this move to its game state internally. All this command needs as arguments is the player color, either black or white. For instance, to generate a white move and place it on GNU Go’s board, type genmove white. This will lead to a response such as = C4, meaning GNU Go accepts this command (=) and places a white stone at C4. As you can see, GTP accepts standard coordinates as introduced in chapters 2 and 3.

The other command relevant to game play is play. It tells a GTP bot to place a given move on its board. For instance, you could tell GNU Go that you want it to play a black move on D4 by issuing play black D4, which will return an = to acknowledge this command. When two bots play against each other, they’ll take turns asking each other to genmove the next move, and then play the move from the response on their own board. This is all pretty straightforward, but we left out many details. A complete GTP client has a lot more commands to handle, ranging from handling handicap stones to managing time settings and counting rules. If you’re interested in the details of GTP, see http://mng.bz/MWNQ. Having said that, at a basic level genmove and play will be enough to let your deep-learning bots play against GNU Go and Pachi.
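
To make the request and response pattern concrete, the sketch below imitates the engine side of such a conversation. ToyEngine is a hypothetical stand-in, not GNU Go: it knows only three commands, always answers genmove with C4 while that point is free, and doesn’t check legality. What it does illustrate is the reply format, a leading = or ? followed by the result and a blank line.

```python
def gtp_response(success, text="", sequence=None):
    """Format a reply: '=' on success, '?' on failure, ended by a blank line."""
    prefix = "=" if success else "?"
    seq = "" if sequence is None else str(sequence)
    return "{}{} {}\n\n".format(prefix, seq, text)


class ToyEngine:
    """Hypothetical engine that only records stones; it knows no Go rules."""

    def __init__(self):
        self.stones = {}                    # vertex -> color, e.g. {"D4": "black"}

    def handle(self, line):
        parts = line.split()
        if not parts:
            return ""                       # empty lines are ignored
        name, args = parts[0], parts[1:]
        if name == "boardsize" and len(args) == 1:
            self.stones = {}
            return gtp_response(True)
        if name == "play" and len(args) == 2:
            color, vertex = args
            self.stones[vertex.upper()] = color
            return gtp_response(True)
        if name == "genmove" and len(args) == 1:
            vertex = "C4" if "C4" not in self.stones else "pass"
            if vertex != "pass":
                self.stones[vertex] = args[0]
            return gtp_response(True, vertex)
        return gtp_response(False, "unknown command")


engine = ToyEngine()
for command in ["boardsize 9", "play black D4", "genmove white", "frobnicate"]:
    print(repr(engine.handle(command)))
```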

To handle GTP and wrap your Agent concept so it can exchange Go moves by using this protocol, you create a new dlgo module called gtp. You can still try to follow the implementation alongside this main text, but from this chapter on, we suggest directly following our implementation on GitHub at http://mng.bz/a4Wj.

To start, let’s formalize what a GTP command is. To do so, we have to note that on many Go servers, commands get a sequence number to make sure that we can match commands and responses. These sequence numbers are optional and can be None. For us, a GTP command consists of a sequence number, a command, and potentially multiple arguments to that command. You place this definition in command.py in the gtp module.

Listing 8.12. Python implementation of a GTP command

class Command:

    def __init__(self, sequence, name, args):
        self.sequence = sequence
        self.name = name
        self.args = tuple(args)

    def __eq__(self, other):
        return self.sequence == other.sequence and \
            self.name == other.name and \
            self.args == other.args

    def __repr__(self):
        return 'Command(%r, %r, %r)' % (self.sequence, self.name, self.args)

    def __str__(self):
        return repr(self)

Next, you want to parse text input from the command line into Command. For instance, parsing “999 play white D4” should result in Command(999, 'play', ('white', 'D4')). The parse function used for this goes into command.py as well.

Listing 8.13. Parsing a GTP `Command` from plain text

def parse(command_string):
    pieces = command_string.split()
    try:
        sequence = int(pieces[0])       # 1
        pieces = pieces[1:]
    except ValueError:                  # 2
        sequence = None
    name, args = pieces[0], pieces[1:]
    return Command(sequence, name, args)

1 GTP commands may start with an optional sequence number.
2 If the first piece isn’t numeric, there’s no sequence number.
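
The parsing rule can be tried out without the rest of dlgo. The sketch below restates it with a namedtuple standing in for Command (an assumption made only to keep the example self-contained) and adds explicit errors for empty input, which the listing above would surface as an IndexError.

```python
from collections import namedtuple

# Stand-in for the Command class; equality and repr come for free.
Command = namedtuple("Command", ["sequence", "name", "args"])


def parse(command_string):
    pieces = command_string.split()
    if not pieces:
        raise ValueError("empty GTP command")
    sequence = None
    if pieces[0].isdigit():                 # optional leading sequence number
        sequence = int(pieces[0])
        pieces = pieces[1:]
    if not pieces:
        raise ValueError("sequence number without a command name")
    return Command(sequence, pieces[0], tuple(pieces[1:]))


print(parse("999 play white D4"))
print(parse("genmove black"))
```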

We’ve just argued that GTP coordinates come in standard notation, so parsing GTP coordinates into Board positions and vice versa is simple. You define two helper functions to convert between coordinates and positions in board.py within gtp.

Listing 8.14. Converting between GTP coordinates and your internal `Point` type

from dlgo.gotypes import Point
from dlgo.goboard_fast import Move

COLS = 'ABCDEFGHJKLMNOPQRST'  # Column letters in Go notation; the letter I is skipped

def coords_to_gtp_position(move):
    point = move.point
    return COLS[point.col - 1] + str(point.row)

def gtp_position_to_coords(gtp_position):
    col_str, row_str = gtp_position[0], gtp_position[1:]
    point = Point(int(row_str), COLS.find(col_str.upper()) + 1)
    return Move(point)
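
As a quick check of the conversion, the following sketch runs the same arithmetic on a stand-in Point type. It assumes COLS is the usual 19-letter column string that skips the letter I, and it works on points directly rather than on Move objects; point_to_gtp and gtp_to_point are illustrative names.

```python
from collections import namedtuple

Point = namedtuple("Point", ["row", "col"])   # stand-in for dlgo.gotypes.Point
COLS = "ABCDEFGHJKLMNOPQRST"                  # assumed column letters, no 'I'


def point_to_gtp(point):
    return COLS[point.col - 1] + str(point.row)


def gtp_to_point(position):
    col = COLS.find(position[0].upper()) + 1
    if col == 0:
        raise ValueError("unknown column in {!r}".format(position))
    return Point(int(position[1:]), col)


print(point_to_gtp(Point(row=4, col=4)))      # fourth row, fourth column
print(gtp_to_point("J10"))                    # J is the ninth column
```

Rejecting unknown column letters explicitly avoids silently producing column 0, which is what a bare find would yield for a letter such as I.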

8.5. Competing against other bots locally

Now that you understand the basics of GTP, let’s dive right into an application and build a program that loads one of your bots and lets it compete against either GNU Go or Pachi. Before we present this program, we have just one technicality left to resolve—when our bot should resign a game or pass.

8.5.1. When a bot should pass or resign

At the current stage of development, your deep-learning bots have no means of knowing when to stop playing. The way you’ve designed them so far, your bot will always try to play a move. This can be detrimental toward the end of the game, when it might be better to pass, or even to resign when the situation looks too bad. For this reason, you’ll impose termination strategies: you’ll explicitly tell the bot when to stop. In chapters 13 and 14, you’ll learn powerful techniques that’ll make this unnecessary (your bot will learn to judge the current board situation and thereby learn that sometimes it’s best to stop). But for now, this concept is useful and will help you on the way to deploying a bot against other opponents.

You build the following TerminationStrategy in a file called termination.py in the agent module of dlgo. All it does is decide when you should pass or resign—and by default, you never pass or resign.

Listing 8.15. A termination strategy tells your bot when to end a game

from dlgo import goboard
from dlgo.agent.base import Agent
from dlgo import scoring

class TerminationStrategy:

    def __init__(self):
        pass

    def should_pass(self, game_state):
        return False

    def should_resign(self, game_state):
        return False

A simple heuristic for stopping game play is to pass when your opponent passes. You have to rely on the fact that your opponent knows when to pass, but it’s a start, and it works well against GNU Go and Pachi.

Listing 8.16. Passing whenever an opponent passes

class PassWhenOpponentPasses(TerminationStrategy):

    def should_pass(self, game_state):
        if game_state.last_move is not None:
            return game_state.last_move.is_pass
        return False  # No move has been played yet

def get(termination):
    if termination == 'opponent_passes':
        return PassWhenOpponentPasses()
    else:
        raise ValueError("Unsupported termination strategy: {}"
                         .format(termination))

In termination.py, you also find another strategy called ResignLargeMargin that resigns whenever the estimated score of the game goes too much in favor of the opponent. You can cook up many other such strategies, but keep in mind that ultimately you can get rid of this crutch with machine learning.
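
To illustrate what a margin-based strategy could look like, here is a rough sketch. It is not the ResignLargeMargin class from the repository: the class name, the constructor parameters, and the estimate_margin callable (which returns the current leader and the size of the lead) are all assumptions chosen to keep the example self-contained.

```python
class TerminationStrategy:
    def should_pass(self, game_state):
        return False

    def should_resign(self, game_state):
        return False


class ResignLargeMarginSketch(TerminationStrategy):
    """Resign once the opponent leads by at least `margin` points.

    `estimate_margin` is a hypothetical callable mapping a game state to a
    (leader, lead) pair; `min_moves` delays resignation in the opening.
    """

    def __init__(self, own_color, min_moves, margin, estimate_margin):
        self.own_color = own_color
        self.min_moves = min_moves
        self.margin = margin
        self.estimate_margin = estimate_margin
        self.moves_seen = 0

    def should_resign(self, game_state):
        self.moves_seen += 1
        if self.moves_seen < self.min_moves:
            return False
        leader, lead = self.estimate_margin(game_state)
        return leader != self.own_color and lead >= self.margin


# Fake "game states" that already carry their own score estimate.
strategy = ResignLargeMarginSketch("black", min_moves=2, margin=20.0,
                                   estimate_margin=lambda state: state)
print(strategy.should_resign(("white", 50.0)))   # too early to resign
print(strategy.should_resign(("white", 50.0)))   # opponent far ahead
print(strategy.should_resign(("white", 5.0)))    # behind, but within margin
```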

The last thing you need in order to let bots play against each other is to equip an Agent with a TerminationStrategy so as to pass and resign when appropriate. This TerminationAgent class goes into termination.py as well.

Listing 8.17. Wrapping an agent with a termination strategy

class TerminationAgent(Agent):

    def __init__(self, agent, strategy=None):
        Agent.__init__(self)
        self.agent = agent
        self.strategy = strategy if strategy is not None \
            else TerminationStrategy()

    def select_move(self, game_state):
        if self.strategy.should_pass(game_state):
            return goboard.Move.pass_turn()
        elif self.strategy.should_resign(game_state):
            return goboard.Move.resign()
        else:
            return self.agent.select_move(game_state)

8.5.2. Let your bot play against other Go programs

Having discussed termination strategies, you can now turn to pairing your Go bots with other programs. In play_local.py in the gtp module, you’ll find a script that sets up a game between one of your bots and either GNU Go or Pachi. Let’s go through this script step by step, starting with the necessary imports.

Listing 8.18. Imports for your local bot runner

import subprocess
import re
import h5py

from dlgo.agent.predict import load_prediction_agent
from dlgo.agent.termination import PassWhenOpponentPasses, TerminationAgent
from dlgo.goboard_fast import GameState, Move
from dlgo.gotypes import Player
from dlgo.gtp.board import gtp_position_to_coords, coords_to_gtp_position
from dlgo.gtp.utils import SGFWriter
from dlgo.utils import print_board
from dlgo.scoring import compute_game_result

You should recognize most of the imports, with the exception of SGFWriter. This is a little utility class from dlgo.gtp.utils that keeps track of the game and writes an SGF file at the end.

To initialize your game runner LocalGtpBot, you need to provide a deep-learning agent and optionally a termination strategy. Also, you can specify how many handicap stones should be used and which bot opponent should be played against. For the latter, you can choose between gnugo and pachi. LocalGtpBot will initialize either one of these programs as subprocesses, and both your bot and its opponent will communicate over GTP.

Listing 8.19. Initializing a runner to clash two bot opponents

class LocalGtpBot:

    def __init__(self, go_bot, termination=None, handicap=0,
                 opponent='gnugo', output_sgf="out.sgf",
                 our_color='b'):
        self.bot = TerminationAgent(go_bot, termination)    # 1
        self.handicap = handicap
        self._stopped = False                               # 2
        self.game_state = GameState.new_game(19)
        self.sgf = SGFWriter(output_sgf)                    # 3

        self.our_color = Player.black if our_color == 'b' else Player.white
        self.their_color = self.our_color.other

        cmd = self.opponent_cmd(opponent)                   # 4
        pipe = subprocess.PIPE
        self.gtp_stream = subprocess.Popen(
            cmd, stdin=pipe, stdout=pipe                    # 5
        )

    @staticmethod
    def opponent_cmd(opponent):
        if opponent == 'gnugo':
            return ["gnugo", "--mode", "gtp"]
        elif opponent == 'pachi':
            return ["pachi"]
        else:
            raise ValueError("Unknown bot name {}".format(opponent))

1 You initialize a bot from an agent and a termination strategy.
2 You play until the game is stopped by one of the players.
3 At the end, you write the game to the provided file in SGF format.
4 Your opponent will either be GNU Go or Pachi.
5 You read and write GTP commands from the command line.

One of the main methods used in the tool we’re demonstrating here is command_and_response, which sends out a GTP command and reads back the response for this command.

Listing 8.20. Sending a GTP command and receiving a response

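    def send_command(self, cmd):
        self.gtp_stream.stdin.write(cmd.encode('utf-8'))
        self.gtp_stream.stdin.flush()  # Push the command through the buffered pipe

    def get_response(self):
        succeeded = False
        result = ''
        while not succeeded:
            # The pipe yields bytes, so decode before comparing with strings
            line = self.gtp_stream.stdout.readline().decode('utf-8')
            if line.startswith('='):
                succeeded = True
                line = line.strip()
                result = re.sub('^= ?', '', line)
        return result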

    def command_and_response(self, cmd):
        self.send_command(cmd)
        return self.get_response()
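
If you don’t have GNU Go or Pachi at hand, you can still try out this pipe-based exchange. The sketch below spawns a small child Python process that plays the role of a GTP engine; FAKE_ENGINE, start_engine, and the fixed C4 reply are assumptions for illustration only. Unlike the listing above, it reads until the blank line that ends a reply and also reports failures, returning a (success, text) pair.

```python
import subprocess
import sys

# Hypothetical stand-in for a GTP engine: answers "genmove" with C4 and
# acknowledges every other command with an empty result.
FAKE_ENGINE = r"""
import sys
for line in sys.stdin:
    words = line.split()
    reply = "= C4" if words and words[0] == "genmove" else "= "
    sys.stdout.write(reply + "\n\n")
    sys.stdout.flush()
    if words and words[0] == "quit":
        break
"""


def start_engine():
    return subprocess.Popen([sys.executable, "-c", FAKE_ENGINE],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)


def command_and_response(proc, cmd):
    """Send one command; read lines until the blank line that ends the reply."""
    proc.stdin.write((cmd + "\n").encode("utf-8"))
    proc.stdin.flush()                      # otherwise the child may never see it
    lines = []
    while True:
        line = proc.stdout.readline().decode("utf-8").strip()
        if not line:                        # blank line or EOF: reply complete
            break
        lines.append(line)
    if not lines:
        return False, ""
    first = lines[0]                        # only the first line is inspected here
    return first.startswith("="), first[1:].strip()


engine = start_engine()
print(command_and_response(engine, "boardsize 19"))
print(command_and_response(engine, "genmove white"))
command_and_response(engine, "quit")
engine.stdin.close()
engine.stdout.close()
engine.wait()
```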

Playing a game works as follows:

Set up the board with the GTP boardsize command. You allow only 19 × 19 boards here, because your deep-learning bots are tailored to that.
Set the right handicap in the set_handicap method.
Play the game itself, which you’ll cover in the play method.
Persist the game record as an SGF file.
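
For the handicap and SGF steps, it helps to see how a list of GTP positions can turn into an SGF property string. The sketch below is an illustration under assumptions: the sample reply D4 Q16 is made up, the helper names are hypothetical, and the coordinate mapping follows the general SGF convention (column letter first, rows counted from the top), which may differ in detail from what SGFWriter does.

```python
COLS = "ABCDEFGHJKLMNOPQRST"        # GTP column letters; 'I' is skipped
SGF_LETTERS = "abcdefghijklmnopqrs"  # SGF uses plain a-s for 19 lines


def gtp_to_sgf(position, board_size=19):
    """Convert e.g. 'D4' to SGF coordinates, counting rows from the top."""
    col = COLS.index(position[0].upper())
    row = int(position[1:])
    return SGF_LETTERS[col] + SGF_LETTERS[board_size - row]


def handicap_property(stones_response):
    """Build an 'HA[n]AB[..][..]' string from a space-separated stone list."""
    positions = stones_response.split()
    stones = "".join("[{}]".format(gtp_to_sgf(p)) for p in positions)
    return "HA[{}]AB{}".format(len(positions), stones)


# Made-up engine reply to a two-stone handicap request.
print(handicap_property("D4 Q16"))
```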

Listing 8.21. Set up the board, let the opponents play the game, and persist it

    def run(self):
        self.command_and_response("boardsize 19\n")
        self.set_handicap()
        self.play()
        self.sgf.write_sgf()

    def set_handicap(self):
        if self.handicap == 0:
            self.command_and_response("komi 7.5
")
            self.sgf.append("KM[7.5]
")
        else:
            stones = self.command_and_response("fixed_handicap
     {}
".format(self.handicap))
            sgf_handicap = "HA[{}]AB".format(self.handicap)
            for pos in stones.split(" "):
                move = gtp_position_to_coords(pos)
                self.game_state = self.game_state.apply_move(move)
                sgf_handicap = sgf_handicap + "[" +
     self.sgf.coordinates(move) + "]"
            self.sgf.append(sgf_handicap + "
")

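The handicap branch of set_handicap builds an SGF property of the form HA[n]AB[..][..] from the vertices the opponent reports. As a rough, self-contained illustration of that string building, the sketch below converts GTP vertices to SGF coordinates directly. The helper names are hypothetical, and the sketch assumes a 19 × 19 board, GTP column letters that skip I, and SGF coordinates counted from the top-left corner.

```python
GTP_COLUMNS = 'ABCDEFGHJKLMNOPQRST'  # GTP skips the letter I
SGF_LETTERS = 'abcdefghijklmnopqrs'


def gtp_vertex_to_sgf(vertex, board_size=19):
    """Convert a GTP vertex such as 'D4' to an SGF coordinate pair."""
    col = GTP_COLUMNS.index(vertex[0].upper())
    row_from_bottom = int(vertex[1:])
    # SGF counts rows from the top of the board, GTP from the bottom.
    row_from_top = board_size - row_from_bottom
    return SGF_LETTERS[col] + SGF_LETTERS[row_from_top]


def sgf_handicap_property(gtp_stones):
    """Build the HA/AB properties for a space-separated list of GTP vertices."""
    vertices = gtp_stones.split()
    placed = ''.join('[' + gtp_vertex_to_sgf(v) + ']' for v in vertices)
    return 'HA[{}]AB{}'.format(len(vertices), placed)


handicap = sgf_handicap_property('D4 Q16')
print(handicap)
```
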
The game-play logic for your bot clash is simple: as long as neither opponent stops the game, the bots take turns playing moves. They do that in methods called play_our_move and play_their_move, respectively. After each move, you also clear the screen and print out the current board situation and a crude estimate of the outcome.

Listing 8.22. A game ends when an opponent signals to stop it

    def play(self):
        while not self._stopped:
            if self.game_state.next_player == self.our_color:
                self.play_our_move()
            else:
                self.play_their_move()
            print(chr(27) + "[2J")
            print_board(self.game_state.board)
            print("Estimated result: ")
            print(compute_game_result(self.game_state))

Playing moves for your bot means asking it to generate a move with select_move, applying it to your board, and then translating the move and sending it over GTP. This needs special treatment for passing and resigning.

Listing 8.23. Asking your bot to generate and play a move that’s translated into GTP

    def play_our_move(self):
        move = self.bot.select_move(self.game_state)
        self.game_state = self.game_state.apply_move(move)

        our_name = self.our_color.name
        our_letter = our_name[0].upper()
        sgf_move = ""
        if move.is_pass:
            self.command_and_response("play {} pass\n".format(our_name))
        elif move.is_resign:
            self.command_and_response("play {} resign\n".format(our_name))
        else:
            pos = coords_to_gtp_position(move)
            self.command_and_response("play {} {}\n".format(our_name, pos))
            sgf_move = self.sgf.coordinates(move)
        self.sgf.append(";{}[{}]\n".format(our_letter, sgf_move))

Letting your opponent play a move is structurally similar to your move. You ask GNU Go or Pachi to genmove a move, and you have to take care of converting the GTP response into a move that your bot understands. The only other thing you have to do is stop the game when your opponent resigns or both players pass.

Listing 8.24. Your opponent plays moves by responding to `genmove`

    def play_their_move(self):
        their_name = self.their_color.name
        their_letter = their_name[0].upper()

        pos = self.command_and_response("genmove {}\n".format(their_name))
        if pos.lower() == 'resign':
            self.game_state = self.game_state.apply_move(Move.resign())
            self._stopped = True
        elif pos.lower() == 'pass':
            # Remember our previous move before applying the pass; afterward,
            # last_move would always be the opponent's pass itself.
            previous_move = self.game_state.last_move
            self.game_state = self.game_state.apply_move(Move.pass_turn())
            self.sgf.append(";{}[]\n".format(their_letter))
            if previous_move is not None and previous_move.is_pass:
                self._stopped = True
        else:
            move = gtp_position_to_coords(pos)
            self.game_state = self.game_state.apply_move(move)
            self.sgf.append(";{}[{}]\n".format(
                their_letter, self.sgf.coordinates(move)))
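
The three cases a genmove response can fall into (resign, pass, or a board vertex) and the two-pass stopping rule can be sketched without the rest of the framework. ParsedMove is a hypothetical stand-in for the chapter's Move type, and the function names are illustrative assumptions; the sketch assumes GTP column letters that skip I and 1-based rows and columns.

```python
from collections import namedtuple

# Hypothetical stand-in for the chapter's Move type, for illustration only.
ParsedMove = namedtuple('ParsedMove', 'kind row col')

GTP_COLUMNS = 'ABCDEFGHJKLMNOPQRST'  # GTP skips the letter I


def parse_genmove_response(body):
    """Classify a GTP genmove response body as resign, pass, or a play."""
    text = body.strip().lower()
    if text == 'resign':
        return ParsedMove('resign', None, None)
    if text == 'pass':
        return ParsedMove('pass', None, None)
    col = GTP_COLUMNS.index(text[0].upper()) + 1  # 1-based column
    row = int(text[1:])                           # 1-based row
    return ParsedMove('play', row, col)


def both_passed(previous, current):
    """True when two consecutive moves are passes, which ends the game."""
    return (previous is not None
            and previous.kind == 'pass' and current.kind == 'pass')


first = parse_genmove_response('Q16')
second = parse_genmove_response('PASS')
print(first, second, both_passed(first, second))
```

Keeping the previous move separate from the current one makes the stopping rule explicit: a single pass from the opponent doesn't end the game on its own.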

That concludes your play_local.py implementation, and you can now test it as follows.

Listing 8.25. Letting one of your bots loose on Pachi

from dlgo.gtp.play_local import LocalGtpBot
from dlgo.agent.termination import PassWhenOpponentPasses
from dlgo.agent.predict import load_prediction_agent
import h5py

bot = load_prediction_agent(h5py.File("../agents/betago.hdf5", "r"))

gtp_bot = LocalGtpBot(go_bot=bot, termination=PassWhenOpponentPasses(),
                      handicap=0, opponent='pachi')
gtp_bot.run()

You should see the way the game between the bots unfolds, as shown in figure 8.4.

Figure 8.4. A snapshot of how Pachi and your bot see and evaluate a game between them

In the top part of the figure, you see the board printed by you, followed by your current estimate. In the lower half, you see Pachi’s game state (which is identical to yours) on the left, and on the right Pachi gives you an estimation of its current assessment of the game in terms of which part of the board it thinks belongs to which player.

This is a hopefully convincing and exciting demo of what your bot can do by now, but it’s not the end of the story. In the next section, we go one step further and show you how to connect your bot to a real-life Go server.

8.6. Deploying a Go bot to an online Go server

Note that play_local.py is really a tiny Go server for two bot opponents to play against each other. It accepts and sends GTP commands and knows when to start and finish a game. This produces overhead, because the program takes the role of a referee that controls how the opponents interact.

If you want to connect a bot to an actual Go server, this server will take care of all the game-play logic, and you can focus entirely on sending and receiving GTP commands. On the one hand, your life becomes easier because you have less to worry about. On the other hand, connecting to a proper Go server means that your bot has to handle the full range of GTP commands that the server may send, because otherwise your bot may crash.

To ensure that this doesn’t happen, let’s formalize the processing of GTP commands a little more. First, you implement a proper GTP response class for successful and failed commands.

Listing 8.26. Encoding and serializing a GTP response

class Response:
    def __init__(self, status, body):
        self.success = status
        self.body = body


def success(body=''):                                 1
    return Response(status=True, body=body)


def error(body=''):                                   2
    return Response(status=False, body=body)


def bool_response(boolean):                           3
    return success('true') if boolean is True else success('false')


def serialize(gtp_command, gtp_response):             4
    return '{}{} {}\n\n'.format(
        '=' if gtp_response.success else '?',
        '' if gtp_command.sequence is None else str(gtp_command.sequence),
        gtp_response.body
    )

1 Making a successful GTP response with response body
2 Making an error GTP response
3 Converting a Python Boolean into GTP
4 Serializing a GTP response as a string
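
To make the wire format concrete, the sketch below serializes one successful and one failed response. Command and Response here are simplified namedtuple stand-ins for the chapter's classes (an assumption made so the example runs on its own), and serialize repeats the logic of the listing.

```python
from collections import namedtuple

# Simplified stand-ins for the chapter's Command and Response classes.
Command = namedtuple('Command', 'sequence name args')
Response = namedtuple('Response', 'success body')


def serialize(gtp_command, gtp_response):
    """Same logic as the listing, repeated so the sketch runs alone."""
    return '{}{} {}\n\n'.format(
        '=' if gtp_response.success else '?',
        '' if gtp_command.sequence is None else str(gtp_command.sequence),
        gtp_response.body
    )


# A successful genmove without a sequence number.
ok = serialize(Command(None, 'genmove', ('black',)), Response(True, 'D4'))
# A failed command that carried sequence number 7.
failed = serialize(Command(7, 'undo', ()), Response(False, 'unknown command'))
print(repr(ok))      # '= D4\n\n'
print(repr(failed))  # '?7 unknown command\n\n'
```

The blank line at the end of each response matters: it's how the receiving side knows the response is complete.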

This leaves you with implementing the main class for this section, GTPFrontend. You put this class into frontend.py in the gtp module. You need the following imports, including command and response from your gtp module.

Listing 8.27. Python imports for your GTP frontend

import sys

from dlgo.gtp import command, response
from dlgo.gtp.board import gtp_position_to_coords, coords_to_gtp_position
from dlgo.goboard_fast import GameState, Move
from dlgo.agent.termination import TerminationAgent
from dlgo.utils import print_board

To initialize a GTP frontend, you need to specify an Agent instance and an optional termination strategy. GTPFrontend then instantiates a dictionary that maps each supported GTP command to its handler. You have to implement each of these handlers, which cover common commands such as play and genmove.

Listing 8.28. Initializing a `GTPFrontend`, which defines GTP event handlers

HANDICAP_STONES = {
    2: ['D4', 'Q16'],
    3: ['D4', 'Q16', 'D16'],
    4: ['D4', 'Q16', 'D16', 'Q4'],
    5: ['D4', 'Q16', 'D16', 'Q4', 'K10'],
    6: ['D4', 'Q16', 'D16', 'Q4', 'D10', 'Q10'],
    7: ['D4', 'Q16', 'D16', 'Q4', 'D10', 'Q10', 'K10'],
    8: ['D4', 'Q16', 'D16', 'Q4', 'D10', 'Q10', 'K4', 'K16'],
    9: ['D4', 'Q16', 'D16', 'Q4', 'D10', 'Q10', 'K4', 'K16', 'K10'],
}


class GTPFrontend:

    def __init__(self, termination_agent, termination=None):
        self.agent = termination_agent
        self.game_state = GameState.new_game(19)
        self._input = sys.stdin
        self._output = sys.stdout
        self._stopped = False

        self.handlers = {
            'boardsize': self.handle_boardsize,
            'clear_board': self.handle_clear_board,
            'fixed_handicap': self.handle_fixed_handicap,
            'genmove': self.handle_genmove,
            'known_command': self.handle_known_command,
            'komi': self.ignore,
            'showboard': self.handle_showboard,
            'time_settings': self.ignore,
            'time_left': self.ignore,
            'play': self.handle_play,
            'protocol_version': self.handle_protocol_version,
            'quit': self.handle_quit,
        }

After you start a game with the following run method, you continually read GTP commands that are forwarded to the respective event handler, which is done by the process method.

Listing 8.29. The frontend parses from the input stream until the game ends

    def run(self):
        while not self._stopped:
            input_line = self._input.readline().strip()
            cmd = command.parse(input_line)
            resp = self.process(cmd)
            self._output.write(response.serialize(cmd, resp))
            self._output.flush()

    def process(self, cmd):
        handler = self.handlers.get(cmd.name, self.handle_unknown)
        return handler(*cmd.args)
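
The read, dispatch, and write cycle can be tried out without a Go engine. The following toy frontend is a sketch under simplifying assumptions: ToyFrontend and its handlers are hypothetical, commands are split on whitespace rather than parsed with command.parse, sequence numbers are ignored, and in-memory streams stand in for stdin and stdout.

```python
import io


class ToyFrontend:
    """Toy frontend: same read-dispatch-write loop, hypothetical handlers."""

    def __init__(self, input_stream, output_stream):
        self._input = input_stream
        self._output = output_stream
        self._stopped = False
        self.handlers = {
            'protocol_version': self.handle_protocol_version,
            'quit': self.handle_quit,
        }

    def run(self):
        while not self._stopped:
            line = self._input.readline()
            if not line:  # end of input: stop instead of looping forever
                break
            parts = line.split()
            if not parts:  # skip blank lines
                continue
            name, args = parts[0], parts[1:]
            # Unknown commands fall back to a handler that reports an error.
            handler = self.handlers.get(name, self.handle_unknown)
            success, body = handler(*args)
            self._output.write(
                '{} {}\n\n'.format('=' if success else '?', body))
            self._output.flush()

    def handle_protocol_version(self):
        return True, '2'

    def handle_quit(self):
        self._stopped = True
        return True, ''

    def handle_unknown(self, *args):
        return False, 'unknown command'


out = io.StringIO()
ToyFrontend(io.StringIO('protocol_version\nshowboard\nquit\n'), out).run()
transcript = out.getvalue()
print(transcript)
```

Falling back to an error handler for unknown commands, rather than raising an exception, is what keeps the bot alive when a server sends a command the frontend doesn't implement.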

What’s left to complete this GTPFrontend is the implementation of the individual GTP commands. The following listing shows the three most important ones; we refer you to the GitHub repository for the rest.

Listing 8.30. A few of the most important event responses for your GTP frontend

    def handle_play(self, color, move):
        if move.lower() == 'pass':
            self.game_state = self.game_state.apply_move(Move.pass_turn())
        elif move.lower() == 'resign':
            self.game_state = self.game_state.apply_move(Move.resign())
        else:
            self.game_state = self.game_state.apply_move(
                gtp_position_to_coords(move))
        return response.success()

    def handle_genmove(self, color):
        move = self.agent.select_move(self.game_state)
        self.game_state = self.game_state.apply_move(move)
        if move.is_pass:
            return response.success('pass')
        if move.is_resign:
            return response.success('resign')
        return response.success(coords_to_gtp_position(move))

    def handle_fixed_handicap(self, nstones):
        nstones = int(nstones)
        for stone in HANDICAP_STONES[nstones]:
            self.game_state = self.game_state.apply_move(
                gtp_position_to_coords(stone))
        return response.success()

You can now use this GTP frontend in a little script to start it from the command line.

Listing 8.31. Starting your GTP interface from the command line

from dlgo.gtp import GTPFrontend
from dlgo.agent.predict import load_prediction_agent
from dlgo.agent import termination
import h5py

model_file = h5py.File("agents/betago.hdf5", "r")
agent = load_prediction_agent(model_file)
strategy = termination.get("opponent_passes")
termination_agent = termination.TerminationAgent(agent, strategy)

frontend = GTPFrontend(termination_agent)
frontend.run()

Once this program is running, you can use it in exactly the same way you tested GNU Go in section 8.4: you can throw GTP commands at it, and it'll process them properly. Go ahead and test it by generating a move with genmove or printing out the board state with showboard. Any command that has an event handler in GTPFrontend will work.

8.6.1. Registering a bot at the Online Go Server

Now that your GTP frontend is complete and works in the same way as GNU Go and Pachi locally, you can register your bots at an online platform that uses GTP for communication. You’ll find that most popular Go servers are based on GTP, and appendix C covers three of them explicitly. One of the most popular servers in Europe and North America is the Online Go Server (OGS). We’ve chosen OGS as the platform to show you how to run a bot, but you could do the same thing with most other platforms as well.

Because the registration process for your bot at OGS is somewhat involved and the piece of software that connects your bot to OGS is a tool written in JavaScript, we’ve put this part into appendix E. You can either read this appendix now and come back here, or skip it if you’re not interested in running your own bot online. When you complete appendix E, you’ll have learned the following skills:

Creating two accounts at OGS, one for your bot and one for you to administer your bot account
Connecting your bot to OGS from your local computer for testing purposes
Deploying your bot on an AWS instance to connect to OGS for as long as you wish

This will allow you to enter a (ranked) game against your own creation online. Also, everyone with an OGS account can play your bot at this point, which can be motivating to see. On top of that, your bot could even enter tournaments hosted on OGS!

8.7. Summary

By building a deep network into your agent framework, you let your models interact with their environment.
By registering an agent in a web application with an HTTP frontend, you can play against your own bots through a graphical interface.
Using a cloud provider like AWS, you can rent compute power on a GPU to efficiently run your deep-learning experiments.
Deploying your web application on AWS, you can easily share your bot and let it play with others.
By letting your bot emit and receive Go Text Protocol (GTP) commands, it can play against other Go programs locally in a standardized way.
Building a GTP frontend for your bot is the most important stepping stone to registering it at an online Go platform.
Deploying a bot in the cloud, you can let it enter regular games and tournaments at the Online Go Server (OGS), and play against it yourself at any time.

