Games are a favorite subject for AI research, and it’s not just because they’re fun. They also simplify some of the complexities of real life, so you can focus on the algorithms you’re studying.
Imagine you see a comment on Twitter or Facebook: something like, “Ugh, I forgot my umbrella.” You’d quickly conclude that your friend got caught out in the rain. But that information isn’t included anywhere in the sentence. How did you reach that conclusion? First, you applied common knowledge about what umbrellas are for. Second, you applied social knowledge about the kinds of comments people bother to make: it’d be strange to say, “I forgot my umbrella” on a bright, sunny day.
As humans, we effortlessly factor in all this context when reading a sentence. This isn’t so easy for computers. Modern deep-learning techniques are effective at processing the information you supply them. But you’re limited in your ability to find all the relevant information and feed it to computers. Games sidestep that problem. They take place in an artificial universe, where all the information you need in order to make a decision is spelled out in the rules.
Games are especially well suited for reinforcement learning. Recall that reinforcement learning requires repeatedly running your program and evaluating how well it has accomplished a task. Imagine you’re using reinforcement learning to train a robot to move around a building. Before the control system is finely tuned, you risk the robot falling down a flight of stairs or knocking over your furniture. Another option is to build a computer simulation of the environment in which the robot will operate. This eliminates the risks of letting an untrained robot run around in the real world but creates new problems. First, you have to invest in developing a detailed computer simulation, which is a significant project in its own right. Second, there’s always a chance that your simulation isn’t completely accurate.
With games, on the other hand, all you need to do is have your AI play. If it loses a few hundred thousand matches while it’s learning, so what? Games are essential to serious reinforcement-learning research: many cutting-edge algorithms were first demonstrated on Atari video games such as Breakout.
To be clear, you can successfully apply reinforcement learning to problems in the physical world. Many researchers and engineers have done so. But starting with games solves the problem of creating a realistic training environment and lets you focus on the mechanics and principles of reinforcement learning.
In this chapter, we introduce the rules of the game of Go. Next, we describe the structure of board-game AI at a high level, and identify points where you can introduce deep learning. Finally, we cover how you can evaluate the progress of your game AI throughout development.
You don’t need to be a strong Go player to read this book, but you do need to understand the rules well enough to enforce them in a computer program. Fortunately, the rules are famously simple. In short, two players alternate placing black and white stones on a board, starting with the black player. The goal is to control as much of the board as possible with your own stones.
Although the rules are simple, Go strategy has endless depth, and we don’t even attempt to cover it in this book. If you’re interested in learning more, we provide some resources at the end of this section.
A Go board is a square grid, shown in figure 2.1. Stones go on the intersections, not inside the squares. The standard board is 19 × 19, but sometimes players use a smaller board for a quick game. The most popular smaller options are 9 × 9 and 13 × 13 boards. (The size refers to the number of intersections on the board, not the number of squares.)
Notice that nine points are marked with a dot. These points are called the star points. Their main purpose is to help players judge distances on the board; they have no effect on game play.
One player plays with black stones, and the other plays with white stones. The two players alternate placing stones on the board, starting with the black player. Stones don’t move after they’re on the board, although they can be captured and removed entirely. To capture your opponent’s stones, you must completely surround them with your own. Here’s how that works.
Stones of the same color that are touching are considered connected together, as shown in figure 2.2. For the purposes of connection, we consider only straight up, down, left, or right; diagonals don’t count. Any empty point touching a connected group is called a liberty of that group. Every group needs at least one liberty to stay on the board. You can capture your opponent’s stones by filling their liberties.
When you place a stone in the last liberty of an opponent’s group, that group is captured and removed from the board. The newly empty points are then available for either player to play on (so long as the move is legal). On the flip side, you may not play a stone that would have zero liberties, unless you’re completing a capture.
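The connection-and-liberty logic above can be sketched with a simple flood fill. This is a minimal illustration, not the book's implementation: we assume a board stored as a dict mapping `(row, col)` coordinates to `'b'` or `'w'`, with empty points simply absent.

```python
# Minimal sketch of groups and liberties, assuming a board represented
# as a dict from (row, col) to 'b' or 'w'; empty points are absent.

def neighbors(point, size):
    """Up/down/left/right neighbors that lie on the board (no diagonals)."""
    r, c = point
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in candidates
            if 0 <= nr < size and 0 <= nc < size]

def group_and_liberties(board, point, size):
    """Flood-fill the connected group containing point; return (group, liberties)."""
    color = board[point]
    group, liberties = set(), set()
    to_visit = [point]
    while to_visit:
        p = to_visit.pop()
        if p in group:
            continue
        group.add(p)
        for n in neighbors(p, size):
            if n not in board:
                liberties.add(n)      # empty point: a liberty of the group
            elif board[n] == color:
                to_visit.append(n)    # same color: part of the same group
    return group, liberties

# A lone white stone surrounded on three sides has one liberty left;
# black capturing it means filling that last liberty.
board = {(1, 1): 'w', (0, 1): 'b', (1, 0): 'b', (1, 2): 'b'}
group, libs = group_and_liberties(board, (1, 1), 5)
```

A real rules engine would run this check after every move, removing any opponent group whose liberty set becomes empty.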
An interesting consequence arises from the capturing rules. If a group of stones has two completely separate internal liberties, it can never be captured. See figure 2.3: black can’t play at A, because that black stone would have no liberties and its placement wouldn’t complete a capture because of the remaining liberty at B. Nor can black play at B, for the same reason. So black has no way to fill the last two liberties of the white group. These internal liberties are called eyes. In contrast, black can play at C to capture five white stones, because even though that black stone would have no liberties, it completes the capture. That white group has only one eye and is doomed to get captured at some point.
Although it’s not explicitly part of the rules, the idea that a group with two eyes can’t be captured is the most basic part of Go strategy. In fact, this is the only strategy you’ll specifically code into your bot’s logic. All the more advanced Go strategies will be inferred through machine learning.
Either player may pass any turn instead of placing a stone. When both players pass consecutive turns, the game is over. Before scoring, the players identify any dead stones: stones that have no chance of making two eyes or connecting up to friendly stones. Dead stones are treated exactly the same as captures when scoring the game. If a disagreement occurs, the players can resolve it by resuming play. But this is rare: if the status of any group is unclear, players usually try to resolve it before passing.
The goal of the game is to control a larger section of the board than your opponent. There are two ways to add up the score, but they nearly always give the same result.
The most common counting method is territory scoring. In this case, you get one point for every point on the board that’s completely surrounded by your own stones, plus one point for every opponent’s stone that you captured. The player with more points is the winner.
An alternative counting method is area scoring. With area scoring, you get one point for every point of territory, plus another point for every stone you have on the board. Except in rare cases, you’ll get the same winner by either method: if neither player passes early, the difference in captures will equal the difference in stones on the board.
Territory scoring is more common in casual play, but it turns out that area scoring is slightly more convenient for computers. Throughout the book, our AI assumes it’s playing under area scoring, unless otherwise noted.
In addition, the white player gets extra points as compensation for going second. This compensation is called komi. Komi is usually 6.5 points under territory scoring or 7.5 points under area scoring—the extra half point ensures there are no ties. We assume 7.5 points komi throughout the book.
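The area-scoring arithmetic above is simple enough to show directly. This sketch assumes the territory and stone counts are already known; the function and variable names are our own, not from any Go library.

```python
# Sketch of area scoring with komi, as described above. Inputs are
# counts we assume have already been determined; names are our own.

KOMI = 7.5  # white's compensation under area scoring

def area_score(black_territory, black_stones, white_territory, white_stones):
    """Return (black_score, white_score); white's score includes komi."""
    black = black_territory + black_stones
    white = white_territory + white_stones + KOMI
    return black, white

# Example: black holds 22 points of territory with 25 stones on the
# board; white holds 20 points of territory with 24 stones.
black, white = area_score(22, 25, 20, 24)
# black = 47, white = 44 + 7.5 = 51.5, so white wins by 4.5 points.
```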
Figure 2.4 shows the final position of an example 9 × 9 game.
Here’s how you score this game:
A game can end in one more way: either player can resign at any point. In a game between experienced players, it’s considered courteous to resign when you’re clearly behind. For our AI to be a good opponent, it should learn to detect when it should resign.
One more restriction exists on where you can place stones. To ensure that the game ends, it’s illegal to play a move that will return the board to a previous state. Figure 2.5 shows an example of how this can happen.
In the diagram, black has just captured one stone. White might like to play at the point marked A to recapture the new black stone. But that would put the board back in the same position it was in two moves ago. Instead, white must play somewhere else on the board first. After that, if white captures at A, the global board position is different, so it’s legal. Of course, this gives black the opportunity to protect the vulnerable stone. In order to recapture the black stone, the white player must create a distraction big enough to draw black’s attention elsewhere on the board.
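The no-repetition rule just described can be enforced by remembering every past board position and rejecting any move that would recreate one. The sketch below uses a hashable snapshot of the board (a frozenset of `(point, color)` pairs); the encoding and class names are our own assumptions, and real capture mechanics are omitted.

```python
# Sketch of the no-repetition (ko) rule: keep a set of all previous
# board positions and flag any move that would recreate one. The board
# encoding is our own choice; capture logic is omitted for brevity.

def snapshot(board):
    """A hashable, order-independent snapshot of the position."""
    return frozenset(board.items())

class GameState:
    def __init__(self):
        self.board = {}
        self.previous_positions = {snapshot(self.board)}

    def violates_ko(self, board_after_move):
        """True if the move would return the board to a previous state."""
        return snapshot(board_after_move) in self.previous_positions

    def apply(self, board_after_move):
        self.previous_positions.add(snapshot(board_after_move))
        self.board = board_after_move

state = GameState()
state.apply({(0, 0): 'b'})
# A "move" that empties the board would recreate the starting position,
# so violates_ko({}) is True; any new position is fine.
```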
This situation is called a ko, from the Japanese word for eternity. Players adopt special strategies when a ko is on the board, and this was a weak spot for previous generations of Go-playing programs. In chapter 7, we show how to give your neural networks a hint to help them learn ko tactics. This is a general technique for training neural networks effectively. Even when you can’t articulate the rules that you want the neural network to learn, you can encode your inputs in a way that emphasizes the situations you want it to pay attention to.
When two players of unequal strength play, a simple system keeps the game interesting. The weaker player takes black and gets to place a few stones on the board before the game starts; these stones are called handicap stones. Then the stronger player takes white and makes the first move. In addition, in a handicap game, komi is usually reduced to just half a point. Normally, the purpose of komi is to negate the advantage a player gets from playing first; but the point of a handicap is to give black an extra advantage, so the two would work at cross purposes. The remaining half point of komi is just to break ties.
It’s traditional to place the handicap stones on the star points, but some players allow the black player to choose where to put them.
Although we’ve covered the rules of the game, we haven’t even scratched the surface of what makes Go so engrossing and addictive. That’s beyond the scope of this book, but we encourage you to play a few games and learn more on your own. Here are a few resources for further exploration:
Whether you’re programming a computer to play Go or tic-tac-toe, most board-game AIs share a similar overall structure. In this section, we provide a high-level overview of that structure, and identify specific problems the AI needs to solve. Depending on the game, the best solutions may involve game-specific logic, machine learning, or both.
Early in the game, it’s difficult to evaluate a particular move because of the huge number of variations in the rest of the game. Both chess and Go AIs often use an opening book: a database of opening sequences taken from expert human games. To build this, you need a collection of game records from strong players. You analyze the game records, looking for common positions. In any common position, if a strong consensus exists about the next move—say, one or two moves account for 80% of the follow-ups—you add those moves to the opening book. The bot can then consult the book when playing a game. If any early game position appears in the opening book, the bot just looks up the expert move.
In chess and checkers, where pieces are removed from the board as the game progresses, AIs also contain similar endgame databases: when just a few pieces remain on the board, you can calculate all the variations in advance. This technique doesn’t apply to Go, where the board fills up toward the end.
The core idea behind board-game AI is tree search. Think about how humans play a strategy game. First, we consider a possible move for our next turn. Then we need to think of how our opponent is likely to respond, then plan how we’d respond to that, and so on. We read out the variation as far as we can, and then judge whether the result is good. Then we backtrack a bit and look at a different variation to see if it’s better.
This closely describes how the tree-search algorithms used in game AI work. Of course, humans can keep only a few variations in their heads at once, while computers can juggle millions with no trouble. Humans make up for their lack of raw computing power with intuition. Experienced chess and Go players are scary good at spotting the handful of moves worth considering.
Ultimately, raw computing power won out in chess. But building a Go AI that could compete with top human players required an interesting twist: bringing a form of human intuition to computers.
In the context of game tree search, the number of possible moves on a given turn is the branching factor.
In chess, the average branching factor is about 30. At the start of the game, each player has 20 legal options for the first move; the number increases a bit as the board opens up. At that scale, it’s realistic to read out every single possible move to four or five moves ahead, and a chess engine will read out the more promising lines much deeper than that.
By comparison, the branching factor in Go is enormous. On the first move of the game, there are 361 legal moves, and the number decreases slowly. The average branching factor is around 250 valid moves per turn. Looking ahead just four moves requires evaluating nearly 4 billion positions. It’s crucial to narrow down the number of possibilities. Table 2.1 shows how the branching factor affects the number of possible game positions by comparing chess to Go.
| | Branching factor 30 (chess) | Branching factor 250 (Go) |
|---|---|---|
| After two moves | 900 | 62,500 |
| After three moves | 27,000 | 15 million |
| After four moves | 810,000 | 4 billion |
| After five moves | 24 million | 1 trillion |
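The table's entries are just powers of the branching factor, which you can verify in a couple of lines:

```python
# The number of sequences after n moves is (branching factor) ** n.
chess_4 = 30 ** 4    # 810,000
go_4 = 250 ** 4      # 3,906,250,000 -- the "nearly 4 billion" above
go_5 = 250 ** 5      # 976,562,500,000 -- roughly 1 trillion
```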
In Go, rules-based approaches to move selection turn out to be mediocre: it’s extremely difficult to write rules that reliably identify the most important area of the board. But deep learning is perfectly suited to the problem. You can apply supervised learning to train a computer to imitate a human Go player.
You start with a large collection of game records between strong human players; online gaming servers are a great resource here. Then you replay all the games on a computer, extracting each board position and the following move. That’s your training set. With a suitably deep neural network, it’s possible to predict the human move with better than 50% accuracy. You can build a bot that just plays the predicted human move, and it’s already a credible opponent. But the real power comes when you combine these move predictions with tree search: the predicted moves give you a ranked list of branches to explore.
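The replay-and-extract step above can be sketched as follows. The `apply_move` stand-in ignores captures and ko for brevity, and all names here are our own assumptions rather than the book's API.

```python
# Sketch of turning a game record into (position, next_move) training
# pairs by replaying it. apply_move is a simplified stand-in for real
# Go rules: it omits captures and ko handling.

def apply_move(board, move, color):
    new_board = dict(board)
    new_board[move] = color
    return new_board  # real code would also remove captured stones

def training_pairs(moves):
    board, pairs = {}, []
    color = 'b'  # black moves first
    for move in moves:
        # Each position is labeled with the move the human chose next.
        pairs.append((frozenset(board.items()), move))
        board = apply_move(board, move, color)
        color = 'w' if color == 'b' else 'b'
    return pairs

pairs = training_pairs([(3, 3), (15, 15), (3, 15)])
# The first training example pairs the empty board with the label (3, 3).
```

Run over a large collection of records, this yields the position-to-move dataset the network trains on.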
The branching factor limits how far an AI can look ahead. If you could read out a hypothetical sequence all the way to the end of the game, you’d know who’d win; it’s easy to decide whether that’s a good sequence. But that’s not practical in any game more complex than tic-tac-toe: the number of possible variations is just too large. At some point, you have to stop and pick one of the incomplete sequences you’ve looked at. To do so, you take the final board position you’ve read out and assign it a numerical score. Of all the variations you analyzed, you select the move that leads to the best-scoring position. Figuring out how to compute that score is the tricky part: that’s the problem of position evaluation.
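The read-ahead-then-score idea can be sketched as a depth-limited negamax search. The game interface here (`is_over`, `legal_moves`, `apply`) is hypothetical, and the toy game exists only to exercise the search; the evaluation function scores a position from the point of view of the player to move.

```python
# Sketch of depth-limited search with position evaluation (negamax).
# The game interface and the toy game below are our own assumptions.

def best_result(state, depth, evaluate):
    """Best score the player to move can guarantee, searching `depth` plies."""
    if depth == 0 or state.is_over():
        return evaluate(state)
    best = float('-inf')
    for move in state.legal_moves():
        # The opponent's best score, negated, is our score for this line.
        best = max(best, -best_result(state.apply(move), depth - 1, evaluate))
    return best

def select_move(state, depth, evaluate):
    return max(state.legal_moves(),
               key=lambda m: -best_result(state.apply(m), depth - 1, evaluate))

class PickNumbers:
    """Toy game: players alternately take 1 or 2 from a pile;
    whoever takes the last item wins."""
    def __init__(self, pile):
        self.pile = pile
    def is_over(self):
        return self.pile == 0
    def legal_moves(self):
        return [n for n in (1, 2) if n <= self.pile]
    def apply(self, n):
        return PickNumbers(self.pile - n)

def evaluate(state):
    # If the game is over, the player to move has lost (the opponent
    # just took the last item); unfinished cutoffs count as neutral.
    return -1 if state.is_over() else 0

# From a pile of 4, taking 1 leaves the opponent a losing pile of 3.
move = select_move(PickNumbers(4), 10, evaluate)
```

In a real Go engine, `evaluate` is exactly the hard part: the position-evaluation problem discussed next.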
In chess AIs, position evaluation is based on logic that makes sense to chess players. You can start with simple rules like this: if you capture my pawn, and I capture your rook, that’s good for me. Top chess engines go far beyond that, factoring in sophisticated rules about where the pieces end up on the board and which other pieces block their movement.
In Go, position evaluation may be even more difficult than move selection. The goal of the game is to cover more territory, but it’s surprisingly difficult to count territory: the boundaries tend to remain vague until late in the game. Counting captures doesn’t help much either; sometimes you can make it all the way to the end of a game with only a couple of captures. This is another area where human intuition reigns supreme.
Deep learning was a major breakthrough here as well. The kinds of neural networks that are suitable for move selection can also be trained to evaluate board positions. Instead of training a neural network to predict what the next move will be, you train it to predict who will win. You can design the network so it expresses its prediction as a probability; that gives you a numeric score to evaluate the board position.
As you work on your Go AI, you’ll naturally want to know how strong it is. Most Go players are familiar with the traditional Japanese ranking system, so you want to measure your bot’s strength on that scale. The only way to calibrate its level is by playing against other opponents; you can use other AIs or human players for that purpose.
Go players generally use the traditional Japanese system, where players are given kyu (beginner) or dan (expert) ranks. The dan levels, in turn, are divided into amateur dan and professional dan. The strongest kyu rating is 1 kyu, and larger numbers are weaker. Dan ranks go in the opposite direction: 1 dan is one level stronger than 1 kyu, and larger dan ranks are stronger. For amateur players, the scale traditionally tops out at 7 dan. Amateur players can get a rating from their regional Go association, and online servers will also track a rating for their players. Table 2.2 shows how the ranks stack up.
Amateur ranks are based on the number of handicap stones required to make up the difference in strength between two players. For example, if Alice is 2 kyu and Bob is 5 kyu, Alice will usually give Bob three handicap stones so that they have an equal chance of winning.
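The stones-equal-rank-difference convention can be captured in a tiny helper. The numeric encoding (kyu ranks as negative numbers, dan ranks counting up from zero, with no gap between 1 kyu and 1 dan) is our own convention for illustration.

```python
# Sketch of the amateur handicap convention: the number of handicap
# stones equals the difference in rank. The numeric encoding of ranks
# is our own choice.

def rank_to_number(rank):
    value, kind = rank
    # 2 kyu -> -2, 1 kyu -> -1, 1 dan -> 0, 2 dan -> 1, ...
    return -value if kind == 'kyu' else value - 1

def handicap_stones(stronger, weaker):
    return rank_to_number(stronger) - rank_to_number(weaker)

# The example above: 2 kyu Alice gives 5 kyu Bob three stones.
stones = handicap_stones((2, 'kyu'), (5, 'kyu'))
```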
Professional ranks work a little differently: they’re more like titles. A regional Go association awards top players a professional rank based on results in major tournaments, and that rank is held for life. The amateur and professional scales aren’t directly comparable, but you can safely assume that any player with a professional rank is at least as strong as an amateur 7 dan player. The top pros are significantly stronger than that.
An easy way to estimate the strength of your own bots is to play against other bots of known strength. Open source Go engines such as GNU Go and Pachi provide good benchmarks. GNU Go plays at around a 5 kyu level, and Pachi is about 1 dan (Pachi’s level varies a bit depending on the amount of computing power you provide it). So if you have your bot play GNU Go 100 times, and it wins about 50 games, you can conclude that your bot is also somewhere in the neighborhood of 5 kyu.
To get a more precise rank, you can set up your AI to play on a public Go server with a rating system. A few dozen games should be enough to get a reasonable estimate.